From 3fe8732f32e3675c62a46fcb70fb95ef936376a5 Mon Sep 17 00:00:00 2001
From: Peter Drahoš

+This document was revised on 23-Jan-09, and applies to version 2.0.3.
+ Lua Lanes is a Lua extension library that makes it possible to run
+ multiple Lua states in parallel. It is intended for optimizing performance
+ on multicore CPUs, and for studying ways to make Lua programs naturally
+ parallel to begin with.
+
+ Lanes is included in your software with the regular
+ require "lanes" statement. No C-side programming is needed; all APIs
+ are Lua-side, and most existing extension modules should work seamlessly
+ together with multiple lanes.
+
+ See comparison of Lua Lanes with other Lua multithreading solutions.
+
+
+ Lua Lanes - multithreading in Lua
+
Copyright © 2007-08 Asko Kauppi. All rights reserved.
+
Lua Lanes is published under the same MIT license as Lua 5.1.
+
+Description
+
+Features:
+
+Limitations:
+
Lua Lanes supports the following operating systems:
+ The underlying threading code can be compiled either towards the Win32 API
+ or Pthreads. Unfortunately, thread prioritization under Pthreads is a joke,
+ requiring OS-specific tweaks and guessing at undocumented behaviour. Other
+ features should be portable to any modern platform.
+ Lua Lanes is built simply by make on the supported platforms
+ (make-vc for Visual C++). See the README for system-specific
+ details and limitations.
+ To install Lanes, all you need is for the lanes.lua and lua51-lanes.so|dll
+ files to be reachable by Lua (see LUA_PATH, LUA_CPATH).
+
+ Or use the LuaRocks package management.
+
+    > luarocks search lanes
+    ... output listing Lua Lanes is there ...
+
+    > luarocks install lanes
+    ... output ...
+
The following sample shows preparing a function for parallel calling, and
+ calling it with varying arguments. Each of the two results is calculated in
+ a separate OS thread, parallel to the calling one. Reading the results
+ joins the threads, waiting for any results not already there.
+ +
+    require "lanes"
+
+    f= lanes.gen( function(n) return 2*n end )
+    a= f(1)
+    b= f(2)
+
+    print( a[1], b[1] )   -- 2   4
+
+ func= lanes.gen( [libs_str | opt_tbl [, ...],] lane_func )
+
+ The function returned by lanes.gen() is a "generator" for
+ launching any number of lanes. They will share code, options, and initial
+ globals, but the particular arguments may vary. Only calling the generator
+ function actually launches a lane, and provides a handle for controlling it.
+ Lanes automatically copies upvalues over to the new lanes, so you need not
+ wrap all the required elements into one 'wrapper' function. If lane_func
+ uses local values or local functions, they will also be available in the
+ new lanes.
+ libs_str
defines the standard libraries made available to the
+ new Lua state:
+
(nothing) | no standard libraries (default) | |
"base" or "" | root level names, print, assert, unpack etc. | |
"coroutine" | coroutine.* namespace (part of base in Lua 5.1) | |
"debug" | debug.* namespace | |
"io" | io.* namespace | |
"math" | math.* namespace | |
"os" | os.* namespace | |
"package" | package.* namespace and require | |
"string" | string.* namespace | |
"table" | table.* namespace | |
"*" | all standard libraries |
+ Initializing the standard libs takes a bit of time at each lane invocation.
+ This is the main reason why "no libraries" is the default.
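As an illustrative sketch (assuming Lanes 2.0.3 is installed, and that several
library names can be combined in one libs_str), a lane can be given only the
libraries its body actually uses:

```lua
require "lanes"

-- Open only what the lane body needs; "*" would open all the standard
-- libraries, at a higher per-launch cost.
local calc= lanes.gen( "math,string", function( x )
    return string.format( "%.2f", math.sqrt( x ) )
end )

local h= calc( 2 )
print( h[1] )    -- expected "1.41"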
+
+ opt_tbl
is a collection of named options to control the way
+ lanes are run:
+
+
+ .cancelstep |
+ By default, lanes are only cancellable when they enter a pending
+ :receive() or :send() call. With this option, one can set the
+ cancellation check to occur every N Lua statements. The value
+ true uses a default value (100).
+ .globals globals_tbl |
+
+ Sets the globals table for the launched threads. This can be used for giving
+ them constants.
+ The global values of different lanes are in no manner connected;
+ modifying one will only affect the particular lane.
+ .priority |
+ The priority of the lanes generated. -2 is lowest, +2 is highest.
+
+ Implementation and dependability of priorities varies by platform.
+ In particular, Linux kernel 2.6 does not support priorities in user mode.
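A sketch combining these options (hypothetical values; assumes Lanes is
installed): the lane checks for cancellation every 50 statements, runs at a
lowered priority, and sees only the constants handed to it via .globals:

```lua
require "lanes"

-- NOTE: the globals table replaces the lane's globals entirely, so the
-- lane body below relies on nothing but the LIMIT constant given here.
local f= lanes.gen(
    { cancelstep= 50, priority= -1, globals= { LIMIT= 1000 } },
    function()
        local sum= 0
        for i= 1, LIMIT do sum= sum + i end
        return sum
    end )

local h= f()
print( h[1] )    -- 500500, once the lane finishes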
+ The lane handles are allowed to be 'let loose'; in other words, you may
+ execute a lane simply by:
+
+    lanes.gen( function() ... end ) ()
+
+ Normally, this kind of lane will be in an eternal loop handling messages.
+ Since the lane handle is gone, there is no way to control such a lane from
+ the outside, nor to read its potential return values. Then again, such a
+ lane does not normally even return.
+ str= lane_h.status
+ |
The current execution state of a lane can be read via its status
+ member, providing one of these values:
"pending" | not started, yet | |
"running" | running | |
"waiting" | waiting at a Linda :receive() or :send() | |
"done" | finished executing (results are ready) | |
"error" | met an error (reading results will propagate it) | |
"cancelled" | received cancellation and finished itself |
+ This is similar to coroutine.status, which has: "running" / + "suspended" / "normal" / "dead". Not using the + exact same names is intentional. +
+ A lane can be waited upon by simply reading its results. This can be done
+ in two ways.
+ +
+ [val]= lane_h[1]
+ |
+ Makes sure the lane has finished, and gives its first (possibly only)
+ return value. Other return values will be available in other lane_h
+ indices.
+ If the lane ended in an error, it is propagated to the master state at
+ this point.
+ +
+ [...]|[nil,err,stack_tbl]= lane_h:join( [timeout_secs] )
+ |
+ Waits until the lane finishes, or timeout seconds have passed.
+ Returns nil on timeout, nil,err,stack_tbl if the lane hit an error,
+ or the return values of the lane. Unlike when reading the results in
+ table fashion, errors are not propagated.
+ stack_tbl is an array of "<filename>:<line>" strings,
+ describing where the error was thrown. Use table.concat() to format
+ it to your liking (or just ignore it).
+ If you use :join, make sure your lane main function returns a non-nil
+ value, so you can tell timeout and error cases apart from a successful
+ return (using the .status property may be risky, since it might change
+ between a timed-out join and the moment you read it).
+ +
+    require "lanes"
+
+    f= lanes.gen( function() error "!!!" end )
+    a= f(1)
+
+    --print( a[1] )   -- propagates error
+
+    v,err= a:join()   -- no propagation
+    if v==nil then
+       error( "'a' faced error"..tostring(err) )   -- manual propagation
+    end
+ bool= lane_h:cancel( [timeout_secs=0.0,] [force_kill_bool=false] )
+ |
Sends a cancellation request to the lane. If timeout_secs is non-zero,
+ waits for the request to be processed, or a timeout to occur. Returns
+ true if the lane was already done (in "done", "error" or "cancelled"
+ status), or if the cancellation was fruitful within the timeout period.
+ If the lane is still running and force_kill is true, the OS thread
+ running the lane is forcefully killed. This means no GC, and should
+ generally be the last resort.
+ Cancellation is tested before going to sleep in receive() or send() calls,
+ and after executing cancelstep Lua statements. A pending receive or send
+ call is currently not awakened, and may be a reason for an undetected
+ cancel.
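A sketch of soft cancellation (assumes Lanes is installed): cancelstep makes
the busy loop below cancellable, and :cancel() is given a one-second grace
period with no forced kill:

```lua
require "lanes"

-- Without cancelstep, this loop never reaches :receive()/:send(),
-- so a soft cancel would never be noticed.
local f= lanes.gen( { cancelstep= 100 }, function()
    while true do end
end )

local h= f()
local ok= h:cancel( 1.0 )      -- wait up to 1 s for the lane to react
print( ok, h.status )          -- expectedly: true   cancelled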
+ + + +
+ set_finalizer( finalizer_func )
+ + void= finalizer_func( [error] )
+ |
The error call is used for throwing exceptions in Lua. What Lua
+ does not offer, however, is scoped finalizers
+ that would get called when a certain block of instructions gets exited,
+ whether through peaceful return or abrupt error.
+ Since 2.0.3, Lanes provides a function set_finalizer for doing this.
+ Any functions given to it will be called in the lane's Lua state, just
+ prior to closing it. They are not called in any particular order.
+ An error in a finalizer itself overrides the state of the regular chunk
+ (in practice, it is highly preferable not to have errors in finalizers).
+ If one finalizer errors, the others may not get called.
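A sketch (assumes Lanes 2.0.3): the finalizer runs whether the lane body
returns peacefully or errors; its error argument is nil on a peaceful return:

```lua
require "lanes"

local f= lanes.gen( "base", function( crash )
    set_finalizer( function( err )
        -- release files, sockets, locks... here
        if err then print( "lane died: "..tostring( err ) ) end
    end )
    if crash then error( "boom" ) end
    return "done"
end )

print( f( false )[1] )    -- "done"; the finalizer ran with err==nil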
+ Communication between lanes is completely detached from the lane handles
+ themselves. By itself, a lane can only provide return values once it's
+ finished, or throw an error. The need to communicate during runtime is
+ handled by Linda objects, which are deep userdata instances. They can be
+ provided to a lane as startup parameters, upvalues, or in some other
+ Linda's message.
+ Access to a Linda object means a lane can read or write to any of its data
+ slots. Multiple lanes can access the same Linda in parallel. No application
+ level locking is required; each Linda operation is atomic.
+ +
+    require "lanes"
+
+    local linda= lanes.linda()
+
+    local function loop( max )
+       for i=1,max do
+          print( "sending: "..i )
+          linda:send( "x", i )   -- linda as upvalue
+       end
+    end
+
+    a= lanes.gen("",loop)( 10000 )
+
+    while true do
+       local val= linda:receive( 3.0, "x" )   -- timeout in seconds
+       if val==nil then
+          print( "timed out" )
+          break
+       end
+       print( "received: "..val )
+    end
Characteristics of the Lanes implementation of Lindas are:
+
+ h= lanes.linda()
+ + bool= h:send( [timeout_secs,] key, ... )
+ + [val, key]= h:receive( [timeout_secs,] key [, ...] )
+ + = h:limit( key, n_uint )
+ |
The send and receive methods use Linda keys as FIFO queues
+ (first in, first out). Timeouts are given in seconds (millisecond accuracy).
+ If using numbers as the first Linda key, one must explicitly give nil
+ as the timeout parameter to avoid ambiguities.
+
+ By default, queue sizes are unlimited, but limits can be enforced using
+ the limit method. This can be useful to balance execution speeds in a
+ producer/consumer scenario.
+
+ Note that any number of lanes can be reading or writing a Linda. There can
+ be many producers, and many consumers. It's up to you.
+
+ send returns true if the sending succeeded, and false if the queue limit
+ was met, and the queue did not empty enough during the given timeout.
+
+ Equally, receive returns a value and the key that provided the value,
+ or nothing on timeout. Note that nils can be sent and received;
+ the key value will tell it apart from a timeout.
+ Multiple values can be sent to a given key at once, atomically (the send
+ will fail unless all the values fit within the queue limit). This can be
+ useful in multiple-producer scenarios, if the protocols used give data in
+ streams of multiple units. Atomicity keeps the producers from garbling
+ each other's messages, which could happen if the units were sent
+ individually.
+
+ When receiving from multiple slots, the keys are checked in order, which
+ can be used for making priority queues.
+ +
+ linda_h:set( key, [val] )
+ + [val]= linda_h:get( key )
+ |
+ The table access methods are for accessing a slot without queuing or
+ consuming. They can be used for making shared tables of storage among
+ the lanes.
+
+ Writing to a slot overwrites any existing value, and clears any queued
+ entries. Table access and send/receive can be used together; reading a
+ slot essentially peeks at the next outgoing value of the queue.
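A sketch of the slot-storage behaviour described above (assumes Lanes is
installed); note that get does not consume the value, unlike receive:

```lua
require "lanes"

local linda= lanes.linda()

linda:set( "config", "verbose" )   -- overwrites, clears queued entries
print( linda:get( "config" ) )     -- "verbose"
print( linda:get( "config" ) )     -- still "verbose": not consumed

linda:set( "config" )              -- no value: clears the slot
print( linda:get( "config" ) )     -- nil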
+ A single Linda object provides an infinite number of slots, so why would
+ you want to use several?
There are some important reasons:
+ Actually, you can. Make separate lanes to wait on each, and then multiplex
+ those events to a common Linda, but... :).
+ + + +
+ = lanes.timer( linda_h, key, date_tbl|first_secs [,period_secs] )
+ |
+ Timers can be run once, or in a recurring fashion (period_secs > 0).
+ The first occurrence can be given either as a date or as a relative delay
+ in seconds. The date table is like what os.date("*t") returns, in the
+ local time zone.
+
+ Once a timer expires, the key is set with the current time
+ (in seconds, same offset as os.time() but with millisecond accuracy).
+ The key can be waited upon using the regular Linda :receive() method.
+
+ A timer can be stopped by giving first_secs=0 and no period.
+ +
+    require "lanes"
+
+    local linda= lanes.linda()
+
+    -- First timer, once a second, not synchronized to wall clock
+    --
+    lanes.timer( linda, "sec", 1, 1 )
+
+    -- Timer to a future event (next even minute); wall clock synchronized
+    --
+    local t= os.date( "*t", os.time()+60 )   -- now + 1min
+    t.sec= 0
+
+    lanes.timer( linda, "min", t, 60 )   -- reoccur every minute (sharp)
+
+    while true do
+       local v,key= linda:receive( "sec", "min" )
+       print( "Timer "..key..": "..v )
+    end
+ NOTE: Timer keys are set, not queued, so missing a beat is possible,
+ especially if the timer cycle is extremely small. The key value can be
+ used to know the actual time passed.
+
+ Having the API as lanes.timer() is intentional. One alternative would be
+ linda_h:timer(), but timers are not traditionally seen as part of Lindas.
+ Also, it would mean any lane getting a Linda handle would be able to
+ modify timers on it. A third choice could be abstracting the timers out
+ of the Linda realm altogether (timer_h= lanes.timer( date|first_secs,
+ period_secs )), but that would mean separate waiting functions for timers
+ and Lindas. Even if a Linda object and key were returned, that key could
+ not be waited upon simultaneously with one's general Linda events.
+ The current system gives maximum capabilities with a minimum API, and any
+ smoothing can easily be crafted in Lua at the application level.
+ Lanes generally does not require locks or critical sections to be used at
+ all. If necessary, a limited queue can be used to emulate them. lanes.lua
+ offers some sugar to make this easy:
+ +
+    lock_func= lanes.genlock( linda_h, key [,N_uint=1] )
+
+    lock_func( M_uint )    -- acquire
+    ..
+    lock_func( -M_uint )   -- release
+ The generated function acquires M entries from the N available, or
+ releases them if the value is negative. The acquiring call will suspend
+ the lane, if necessary. Use M=N=1 for a critical section lock (only one
+ lane allowed to enter).
+ Note: the locks generated are not recursive. That would need another
+ kind of generator, which is currently not implemented.
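A sketch of a critical section (assumes Lanes is installed; the lane bodies
are placeholders): with N=1, only one worker at a time can hold the lock:

```lua
require "lanes"

local linda= lanes.linda()
local lock= lanes.genlock( linda, "lock", 1 )   -- N=1: a mutex

local worker= lanes.gen( "base", function()
    lock( 1 )      -- acquire; suspends while another lane holds it
    -- ... touch the shared resource here ...
    lock( -1 )     -- release
end )

local a, b= worker(), worker()
a:join(); b:join()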
+ +Similar sugar exists for atomic counters: +
+ +
+    atomic_func= lanes.genatomic( linda_h, key [,initial_num=0.0] )
+
+    new_num= atomic_func( [diff_num=+1.0] )
+ Each time it is called, the generated function changes linda[key]
+ atomically, without other lanes being able to interfere. The new value
+ is returned. You can use a diff of 0.0 (or the Linda get method) to
+ just read the current value.
+ +Note that the generated functions can be passed on to other lanes. +
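A sketch of a shared counter (assumes Lanes 2.0.3 and the defaults described
above):

```lua
require "lanes"

local linda= lanes.linda()
local counter= lanes.genatomic( linda, "hits" )   -- starts at 0.0

print( counter() )        -- 1   (default diff is +1.0)
print( counter( 10 ) )    -- 11
print( counter( 0.0 ) )   -- 11  (just reads)
print( counter( -1 ) )    -- 10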
+ Data passed between lanes (either as starting parameters, return values,
+ upvalues, or via Lindas) must conform to the following:
+ Most Lua extension modules should work unaltered with Lanes.
+ If the module simply ties C-side features to Lua, everything is fine
+ without alterations. The luaopen_...() entry point will be called
+ separately for each lane where the module is require'd.
+
+ If, however, it also does one-time C-side initializations, these should
+ be guarded by a one-time-only construct such as the one below.
+ +
+    int luaopen_module( lua_State *L )
+    {
+       static char been_here;   /* 0 by ANSI C */
+
+       /* Calls to 'require' are serialized by Lanes; this is safe.
+        */
+       if (!been_here) {
+          been_here= 1;
+          ... one time initializations ...
+       }
+
+       ... binding to Lua ...
+    }
+ The mechanism Lanes uses for sharing Linda handles between separate Lua
+ states can be used for custom userdata as well. Here's what to do.
+ Deep userdata management takes care of tying into __gc methods, and does
+ reference counting to see how many proxies still exist for accessing the
+ data. Once there are none, the data is freed through a call to the
+ idfunc you provided.
+ NOTE: The lifespan of deep userdata may exceed that of the Lua state
+ that created it. The allocation of the data storage should not be tied
+ to the Lua state used. In other words, use malloc/free or a similar
+ memory handling mechanism.
+ Lane handles are not implemented as deep userdata, and thus cannot be
+ copied across lanes. This is intentional; problems would occur at least
+ when multiple lanes were to wait upon one to get ready. It is also a
+ matter of design simplicity.
+ The same benefits can be achieved by having a single worker lane spawn
+ all the sublanes, and keep track of them. Communications to and from
+ this lane can be handled via a Linda.
+ In multithreaded scenarios, giving multiple parameters to print()
+ or file:write() may cause the output to be interleaved,
+ something like this:
+
+    A: print( 1, 2, 3, 4 )
+    B: print( 'a', 'b', 'c', 'd' )
+
+    1   a   b   2   3   c   d   4
+
+ Lanes does not protect you from this behaviour. The thing to do is either
+ to dedicate a single lane to each output stream, or to concatenate the
+ output into a single string before calling the output function.
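A minimal pure-Lua sketch of the concatenation approach (print_atomic is a
hypothetical helper, not part of Lanes): building the whole line first means
a single write call, which other lanes cannot interleave mid-line:

```lua
-- Join all arguments into one string, then emit it with a single
-- io.write() call.
local function print_atomic( ... )
    local t= {}
    for i= 1, select( "#", ... ) do
        t[i]= tostring( (select( i, ... )) )
    end
    io.write( table.concat( t, "\t" ), "\n" )
end

print_atomic( 1, 2, 3, 4 )    -- emitted as the single chunk "1\t2\t3\t4\n"
```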
+ Lanes is about making multithreading easy and natural in the Lua state of
+ mind. If your program is logically structured, performance should not be
+ an issue. Here are some things to consider when best performance is vital:
+
+ Cancellation of lanes uses the Lua error mechanism with a special
+ lightuserdata error sentinel. If you use pcall in code that needs to be
+ cancellable from the outside, the special error might not get through to
+ Lanes, preventing the lane from being cleanly cancelled. You should
+ rethrow any lightuserdata error.
+ This system can actually be used by the application to detect a cancel,
+ perform its own cancellation duties, and pass on the error so that Lanes
+ will receive it. If Lanes does not get a clean cancellation from a lane
+ in due time, it may forcefully kill the lane.
+The sentinel is exposed as lanes.cancel_error, if you wish to use +its actual value. +
+ Jan-2009 (2.0.3):
For feedback, questions and suggestions: +