From 3fe8732f32e3675c62a46fcb70fb95ef936376a5 Mon Sep 17 00:00:00 2001
From: Peter Drahoš

+This document was revised on 23-Jan-09, and applies to version 2.0.3.
+ Lua Lanes is a Lua extension library that makes it possible to run
+ multiple Lua states in parallel. It is intended for optimizing performance
+ on multicore CPUs, and for studying ways to make Lua programs naturally
+ parallel to begin with.
+
+ Lanes is included in your software with the regular
+ require "lanes" statement. No C-side programming is needed; all APIs
+ are Lua-side, and most existing extension modules should work seamlessly
+ together with multiple lanes.
+
+ See comparison of Lua Lanes with other Lua multithreading solutions.
+
+
+ Lua Lanes - multithreading in Lua
+
Copyright © 2007-08 Asko Kauppi. All rights reserved.
+
Lua Lanes is published under the same MIT license as Lua 5.1.
+
+Description
+
+Features:
+
+Limitations:
+
Lua Lanes supports the following operating systems:
+ The underlying threading code can be compiled either towards the Win32 API
+ or Pthreads. Unfortunately, thread prioritization under Pthreads is a joke,
+ requiring OS-specific tweaks and guessing at undocumented behaviour. Other
+ features should be portable to any modern platform.
+ Lua Lanes is built simply by make on the supported platforms
+ (make-vc for Visual C++). See the README for system-specific
+ details and limitations.
+ To install Lanes, all you need is for the lanes.lua and lua51-lanes.so|dll
+ files to be reachable by Lua (see LUA_PATH, LUA_CPATH).
+
+ Or use the LuaRocks package management.
+
+    > luarocks search lanes
+    ... output listing Lua Lanes is there ...
+
+    > luarocks install lanes
+    ... output ...
+
The following sample shows preparing a function for parallel calling, and
+ calling it with varying arguments. Each of the two results is calculated in
+ a separate OS thread, parallel to the calling one. Reading the results
+ joins the threads, waiting for any results not already there.
+ +
+    require "lanes"
+
+    f= lanes.gen( function(n) return 2*n end )
+    a= f(1)
+    b= f(2)
+
+    print( a[1], b[1] )   -- 2   4
+
+ func= lanes.gen( [libs_str | opt_tbl [, ...],] lane_func )
+
+ The function returned by lanes.gen() is a "generator" for
+ launching any number of lanes. They will share code, options, and initial
+ globals, but the particular arguments may vary. Only calling the generator
+ function actually launches a lane, and provides a handle for controlling it.
+ Lanes automatically copies upvalues over to the new lanes, so you need not
+ wrap all the required elements into one 'wrapper' function. If lane_func
+ uses local values or local functions, they will also be available in the
+ new lanes.
+ libs_str
defines the standard libraries made available to the
+ new Lua state:
+
(nothing) | no standard libraries (default) | |
"base" or "" | root level names, print, assert, unpack etc. | |
"coroutine" | coroutine.* namespace (part of base in Lua 5.1) | |
"debug" | debug.* namespace | |
"io" | io.* namespace | |
"math" | math.* namespace | |
"os" | os.* namespace | |
"package" | package.* namespace and require | |
"string" | string.* namespace | |
"table" | table.* namespace | |
"*" | all standard libraries |
+ Initializing the standard libs takes a bit of time at each lane invocation.
+ This is the main reason why "no libraries" is the default.
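As an illustrative sketch (assuming Lanes 2.0.3 is installed, and that several
library names can be combined in one libs_str), a lane can be given only the
libraries its body actually uses:

```lua
require "lanes"

-- Open only what the lane body needs; "*" would open all the standard
-- libraries, at a higher per-launch cost.
local calc= lanes.gen( "math,string", function( x )
    return string.format( "%.2f", math.sqrt( x ) )
end )

local h= calc( 2 )
print( h[1] )    -- expected "1.41"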
+
+ opt_tbl
is a collection of named options to control the way
+ lanes are run:
+
+
+ .cancelstep |
+ By default, lanes are only cancellable when they enter a pending
+ :receive() or :send() call. With this option, one can set the
+ cancellation check to occur every N Lua statements. The value
+ true uses a default value (100).
+ .globals globals_tbl |
+
+ Sets the globals table for the launched threads. This can be used for giving
+ them constants.
+ The global values of different lanes are in no manner connected;
+ modifying one will only affect the particular lane.
+ .priority |
+ The priority of the lanes generated. -2 is lowest, +2 is highest.
+
+ Implementation and dependability of priorities varies by platform.
+ In particular, Linux kernel 2.6 does not support priorities in user mode.
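A sketch combining these options (hypothetical values; assumes Lanes is
installed): the lane checks for cancellation every 50 statements, runs at a
lowered priority, and sees only the constants handed to it via .globals:

```lua
require "lanes"

-- NOTE: the globals table replaces the lane's globals entirely, so the
-- lane body below relies on nothing but the LIMIT constant given here.
local f= lanes.gen(
    { cancelstep= 50, priority= -1, globals= { LIMIT= 1000 } },
    function()
        local sum= 0
        for i= 1, LIMIT do sum= sum + i end
        return sum
    end )

local h= f()
print( h[1] )    -- 500500, once the lane finishes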
+ The lane handles are allowed to be 'let loose'; in other words, you may
+ execute a lane simply by:
+
+    lanes.gen( function() ... end ) ()
+
+ Normally, this kind of lane will be in an eternal loop handling messages.
+ Since the lane handle is gone, there is no way to control such a lane from
+ the outside, nor to read its potential return values. Then again, such a
+ lane does not normally even return.
+ str= lane_h.status
+ |
The current execution state of a lane can be read via its status
+ member, providing one of these values:
"pending" | not started, yet | |
"running" | running | |
"waiting" | waiting at a Linda :receive() or :send() | |
"done" | finished executing (results are ready) | |
"error" | met an error (reading results will propagate it) | |
"cancelled" | received cancellation and finished itself |
+ This is similar to coroutine.status, which has: "running" / + "suspended" / "normal" / "dead". Not using the + exact same names is intentional. +
+ A lane can be waited upon by simply reading its results. This can be done
+ in two ways.
+ +
+ [val]= lane_h[1]
+ |
+ Makes sure the lane has finished, and gives its first (possibly only)
+ return value. Other return values will be available in other lane_h
+ indices.
+ If the lane ended in an error, it is propagated to the master state at
+ this point.
+ +
+ [...]|[nil,err,stack_tbl]= lane_h:join( [timeout_secs] )
+ |
+ Waits until the lane finishes, or timeout seconds have passed.
+ Returns nil on timeout, nil,err,stack_tbl if the lane hit an error,
+ or the return values of the lane. Unlike when reading the results in
+ table fashion, errors are not propagated.
+ stack_tbl is an array of "<filename>:<line>" strings,
+ describing where the error was thrown. Use table.concat() to format
+ it to your liking (or just ignore it).
+ If you use :join, make sure your lane main function returns a non-nil
+ value, so you can tell timeout and error cases apart from a successful
+ return (using the .status property may be risky, since it might change
+ between a timed-out join and the moment you read it).
+ +
+    require "lanes"
+
+    f= lanes.gen( function() error "!!!" end )
+    a= f(1)
+
+    --print( a[1] )   -- propagates error
+
+    v,err= a:join()   -- no propagation
+    if v==nil then
+       error( "'a' faced error"..tostring(err) )   -- manual propagation
+    end
+ bool= lane_h:cancel( [timeout_secs=0.0,] [force_kill_bool=false] )
+ |
Sends a cancellation request to the lane. If timeout_secs is non-zero,
+ waits for the request to be processed, or a timeout to occur. Returns
+ true if the lane was already done (in "done", "error" or "cancelled"
+ status), or if the cancellation was fruitful within the timeout period.
+ If the lane is still running and force_kill is true, the OS thread
+ running the lane is forcefully killed. This means no GC, and should
+ generally be the last resort.
+ Cancellation is tested before going to sleep in receive() or send() calls,
+ and after executing cancelstep Lua statements. A pending receive or send
+ call is currently not awakened, and may be a reason for an undetected
+ cancel.
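A sketch of soft cancellation (assumes Lanes is installed): cancelstep makes
the busy loop below cancellable, and :cancel() is given a one-second grace
period with no forced kill:

```lua
require "lanes"

-- Without cancelstep, this loop never reaches :receive()/:send(),
-- so a soft cancel would never be noticed.
local f= lanes.gen( { cancelstep= 100 }, function()
    while true do end
end )

local h= f()
local ok= h:cancel( 1.0 )      -- wait up to 1 s for the lane to react
print( ok, h.status )          -- expectedly: true   cancelled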
+ + + +
+ set_finalizer( finalizer_func )
+ + void= finalizer_func( [error] )
+ |
The error call is used for throwing exceptions in Lua. What Lua
+ does not offer, however, is scoped finalizers
+ that would get called when a certain block of instructions gets exited,
+ whether through peaceful return or abrupt error.
+ Since 2.0.3, Lanes provides a function set_finalizer for doing this.
+ Any functions given to it will be called in the lane's Lua state, just
+ prior to closing it. They are not called in any particular order.
+ An error in a finalizer itself overrides the state of the regular chunk
+ (in practice, it is highly preferable not to have errors in finalizers).
+ If one finalizer errors, the others may not get called.
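A sketch (assumes Lanes 2.0.3): the finalizer runs whether the lane body
returns peacefully or errors; its error argument is nil on a peaceful return:

```lua
require "lanes"

local f= lanes.gen( "base", function( crash )
    set_finalizer( function( err )
        -- release files, sockets, locks... here
        if err then print( "lane died: "..tostring( err ) ) end
    end )
    if crash then error( "boom" ) end
    return "done"
end )

print( f( false )[1] )    -- "done"; the finalizer ran with err==nil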
+ Communication between lanes is completely detached from the lane handles
+ themselves. By itself, a lane can only provide return values once it's
+ finished, or throw an error. The need to communicate during runtime is
+ handled by Linda objects, which are deep userdata instances. They can be
+ provided to a lane as startup parameters, upvalues, or in some other
+ Linda's message.
+ Access to a Linda object means a lane can read or write to any of its data
+ slots. Multiple lanes can access the same Linda in parallel. No application
+ level locking is required; each Linda operation is atomic.
+ +
+    require "lanes"
+
+    local linda= lanes.linda()
+
+    local function loop( max )
+       for i=1,max do
+          print( "sending: "..i )
+          linda:send( "x", i )   -- linda as upvalue
+       end
+    end
+
+    a= lanes.gen("",loop)( 10000 )
+
+    while true do
+       local val= linda:receive( 3.0, "x" )   -- timeout in seconds
+       if val==nil then
+          print( "timed out" )
+          break
+       end
+       print( "received: "..val )
+    end
Characteristics of the Lanes implementation of Lindas are:
+
+ h= lanes.linda()
+ + bool= h:send( [timeout_secs,] key, ... )
+ + [val, key]= h:receive( [timeout_secs,] key [, ...] )
+ + = h:limit( key, n_uint )
+ |
The send and receive methods use Linda keys as FIFO queues
+ (first in, first out). Timeouts are given in seconds (millisecond accuracy).
+ If using numbers as the first Linda key, one must explicitly give nil
+ as the timeout parameter to avoid ambiguities.
+
+ By default, queue sizes are unlimited, but limits can be enforced using
+ the limit method. This can be useful to balance execution speeds in a
+ producer/consumer scenario.
+
+ Note that any number of lanes can be reading or writing a Linda. There can
+ be many producers, and many consumers. It's up to you.
+
+ send returns true if the sending succeeded, and false if the queue limit
+ was met, and the queue did not empty enough during the given timeout.
+
+ Equally, receive returns a value and the key that provided the value,
+ or nothing on timeout. Note that nils can be sent and received;
+ the key value will tell it apart from a timeout.
+ Multiple values can be sent to a given key at once, atomically (the send
+ will fail unless all the values fit within the queue limit). This can be
+ useful in multiple-producer scenarios, if the protocols used give data in
+ streams of multiple units. Atomicity keeps the producers from garbling
+ each other's messages, which could happen if the units were sent
+ individually.
+
+ When receiving from multiple slots, the keys are checked in order, which
+ can be used for making priority queues.
+ +
+ linda_h:set( key, [val] )
+ + [val]= linda_h:get( key )
+ |
+ The table access methods are for accessing a slot without queuing or
+ consuming. They can be used for making shared tables of storage among
+ the lanes.
+
+ Writing to a slot overwrites any existing value, and clears any queued
+ entries. Table access and send/receive can be used together; reading a
+ slot essentially peeks at the next outgoing value of the queue.
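A sketch of the slot-storage behaviour described above (assumes Lanes is
installed); note that get does not consume the value, unlike receive:

```lua
require "lanes"

local linda= lanes.linda()

linda:set( "config", "verbose" )   -- overwrites, clears queued entries
print( linda:get( "config" ) )     -- "verbose"
print( linda:get( "config" ) )     -- still "verbose": not consumed

linda:set( "config" )              -- no value: clears the slot
print( linda:get( "config" ) )     -- nil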
+ A single Linda object provides an infinite number of slots, so why would
+ you want to use several?
There are some important reasons:
+ Actually, you can. Make separate lanes to wait on each, and then multiplex
+ those events to a common Linda, but... :).
+ + + +
+ = lanes.timer( linda_h, key, date_tbl|first_secs [,period_secs] )
+ |
+ Timers can be run once, or in a recurring fashion (period_secs > 0).
+ The first occurrence can be given either as a date or as a relative delay
+ in seconds. The date table is like what os.date("*t") returns, in the
+ local time zone.
+
+ Once a timer expires, the key is set with the current time
+ (in seconds, same offset as os.time() but with millisecond accuracy).
+ The key can be waited upon using the regular Linda :receive() method.
+
+ A timer can be stopped by giving first_secs=0 and no period.
+ +
+    require "lanes"
+
+    local linda= lanes.linda()
+
+    -- First timer, once a second, not synchronized to wall clock
+    --
+    lanes.timer( linda, "sec", 1, 1 )
+
+    -- Timer to a future event (next even minute); wall clock synchronized
+    --
+    local t= os.date( "*t", os.time()+60 )   -- now + 1min
+    t.sec= 0
+
+    lanes.timer( linda, "min", t, 60 )   -- reoccur every minute (sharp)
+
+    while true do
+       local v,key= linda:receive( "sec", "min" )
+       print( "Timer "..key..": "..v )
+    end
+ NOTE: Timer keys are set, not queued, so missing a beat is possible,
+ especially if the timer cycle is extremely small. The key value can be
+ used to know the actual time passed.
+
+ Having the API as lanes.timer() is intentional. One alternative would be
+ linda_h:timer(), but timers are not traditionally seen as part of Lindas.
+ Also, it would mean any lane getting a Linda handle would be able to
+ modify timers on it. A third choice could be abstracting the timers out
+ of the Linda realm altogether (timer_h= lanes.timer( date|first_secs,
+ period_secs )), but that would mean separate waiting functions for timers
+ and Lindas. Even if a Linda object and key were returned, that key could
+ not be waited upon simultaneously with one's general Linda events.
+ The current system gives maximum capabilities with a minimum API, and any
+ smoothing can easily be crafted in Lua at the application level.
+ Lanes generally does not require locks or critical sections to be used at
+ all. If necessary, a limited queue can be used to emulate them. lanes.lua
+ offers some sugar to make this easy:
+ +
+    lock_func= lanes.genlock( linda_h, key [,N_uint=1] )
+
+    lock_func( M_uint )    -- acquire
+    ..
+    lock_func( -M_uint )   -- release
+ The generated function acquires M entries from the N available, or
+ releases them if the value is negative. The acquiring call will suspend
+ the lane, if necessary. Use M=N=1 for a critical section lock (only one
+ lane allowed to enter).
+ Note: the locks generated are not recursive. That would need another
+ kind of generator, which is currently not implemented.
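A sketch of a critical section (assumes Lanes is installed; the lane bodies
are placeholders): with N=1, only one worker at a time can hold the lock:

```lua
require "lanes"

local linda= lanes.linda()
local lock= lanes.genlock( linda, "lock", 1 )   -- N=1: a mutex

local worker= lanes.gen( "base", function()
    lock( 1 )      -- acquire; suspends while another lane holds it
    -- ... touch the shared resource here ...
    lock( -1 )     -- release
end )

local a, b= worker(), worker()
a:join(); b:join()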
+ +Similar sugar exists for atomic counters: +
+ +
+    atomic_func= lanes.genatomic( linda_h, key [,initial_num=0.0] )
+
+    new_num= atomic_func( [diff_num=+1.0] )
+ Each time it is called, the generated function changes linda[key]
+ atomically, without other lanes being able to interfere. The new value
+ is returned. You can use a diff of 0.0 (or the Linda get method) to
+ just read the current value.
+ +Note that the generated functions can be passed on to other lanes. +
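A sketch of a shared counter (assumes Lanes 2.0.3 and the defaults described
above):

```lua
require "lanes"

local linda= lanes.linda()
local counter= lanes.genatomic( linda, "hits" )   -- starts at 0.0

print( counter() )        -- 1   (default diff is +1.0)
print( counter( 10 ) )    -- 11
print( counter( 0.0 ) )   -- 11  (just reads)
print( counter( -1 ) )    -- 10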
+ Data passed between lanes (either as starting parameters, return values,
+ upvalues, or via Lindas) must conform to the following:
+ Most Lua extension modules should work unaltered with Lanes.
+ If the module simply ties C-side features to Lua, everything is fine
+ without alterations. The luaopen_...() entry point will be called
+ separately for each lane where the module is require'd.
+
+ If, however, it also does one-time C-side initializations, these should
+ be guarded by a one-time-only construct such as the one below.
+ +
+    int luaopen_module( lua_State *L )
+    {
+       static char been_here;   /* 0 by ANSI C */
+
+       /* Calls to 'require' are serialized by Lanes; this is safe.
+        */
+       if (!been_here) {
+          been_here= 1;
+          ... one time initializations ...
+       }
+
+       ... binding to Lua ...
+    }
+ The mechanism Lanes uses for sharing Linda handles between separate Lua
+ states can be used for custom userdata as well. Here's what to do.
+ Deep userdata management takes care of tying into __gc methods, and does
+ reference counting to see how many proxies still exist for accessing the
+ data. Once there are none, the data is freed through a call to the
+ idfunc you provided.
+ NOTE: The lifespan of deep userdata may exceed that of the Lua state
+ that created it. The allocation of the data storage should not be tied
+ to the Lua state used. In other words, use malloc/free or a similar
+ memory handling mechanism.
+ Lane handles are not implemented as deep userdata, and thus cannot be
+ copied across lanes. This is intentional; problems would occur at least
+ when multiple lanes were to wait upon one to get ready. It is also a
+ matter of design simplicity.
+ The same benefits can be achieved by having a single worker lane spawn
+ all the sublanes, and keep track of them. Communications to and from
+ this lane can be handled via a Linda.
+ In multithreaded scenarios, giving multiple parameters to print()
+ or file:write() may cause the output to be interleaved,
+ something like this:
+
+    A: print( 1, 2, 3, 4 )
+    B: print( 'a', 'b', 'c', 'd' )
+
+    1   a   b   2   3   c   d   4
+
+ Lanes does not protect you from this behaviour. The thing to do is either
+ to dedicate a single lane to each output stream, or to concatenate the
+ output into a single string before calling the output function.
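A minimal pure-Lua sketch of the concatenation approach (print_atomic is a
hypothetical helper, not part of Lanes): building the whole line first means
a single write call, which other lanes cannot interleave mid-line:

```lua
-- Join all arguments into one string, then emit it with a single
-- io.write() call.
local function print_atomic( ... )
    local t= {}
    for i= 1, select( "#", ... ) do
        t[i]= tostring( (select( i, ... )) )
    end
    io.write( table.concat( t, "\t" ), "\n" )
end

print_atomic( 1, 2, 3, 4 )    -- emitted as the single chunk "1\t2\t3\t4\n"
```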
+ Lanes is about making multithreading easy and natural in the Lua state of
+ mind. If your program is logically structured, performance should not be
+ an issue. Here are some things to consider when best performance is vital:
+
+ Cancellation of lanes uses the Lua error mechanism with a special
+ lightuserdata error sentinel. If you use pcall in code that needs to be
+ cancellable from the outside, the special error might not get through to
+ Lanes, preventing the lane from being cleanly cancelled. You should
+ rethrow any lightuserdata error.
+ This system can actually be used by the application to detect a cancel,
+ perform its own cancellation duties, and pass on the error so that Lanes
+ will receive it. If Lanes does not get a clean cancellation from a lane
+ in due time, it may forcefully kill the lane.
+The sentinel is exposed as lanes.cancel_error, if you wish to use +its actual value. +
+ Jan-2009 (2.0.3):
For feedback, questions and suggestions: +