From 89d9c98af1ac352ba4d49d660e61b0853d6e1a86 Mon Sep 17 00:00:00 2001
From: Peter Drahoš
+A short comparison of Lanes with other existing Lua multithreading kits.
+ For feedback, questions and suggestions:
+
+Comparisons to other threading kits
+
+
+
+
+
+
+
+
+=============
+ Lua Lanes
+=============
+
+With the introduction of Lindas (Jun-2008), Lua Lanes simplifies its API while
+simultaneously adding more power and speed.
+
+Pros:
+ - regular Lua 5.1 module
+ - completely separate Lua states, one per OS thread
+ - message passing, or shared data using Lindas
+ - no application level locking, ever
+ - scales well, up to thousands of threads
+ - priorities (-2..+2) for launched threads
+ - threads are cancellable (with complete cleanup)
+ - timeouts on all pending operations
+ - thread contents are given as regular Lua functions; syntax checked early on,
+ syntax highlighting works
+ - standard libraries opened to subthreads can be granularly selected
+ - fast stack-to-stack copies, via hidden "keeper states". No serialization needed.
+ - protects calls to 'require', allowing wide compatibility with existing
+ modules (and with all of them, given minor changes)
+
+Cons:
+ - requires OS threads
+ - not utilizing network parallelism (all threads on one CPU)
+
+Sample:
+<<
+ require "lanes"
+
+ local function calculate(a,b,c)
+ if not a then
+ error "sample error; propagated to main lane when reading results"
+ end
+ return a+b+c
+ end
+
+ local h1= lanes.gen(calculate)(1,2,3)
+ local h2= lanes.gen(calculate)(10,20,30)
+ local h3= lanes.gen(calculate)(100,200,300)
+
+ print( h1[1], h2[1], h3[1] ) -- waits for the results, or propagates error
+<<
+
+
+==================
+ Lua coroutines (by Lua authors)
+==================
+
+http://www.lua.org/manual/5.1/manual.html#2.11
+http://lua-users.org/wiki/CoroutinesTutorial
+
+Lua coroutines are an integral part of Lua 5 itself. They are listed here
+because they should be the _first_ multitasking mechanism to consider. They
+can also be used together with Lanes, or with the other mechanisms listed below.
+
+Coroutines are very usable in producer/consumer situations, allowing Lua
+functions to feed data to each other on an as-needed, dataflow basis.
+
+Pros:
+ - works with plain Lua (no extensions)
+ - works on any platform
+ - lightweight (no OS level threading or locking involved)
+
+Cons:
+ - co-operative, meaning your code needs to decide who gets to run
+ - does not utilize multiple CPUs/cores
+
+Sample:
+
+ ..TBD: sample code doing the "child" "parent" output here (see below)..
+
+
+=============
+ LuaThread (by Diego Nehab)
+=============
+
+http://www.cs.princeton.edu/~diego/professional/luathread/
+
+LuaThread provides thread creation, mutexes, condition variables, and inter-
+thread queues to Lua scripts. It takes a C-like approach, where Lua globals
+are shared by the running threads and must therefore be guarded against
+multithreading conflicts.
+
+Whether this is exactly what you want, or whether a more loosely coupled
+multithreading (such as Lanes) would serve better, is up to you. One can argue
+that a loose implementation is easier for the developer, since no application
+level locking needs to be considered.
+
+Pros:
+ - no marshalling overhead, since threads share the same Lua state
+
+Cons:
+ - requires a modified Lua core
+ - application level locking required
+
+Sample:
+<<
+ local function flood(output, word)
+ while 1 do
+ output:lock()
+ io.write(word, ", ")
+ output:unlock()
+ end
+ end
+
+ local output = thread.newmutex()
+ thread.newthread(flood, {output, "child"})
+ flood(output, "parent")
+<<
+
+
+=============
+ Lua Rings (by Roberto Ierusalimschy & Tomás Guisasola)
+=============
+
+http://www.keplerproject.org/rings/
+
+".. library which provides a way to create new Lua states from within Lua.
+It also offers a simple way to communicate between the creator (master) and
+the created (slave) states."
+
+".. master can execute code in any of its slaves but each slave only has
+access to its master (or its own slaves)."
+
+Rings offers separate Lua states, but no multithreading. This makes it simple,
+but it won't use more than one CPU core. Other differences include:
+
+ - opens all Lua standard libraries for subthreads
+ (Lanes opens the needed ones)
+
+ - marshals numbers, strings, booleans, userdata
+ (Lanes also marshals non-cyclic tables)
+
+ - "remotedostring" allows executing code in the master state
+ (Lanes does _not_ allow subthreads to modify the master state automatically,
+ which allows effective sandboxing. The same can be achieved by sending code
+ between the threads, but the master needs to explicitly allow this =
+ receive a function and execute it)
+
+ - offers "Stable, a very simple API to manage a shared table at the master
+ state"
+ (Lanes 2008 offers keeper tables)
+
+Pros:
+ - "offers Stable, a very simple API to manage a shared table at the master
+ state"
+
+Cons:
+ - thread contents are defined as strings, not as Lua source; there is no
+ syntax check at file parse time, and no syntax highlighting
+
+Sample:
+<<
+ require"rings"
+ S = rings.new ()
+
+ data = { 12, 13, 14, }
+ print (S:dostring ([[
+ aux = {}
+ for i, v in ipairs (arg) do
+ table.insert (aux, 1, v)
+ end
+ return unpack (aux)]], unpack (data))) -- true, 14, 13, 12
+
+ S:close ()
+<<
+
+
+==========================
+ Helper Threads Toolkit (by Javier Guerra G.)
+==========================
+
+http://luaforge.net/projects/helper-threads/
+
+"Provides a consistent framework to write non-blocking C libraries, with a Lua
+interface for starting tasks and managing the Futures, Queues and Threads."
+
+This seems like a companion to the "occasional threading" model (see below);
+the Lua side is kept clear of multithreading, while C side work can be
+spawned off to run in the background.
+
+Pros:
+ - provides an "update" mechanism, allowing the (C) thread and controlling
+ Lua to interact during execution of the thread
+ - ...
+
+Cons:
+ - thread is "only for C code and it can't touch or access the Lua state",
+ in other words there is no Lua-side multithreading concept (by design)
+
+
+========================
+ Occasional threading (by Russ Cox)
+========================
+
+http://lua-users.org/lists/lua-l/2006-11/msg00368.html
+
+".. able to have certain C calls run in parallel with the [Lua] VM, but
+otherwise keep the VM single-threaded."
+
+That pretty much says it all.
+
+Pros:
+ - simple, yet providing for the "occasional" need to run really multithreaded
+ - can be made into a loadable module (says the message)
+
+Cons:
+ - only for occasional usage; the programming paradigm remains essentially
+ singlethreaded (by definition)
+ - not a real project, just a message on the Lua list (but a good one!)
+
+
+=================
+ ConcurrentLua
+=================
+
+http://concurrentlua.luaforge.net/index.html
+
+ConcurrentLua is based on the Lua model for concurrency, namely coroutines, and
+extends this model by providing message-passing primitives.
+
+".. implementation of the share-nothing asynchronous message-passing model"
+
+".. process can check its mailbox for new messages at any time, and if there
+are any, they can be read in the order of arrival."
+
+".. processes in the system are implemented with Lua coroutines"
+
+".. still based on the cooperative multithreading model that Lua uses"
+
+Recent, released on 21 June 2008.
+
+Pros:
+ - Designed from the ground up for distributed computing (multiple computers,
+ not only CPU cores)
+ - Does not require a pre-emptive operating system
+
+Cons:
+ - Serialization is bound to degrade raw performance in single-computer
+ scenarios (vs. stack-to-stack copying as in Lanes)
+ - Depends on LuaSocket and Copas modules.
+ - Each thread has a single mailbox tied to it (vs. separating threads and
+ connection resources)
+
+
+
+
+
+
+
+

Lua Lanes - multithreading in Lua

Copyright © 2007-08 Asko Kauppi. All rights reserved.

Lua Lanes is published under the same MIT license as Lua 5.1.

This document was revised on 23-Jan-09, and applies to version 2.0.3.

Lua Lanes is a Lua extension library providing the ability to run multiple
Lua states in parallel. It is intended for optimizing performance on multicore
CPUs, and for studying ways to make Lua programs naturally parallel to begin
with.

Lanes is included in your software by the regular require "lanes" method.
No C side programming is needed; all APIs are Lua side, and most existing
extension modules should work seamlessly together with multiple lanes.

See the comparison of Lua Lanes with other Lua multithreading solutions.
+
Lua Lanes supports the following operating systems:

The underlying threading code can be compiled either towards the Win32 API
or Pthreads. Unfortunately, thread prioritization under Pthreads is a joke,
requiring OS specific tweaks and guessing of undocumented behaviour. Other
features should be portable to any modern platform.

Lua Lanes is built simply by make on the supported platforms
(make-vc for Visual C++). See README for system specific
details and limitations.

To install Lanes, all you need are the lanes.lua and lua51-lanes.so|dll
files, reachable by Lua (see LUA_PATH, LUA_CPATH). Alternatively, use the
Lua Rocks package management.

    > luarocks search lanes
    ... output listing Lua Lanes is there ...

    > luarocks install lanes
    ... output ...
The following sample shows preparing a function for parallel calling, and
calling it with varying arguments. Each of the two results is calculated in
a separate OS thread, parallel to the calling one. Reading the results
joins the threads, waiting for any results not already there.

    require "lanes"

    f= lanes.gen( function(n) return 2*n end )
    a= f(1)
    b= f(2)

    print( a[1], b[1] ) -- 2 4
+
  func= lanes.gen( [libs_str | opt_tbl [, ...],] lane_func )

The function returned by lanes.gen() is a "generator" for launching any
number of lanes. They share code, options and initial globals, but the
actual arguments may vary. Only calling the generator function actually
launches a lane, and provides a handle for controlling it.
Lanes automatically copies upvalues over to the new lanes, so you need not
wrap all the required elements into one 'wrapper' function. If lane_func
uses local values or local functions, they will also be available in the
new lanes.
libs_str defines the standard libraries made available to the new Lua state:

  (nothing)       no standard libraries (default)
  "base" or ""    root level names, print, assert, unpack etc.
  "coroutine"     coroutine.* namespace (part of base in Lua 5.1)
  "debug"         debug.* namespace
  "io"            io.* namespace
  "math"          math.* namespace
  "os"            os.* namespace
  "package"       package.* namespace and require
  "string"        string.* namespace
  "table"         table.* namespace
  "*"             all standard libraries
Initializing the standard libs takes a bit of time at each lane invocation.
This is the main reason why "no libraries" is the default.
+
opt_tbl is a collection of named options to control the way lanes are run:
  .cancelstep     By default, lanes are only cancellable when they enter a
                  pending :receive() or :send() call. With this option, a
                  cancellation check can be made to occur every N Lua
                  statements. The value true uses a default value (100).

  .globals        Sets the globals table for the launched threads. This can
  globals_tbl     be used for giving them constants. The global values of
                  different lanes are in no manner connected; modifying one
                  will only affect that particular lane.

  .priority       The priority of generated lanes: -2 is lowest, +2 is
                  highest. Implementation and dependability of priorities
                  varies by platform; in particular, Linux kernel 2.6 does
                  not support priorities in user mode.
Lane handles may be 'let loose'; in other words, you may execute a lane
simply by:

    lanes.gen( function() ... end ) ()

Normally, such a lane runs an eternal loop, handling messages. Since the
lane handle is gone, there is no way to control the lane from the outside,
nor to read its potential return values. Then again, such a lane does not
normally even return.
  str= lane_h.status

The current execution state of a lane can be read via its status member,
providing one of these values:
"pending" | not started, yet | |
"running" | running | |
"waiting" | waiting at a Linda :receive() or :send() | |
"done" | finished executing (results are ready) | |
"error" | met an error (reading results will propagate it) | |
"cancelled" | received cancellation and finished itself |
This is similar to coroutine.status, which has: "running" / "suspended" /
"normal" / "dead". Not using the exact same names is intentional.
A lane can be waited upon by simply reading its results. This can be done
in two ways.
+ +
  [val]= lane_h[1]

Makes sure the lane has finished, and gives its first (possibly only) return
value. Other return values are available at other lane_h indices.

If the lane ended in an error, it is propagated to the master state at this
point.
+ +
  [...]|[nil,err,stack_tbl]= lane_h:join( [timeout_secs] )

Waits until the lane finishes, or timeout_secs seconds have passed.
Returns nil on timeout; nil,err,stack_tbl if the lane hit an error; or
the return values of the lane. Unlike when reading the results in table
fashion, errors are not propagated.

stack_tbl is an array of "<filename>:<line>" strings, describing where the
error was thrown. Use table.concat() to format it to your liking (or just
ignore it).

If you use :join, make sure your lane main function returns a non-nil
value, so you can tell timeout and error cases apart from a successful
return (using the .status property may be risky, since it might change
between a timed out join and the moment you read it).
+ +
    require "lanes"

    f= lanes.gen( function() error "!!!" end )
    a= f(1)

    --print( a[1] ) -- propagates error

    v,err= a:join() -- no propagation
    if v==nil then
        error( "'a' faced error"..tostring(err) ) -- manual propagation
    end
  bool= lane_h:cancel( [timeout_secs=0.0,] [force_kill_bool=false] )

Sends a cancellation request to the lane. If timeout_secs is non-zero,
waits for the request to be processed, or a timeout to occur. Returns true
if the lane was already done (in "done", "error" or "cancelled" status),
or if the cancellation was fruitful within the timeout period.

If the lane is still running and force_kill is true, the OS thread running
the lane is forcefully killed. This means no GC, and should generally be
the last resort.

Cancellation is tested before going to sleep in receive() or send() calls,
and after executing cancelstep Lua statements. A pending receive or send
call is not currently awakened, which may be a reason for a non-detected
cancel.
+ + + +
  set_finalizer( finalizer_func )

  void= finalizer_func( [error] )

The error call is used for throwing exceptions in Lua. What Lua does not
offer, however, is scoped finalizers, which would get called when a certain
block of instructions gets exited, whether through a peaceful return or an
abrupt error.

Since 2.0.3, Lanes provides a set_finalizer function for doing this. Any
functions given to it will be called in the lane's Lua state, just prior
to closing it. They are not called in any particular order.

An error in a finalizer itself overrides the state of the regular chunk
(in practice, it is highly preferable not to have errors in finalizers).
If one finalizer errors, the others may not get called.
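A small sketch of the behaviour described above (the lane body is illustrative): the finalizer runs in the lane's state whether the lane returns normally or raises an error.

```lua
require "lanes"

local f = lanes.gen( "base", function()
    set_finalizer( function(err)
        -- 'err' is non-nil if the lane is ending due to an error
        if err then
            print( "lane failed: "..tostring(err) )
        end
        -- release sockets, temp files, etc. here
    end )
    return 42
end )

local h = f()
print( h[1] )   -- 42; the finalizer already ran inside the lane
```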
Communications between lanes are completely detached from the lane handles
themselves. By itself, a lane can only provide return values once it is
finished, or throw an error. Any need to communicate during runtime is
handled by Linda objects, which are deep userdata instances. They can be
provided to a lane as startup parameters, as upvalues, or in another
Linda's message.

Access to a Linda object means a lane can read or write any of its data
slots. Multiple lanes can access the same Linda in parallel. No application
level locking is required; each Linda operation is atomic.
+ +
    require "lanes"

    local linda= lanes.linda()

    local function loop( max )
        for i=1,max do
            print( "sending: "..i )
            linda:send( "x", i )    -- linda as upvalue
        end
    end

    a= lanes.gen("",loop)( 10000 )

    while true do
        local val= linda:receive( 3.0, "x" )    -- timeout in seconds
        if val==nil then
            print( "timed out" )
            break
        end
        print( "received: "..val )
    end
Characteristics of the Lanes implementation of Lindas are:
+
  h= lanes.linda()

  bool= h:send( [timeout_secs,] key, ... )

  [val, key]= h:receive( [timeout_secs,] key [, ...] )

  = h:limit( key, n_uint )

The send and receive methods use Linda keys as FIFO queues (first in,
first out). Timeouts are given in seconds (millisecond accuracy). If using
numbers as the first Linda key, one must explicitly give nil as the
timeout parameter, to avoid ambiguities.
By default, queue sizes are unlimited, but limits can be enforced using
the limit method. This can be useful to balance execution speeds in a
producer/consumer scenario.

Note that any number of lanes can be reading or writing a Linda. There can
be many producers, and many consumers. It's up to you.

send returns true if the sending succeeded, and false if the queue limit
was met and the queue did not empty enough during the given timeout.

Equally, receive returns a value and the key that provided the value, or
nothing on timeout. Note that nils can be sent and received; the returned
key tells a real nil apart from a timeout.

Multiple values can be sent to a given key at once, atomically (the send
will fail unless all the values fit within the queue limit). This can be
useful in multiple producer scenarios, if the protocols used deliver data
in streams of multiple units. Atomicity keeps the producers from garbling
each other's messages, which could happen if the units were sent
individually.
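A sketch of limits and atomic multi-value sends (key names are illustrative; zero timeouts make the failure case return immediately):

```lua
require "lanes"

local linda = lanes.linda()
linda:limit( "q", 3 )                      -- at most 3 queued values on "q"

print( linda:send( 0, "q", "a", "b" ) )    -- true: 2 of the 3 slots used

-- The next send is atomic: both values must fit, but only 1 slot remains,
-- so with a zero timeout the whole send fails.
print( linda:send( 0, "q", "c", "d" ) )    -- false
```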
When receiving from multiple slots, the keys are checked in order, which
can be used for making priority queues.
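A sketch of such a priority queue (the key names are illustrative):

```lua
require "lanes"

local linda = lanes.linda()

linda:send( "low",  "background job" )
linda:send( "high", "urgent job" )

-- Keys are checked in the order given, so "high" is served first
-- even though "low" was filled earlier.
local val, key = linda:receive( "high", "low" )
print( key, val )   -- high    urgent job
```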
+ +
  linda_h:set( key, [val] )

  [val]= linda_h:get( key )

The table access methods access a slot without queuing or consuming. They
can be used for making shared tables of storage among the lanes.

Writing to a slot overwrites any existing value, and clears any queued
entries. Table access and send/receive can be used together; reading a
slot essentially peeks the next outgoing value of a queue.
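For instance (a sketch; the key name is illustrative):

```lua
require "lanes"

local linda = lanes.linda()

linda:send( "jobs", "a" )
linda:send( "jobs", "b" )

-- 'get' peeks the next outgoing value without consuming it:
print( linda:get("jobs") )   -- a
print( linda:get("jobs") )   -- a (still there)

-- 'set' overwrites the slot and clears the queued entries:
linda:set( "jobs", "z" )
print( linda:get("jobs") )   -- z
```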
A single Linda object provides an infinite number of slots, so why would
you want to use several?

There are some important reasons:

You can, however, wait on multiple Lindas indirectly: make separate lanes
to wait on each, and then multiplex those events into a common Linda,
but... :)
+ + + +
  = lanes.timer( linda_h, key, date_tbl|first_secs [,period_secs] )

Timers can be run once, or in a recurring fashion (period_secs > 0). The
first occurrence can be given either as a date or as a relative delay in
seconds. The date table is like what os.date("*t") returns, in the local
time zone.

Once a timer expires, the key is set with the current time (in seconds,
same offset as os.time() but with millisecond accuracy). The key can be
waited upon using the regular Linda :receive() method.

A timer can be stopped simply by giving first_secs=0 and no period.
+ +
    require "lanes"

    local linda= lanes.linda()

    -- First timer once a second, not synchronized to wall clock
    --
    lanes.timer( linda, "sec", 1, 1 )

    -- Timer to a future event (next even minute); wall clock synchronized
    --
    local t= os.date( "*t", os.time()+60 )    -- now + 1min
    t.sec= 0

    lanes.timer( linda, "min", t, 60 )    -- reoccur every minute (sharp)

    while true do
        local v,key= linda:receive( "sec", "min" )
        print( "Timer "..key..": "..v )
    end
NOTE: Timer keys are set, not queued, so missing a beat is possible,
especially if the timer cycle is extremely small. The key value can be
used to know the actual time passed.
+
Having the API as lanes.timer() is intentional. An alternative would be
linda_h:timer(), but timers are not traditionally seen as part of Lindas.
Also, it would mean any lane getting a Linda handle would be able to
modify timers on it. A third choice could be abstracting the timers out of
the Linda realm altogether (timer_h= lanes.timer( date|first_secs,
period_secs )), but that would mean separate waiting functions for timers
and lindas. Even if a linda object and key were returned, that key could
not be waited upon simultaneously with one's general linda events. The
current system gives maximum capability with minimum API, and any
smoothing can easily be crafted in Lua at the application level.
+Lanes does not generally require locks or critical sections to be used, at all. +If necessary, a limited queue can be used to emulate them. lanes.lua +offers some sugar to make it easy: +
+ +
+ lock_func= lanes.genlock( linda_h, key [,N_uint=1] ) + + lock_func( M_uint ) -- acquire + .. + lock_func( -M_uint ) -- release + |
+ +The generated function acquires M entries from the N available, or releases +them if the value is negative. The acquiring call will suspend the lane, if necessary. +Use M=N=1 for a critical section lock (only one lane allowed to enter). +
+ +Note: The locks generated are not recursive. That would need another +kind of generator, which is currently not implemented. +
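A sketch of a critical section guarding shared output (the key and worker names are illustrative; this relies on the upvalue copying described earlier):

```lua
require "lanes"

local linda = lanes.linda()
local lock  = lanes.genlock( linda, "print_lock", 1 )  -- N=1: critical section

local function worker(i)
    lock(1)                        -- acquire; suspends if another lane holds it
    print( "lane "..i.." inside the critical section" )
    lock(-1)                       -- release
end

local gen = lanes.gen( "base", worker )
local h1, h2 = gen(1), gen(2)
h1:join(); h2:join()
```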
Similar sugar exists for atomic counters:

  atomic_func= lanes.genatomic( linda_h, key [,initial_num=0.0] )

  new_num= atomic_func( [diff_num=+1.0] )

Each time called, the generated function changes linda[key] atomically,
without other lanes being able to interfere. The new value is returned.
You can use either a diff of 0.0, or get, to just read the current value.

Note that the generated functions can be passed on to other lanes.
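For instance (a sketch; the key name is illustrative):

```lua
require "lanes"

local linda   = lanes.linda()
local counter = lanes.genatomic( linda, "hits" )  -- initial value 0.0

print( counter() )     -- 1  (default diff is +1.0)
print( counter(10) )   -- 11
print( counter(0) )    -- 11 (a diff of 0.0 just reads the value)
```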
Data passed between lanes (either as startup parameters, return values,
upvalues or via Lindas) must conform to the following:

Most Lua extension modules should work unaltered with Lanes. If the module
simply ties C side features to Lua, everything is fine without
alterations. The luaopen_...() entry point is called separately for each
lane that require's the module.

If, however, the module also does one-time C side initializations, these
should be wrapped in a one-time-only construct such as the one below.

    int luaopen_module( lua_State *L )
    {
        static char been_here;  /* 0 by ANSI C */

        /* Calls to 'require' are serialized by Lanes; this is safe.
        */
        if (!been_here) {
            been_here= 1;
            ... one time initializations ...
        }

        ... binding to Lua ...
    }
+The mechanism Lanes uses for sharing Linda handles between separate Lua states +can be used for custom userdata as well. Here's what to do. +
+Deep userdata management will take care of tying to __gc methods, +and doing reference counting to see how many proxies are still there for +accessing the data. Once there are none, the data will be freed through a call +to the idfunc you provided. +
+NOTE: The lifespan of deep userdata may exceed that of the Lua state +that created it. The allocation of the data storage should not be tied to +the Lua state used. In other words, use malloc/free or +similar memory handling mechanism. +
+ + ++Lane handles are not implemented as deep userdata, and cannot thus be +copied across lanes. This is intentional; problems would occur at least when +multiple lanes were to wait upon one to get ready. Also, it is a matter of +design simplicity. +
The same benefits can be achieved by having a single worker lane spawn all
the sublanes, and keep track of them. Communications to and from this lane
can be handled via a Linda.
In multithreaded scenarios, giving multiple parameters to print() or
file:write() may cause the output to be interleaved, something like this:

    A: print( 1, 2, 3, 4 )
    B: print( 'a', 'b', 'c', 'd' )

    1 a b 2 3 c d 4

Lanes does not protect you from this behaviour. The solution is either to
dedicate each output stream to a single lane, or to concatenate the output
into a single string before calling the output function.
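A minimal sketch of the concatenation approach:

```lua
-- Build the whole line first, then emit it with a single call, so
-- another lane's output cannot be interleaved between the pieces.
local parts = { 1, 2, 3, 4 }
local line  = table.concat( parts, "\t" )
print( line )   -- one write: "1", "2", "3", "4" separated by tabs
```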
Lanes is about making multithreading easy and natural in the Lua state of
mind. Expect performance not to be an issue, if your program is logically
built. Here are some things to consider when best performance is vital:
Cancellation of lanes uses the Lua error mechanism with a special
lightuserdata error sentinel. If you use pcall in code that needs to be
cancellable from the outside, the special error might not get through to
Lanes, preventing the lane from being cleanly cancelled. You should
rethrow any lightuserdata error.

This mechanism can also be used by the application to detect a cancel, do
its own cancellation duties, and then pass the error on, so Lanes will get
it. If Lanes does not get a clean cancellation from a lane in due time, it
may forcefully kill the lane.

The sentinel is exposed as lanes.cancel_error, if you wish to use its
actual value.
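A sketch of a lane-side wrapper that performs its own cleanup on cancel and still lets the sentinel through (the `guarded` helper and its type check are illustrative, following the "rethrow any lightuserdata error" advice above):

```lua
-- Inside a lane body: wrap cancellable work in pcall, but rethrow a
-- lightuserdata error so Lanes can complete the cancellation.
local function guarded( work, ... )
    local ok, err = pcall( work, ... )
    if not ok then
        if type(err) == "userdata" then
            -- almost certainly the cancel sentinel (lanes.cancel_error):
            -- do any cleanup of our own here, then pass it on
            error( err )
        end
        -- an ordinary error; handle it, or rethrow as usual
        error( err )
    end
end
```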
Jan-2009 (2.0.3):