author | Mike Pall <mike> | 2011-02-09 01:26:02 +0100 |
---|---|---|
committer | Mike Pall <mike> | 2011-02-09 01:26:02 +0100 |
commit | 24c314e8fcfb3d12ea05c1f9bf7add40d24ae0cd (patch) | |
tree | bf7bd5d2b852f9c13b70f6392c24b315364cb968 /doc | |
parent | 2388a7fcc017b9e9a75a4674aa81933b510882f7 (diff) | |
FFI: Add more docs on FFI semantics.
Diffstat (limited to 'doc')
-rw-r--r-- | doc/ext_ffi_semantics.html | 292 |
1 files changed, 268 insertions, 24 deletions
diff --git a/doc/ext_ffi_semantics.html b/doc/ext_ffi_semantics.html index 9b7cac70..f48c6406 100644 --- a/doc/ext_ffi_semantics.html +++ b/doc/ext_ffi_semantics.html | |||
@@ -57,18 +57,159 @@ | |||
57 | </div> | 57 | </div> |
58 | <div id="main"> | 58 | <div id="main"> |
59 | <p> | 59 | <p> |
60 | TODO | 60 | This page describes the detailed semantics underlying the FFI library |
61 | and its interaction with both Lua and C code. | ||
62 | </p> | ||
63 | <p> | ||
64 | Given that the FFI library is designed to interface with C code | ||
65 | and that declarations can be written in plain C syntax, it | ||
66 | closely follows the C language semantics wherever possible. Some | ||
67 | concessions are needed for smoother interoperation with Lua language | ||
68 | semantics. But it should be straightforward to write applications | ||
69 | using the LuaJIT FFI for developers with a C or C++ background. | ||
61 | </p> | 70 | </p> |
62 | 71 | ||
63 | <h2 id="clang">C Language Support</h2> | 72 | <h2 id="clang">C Language Support</h2> |
64 | <p> | 73 | <p> |
65 | TODO | 74 | The FFI library has a built-in C parser with a minimal memory |
75 | footprint. It's used by the <a href="ext_ffi_api.html">ffi.* library | ||
76 | functions</a> to declare C types or external symbols. | ||
77 | </p> | ||
78 | <p> | ||
79 | Its only purpose is to parse C declarations, as found e.g. in | ||
80 | C header files. Although it does evaluate constant expressions, | ||
81 | it's <em>not</em> a C compiler. The body of <tt>inline</tt> | ||
82 | C function definitions is simply ignored. | ||
83 | </p> | ||
84 | <p> | ||
85 | Also, this is <em>not</em> a validating C parser. It expects and | ||
86 | accepts correctly formed C declarations, but it may choose to | ||
87 | ignore bad declarations or show rather generic error messages. If in | ||
88 | doubt, please check the input against your favorite C compiler. | ||
89 | </p> | ||
90 | <p> | ||
91 | The C parser complies with the <b>C99 language standard</b> plus | ||
92 | the following extensions: | ||
93 | </p> | ||
94 | <ul> | ||
95 | |||
96 | <li>C++-style comments (<tt>//</tt>).</li> | ||
97 | |||
98 | <li>The <tt>'\e'</tt> escape in character and string literals.</li> | ||
99 | |||
100 | <li>The <tt>long long</tt> 64 bit integer type.</li> | ||
101 | |||
102 | <li>The C99/C++ boolean type, declared with the keywords <tt>bool</tt> | ||
103 | or <tt>_Bool</tt>.</li> | ||
104 | |||
105 | <li>Complex numbers, declared with the keywords <tt>complex</tt> or | ||
106 | <tt>_Complex</tt>.</li> | ||
107 | |||
108 | <li>Two complex number types: <tt>complex</tt> (aka | ||
109 | <tt>complex double</tt>) and <tt>complex float</tt>.</li> | ||
110 | |||
111 | <li>Vector types, declared with the GCC <tt>mode</tt> or | ||
112 | <tt>vector_size</tt> attribute.</li> | ||
113 | |||
114 | <li>Unnamed ('transparent') <tt>struct</tt>/<tt>union</tt> fields | ||
115 | inside a <tt>struct</tt>/<tt>union</tt>.</li> | ||
116 | |||
117 | <li>Incomplete <tt>enum</tt> declarations, handled like incomplete | ||
118 | <tt>struct</tt> declarations.</li> | ||
119 | |||
120 | <li>Unnamed <tt>enum</tt> fields inside a | ||
121 | <tt>struct</tt>/<tt>union</tt>. This is similar to a scoped C++ | ||
122 | <tt>enum</tt>, except that declared constants are visible in the | ||
123 | global namespace, too.</li> | ||
124 | |||
125 | <li>C++-style scoped <tt>static const</tt> declarations inside a | ||
126 | <tt>struct</tt>/<tt>union</tt>.</li> | ||
127 | |||
128 | <li>Zero-length arrays (<tt>[0]</tt>), empty | ||
129 | <tt>struct</tt>/<tt>union</tt>, variable-length arrays (VLA, | ||
130 | <tt>[?]</tt>) and variable-length structs (VLS, with a trailing | ||
131 | VLA).</li> | ||
132 | |||
133 | <li>Alternate GCC keywords with '<tt>__</tt>', e.g. | ||
134 | <tt>__const__</tt>.</li> | ||
135 | |||
136 | <li>GCC <tt>__attribute__</tt> with the following attributes: | ||
137 | <tt>aligned</tt>, <tt>packed</tt>, <tt>mode</tt>, | ||
138 | <tt>vector_size</tt>, <tt>cdecl</tt>, <tt>fastcall</tt>, | ||
139 | <tt>stdcall</tt>.</li> | ||
140 | |||
141 | <li>The GCC <tt>__extension__</tt> keyword and the GCC | ||
142 | <tt>__alignof__</tt> operator.</li> | ||
143 | |||
144 | <li>GCC <tt>__asm__("symname")</tt> symbol name redirection for | ||
145 | function declarations.</li> | ||
146 | |||
147 | <li>MSVC keywords for fixed-length types: <tt>__int8</tt>, | ||
148 | <tt>__int16</tt>, <tt>__int32</tt> and <tt>__int64</tt>.</li> | ||
149 | |||
150 | <li>MSVC <tt>__cdecl</tt>, <tt>__fastcall</tt>, <tt>__stdcall</tt>, | ||
151 | <tt>__ptr32</tt>, <tt>__ptr64</tt>, <tt>__declspec(align(n))</tt> | ||
152 | and <tt>#pragma pack</tt>.</li> | ||
153 | |||
154 | <li>All other GCC/MSVC-specific attributes are ignored.</li> | ||
155 | |||
156 | </ul> | ||
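A few of the extensions listed above can be sketched as plain C declarations. This is only an illustration, assuming a GCC- or Clang-compatible compiler; the names `variant`, `packed_pair` and `aligned_block` are hypothetical and do not come from the LuaJIT sources. The struct declarations are the kind of input the list describes; the helper functions exist only so the layout can be checked with a C compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// C++-style comments are one of the listed extensions.

/* Unnamed ("transparent") union inside a struct: its members are
   accessed as if they belonged to the enclosing struct. */
struct variant {
  int tag;
  union {
    int32_t i;
    float f;
  };
};

/* GCC attribute: no padding is inserted between the members. */
struct __attribute__((packed)) packed_pair {
  uint8_t flag;
  uint32_t value;
};

/* GCC attribute: the whole struct is aligned to 16 bytes. */
struct aligned_block {
  uint8_t bytes[4];
} __attribute__((aligned(16)));

/* Stores through the transparent union member and reads it back. */
int32_t variant_roundtrip(int32_t x) {
  struct variant v;
  v.tag = 0;
  v.i = x; /* no intermediate member name is needed */
  return v.i;
}

/* Checks the layout effects of the two attributes. */
void check_extension_layout(void) {
  assert(sizeof(struct packed_pair) == 5);          /* 1 + 4, no padding */
  assert(__alignof__(struct aligned_block) == 16);  /* GCC alignment operator */
  assert(sizeof(struct aligned_block) == 16);       /* size rounded up to the alignment */
}
```

Calling `check_extension_layout()` from a small test program is one way to confirm that a C compiler agrees with the layout you expect before handing the same declarations to the FFI parser.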
157 | <p> | ||
158 | The following C types are pre-defined by the C parser (like | ||
159 | a <tt>typedef</tt>, except re-declarations will be ignored): | ||
66 | </p> | 160 | </p> |
161 | <ul> | ||
162 | |||
163 | <li>Vararg handling: <tt>va_list</tt>, <tt>__builtin_va_list</tt>, | ||
164 | <tt>__gnuc_va_list</tt>.</li> | ||
165 | |||
166 | <li>From <tt>&lt;stddef.h&gt;</tt>: <tt>ptrdiff_t</tt>, | ||
167 | <tt>size_t</tt>, <tt>wchar_t</tt>.</li> | ||
168 | |||
169 | <li>From <tt>&lt;stdint.h&gt;</tt>: <tt>int8_t</tt>, <tt>int16_t</tt>, | ||
170 | <tt>int32_t</tt>, <tt>int64_t</tt>, <tt>uint8_t</tt>, | ||
171 | <tt>uint16_t</tt>, <tt>uint32_t</tt>, <tt>uint64_t</tt>, | ||
172 | <tt>intptr_t</tt>, <tt>uintptr_t</tt>.</li> | ||
173 | |||
174 | </ul> | ||
175 | <p> | ||
176 | You're encouraged to use these types in preference to the | ||
177 | compiler-specific extensions or the target-dependent standard types. | ||
178 | E.g. <tt>char</tt> differs in signedness and <tt>long</tt> differs in | ||
179 | size, depending on the target architecture and platform ABI. | ||
180 | </p> | ||
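The portability point above can be checked with a short C sketch. It is an illustration only and the function names are hypothetical. It reports the two target-dependent properties mentioned in the text and asserts that the fixed-width types from `<stdint.h>` have the same size on every target that provides them.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if plain char is signed on this target, 0 if it is unsigned.
   Both answers occur in practice, depending on the platform ABI. */
int plain_char_is_signed(void) {
  return CHAR_MIN < 0;
}

/* Size of long in bytes. Commonly 4 or 8, depending on the platform ABI. */
size_t long_size(void) {
  return sizeof(long);
}

/* The exact-width types have no padding bits, so their size in bits
   is fixed wherever they are defined. */
void check_fixed_width(void) {
  assert(sizeof(int8_t) * CHAR_BIT == 8);
  assert(sizeof(int16_t) * CHAR_BIT == 16);
  assert(sizeof(int32_t) * CHAR_BIT == 32);
  assert(sizeof(int64_t) * CHAR_BIT == 64);
  assert(sizeof(intptr_t) >= sizeof(void *)); /* wide enough to hold a pointer */
}
```

Declaring a field as `int32_t` rather than `long` therefore gives the same layout on every target, whereas the results of the first two functions may change when the code is moved to another platform.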
181 | <p> | ||
182 | The following C features are <b>not</b> supported: | ||
183 | </p> | ||
184 | <ul> | ||
185 | |||
186 | <li>A declaration must always have a type specifier; it doesn't | ||
187 | default to an <tt>int</tt> type.</li> | ||
188 | |||
189 | <li>Old-style empty function declarations (K&amp;R) are not allowed. | ||
190 | All C functions must have a proper prototype declaration. A | ||
191 | function declared without parameters (<tt>int foo();</tt>) is | ||
192 | treated as a function taking zero arguments, like in C++.</li> | ||
193 | |||
194 | <li>The <tt>long double</tt> C type is parsed correctly, but | ||
195 | there's no support for the related conversions, accesses or arithmetic | ||
196 | operations.</li> | ||
197 | |||
198 | <li>Wide character strings and character literals are not | ||
199 | supported.</li> | ||
200 | |||
201 | <li><a href="#status">See below</a> for features that are currently | ||
202 | not implemented.</li> | ||
203 | |||
204 | </ul> | ||
67 | 205 | ||
68 | <h2 id="convert">C Type Conversion Rules</h2> | 206 | <h2 id="convert">C Type Conversion Rules</h2> |
69 | <p> | 207 | <p> |
70 | TODO | 208 | TODO |
71 | </p> | 209 | </p> |
210 | <h3 id="convert_tolua">Conversions from C types to Lua objects</h3> | ||
211 | <h3 id="convert_fromlua">Conversions from Lua objects to C types</h3> | ||
212 | <h3 id="convert_between">Conversions between C types</h3> | ||
72 | 213 | ||
73 | <h2 id="init">Initializers</h2> | 214 | <h2 id="init">Initializers</h2> |
74 | <p> | 215 | <p> |
@@ -81,8 +222,8 @@ initializers and the C types involved: | |||
81 | <li>If no initializers are given, the object is filled with zero bytes.</li> | 222 | <li>If no initializers are given, the object is filled with zero bytes.</li> |
82 | 223 | ||
83 | <li>Scalar types (numbers and pointers) accept a single initializer. | 224 | <li>Scalar types (numbers and pointers) accept a single initializer. |
84 | The standard <a href="#convert">C type conversion rules</a> | 225 | The Lua object is <a href="#convert_fromlua">converted to the scalar |
85 | apply.</li> | 226 | C type</a>.</li> |
86 | 227 | ||
87 | <li>Valarrays (complex numbers and vectors) are treated like scalars | 228 | <li>Valarrays (complex numbers and vectors) are treated like scalars |
88 | when a single initializer is given. Otherwise they are treated like | 229 | when a single initializer is given. Otherwise they are treated like |
@@ -111,16 +252,6 @@ initializer or a compatible aggregate, of course.</li> | |||
111 | 252 | ||
112 | </ul> | 253 | </ul> |
113 | 254 | ||
114 | <h2 id="clib">C Library Namespaces</h2> | ||
115 | <p> | ||
116 | A C library namespace is a special kind of object which allows | ||
117 | access to the symbols contained in libraries. Indexing it with a | ||
118 | symbol name (a Lua string) automatically binds it to the library. | ||
119 | </p> | ||
120 | <p> | ||
121 | TODO | ||
122 | </p> | ||
123 | |||
124 | <h2 id="ops">Operations on cdata Objects</h2> | 255 | <h2 id="ops">Operations on cdata Objects</h2> |
125 | <p> | 256 | <p> |
126 | TODO | 257 | TODO |
@@ -158,9 +289,9 @@ Similar rules apply for Lua strings which are implicitly converted to | |||
158 | <tt>"const char *"</tt>: the string object itself must be | 289 | <tt>"const char *"</tt>: the string object itself must be |
159 | referenced somewhere or it'll be garbage collected eventually. The | 290 | referenced somewhere or it'll be garbage collected eventually. The |
160 | pointer will then point to stale data, which may have already been | 291 | pointer will then point to stale data, which may have already been |
161 | overwritten. Note that string literals are automatically kept alive as | 292 | overwritten. Note that <em>string literals</em> are automatically kept |
162 | long as the function containing them (actually its prototype) is not | 293 | alive as long as the function containing them (actually its prototype) |
163 | garbage collected. | 294 | is not garbage collected. |
164 | </p> | 295 | </p> |
165 | <p> | 296 | <p> |
166 | Objects which are passed as an argument to an external C function | 297 | Objects which are passed as an argument to an external C function |
@@ -181,6 +312,121 @@ indistinguishable from pointers returned by C functions (which is one | |||
181 | of the reasons why the GC cannot follow them). | 312 | of the reasons why the GC cannot follow them). |
182 | </p> | 313 | </p> |
183 | 314 | ||
315 | <h2 id="clib">C Library Namespaces</h2> | ||
316 | <p> | ||
317 | A C library namespace is a special kind of object which allows | ||
318 | access to the symbols contained in shared libraries or the default | ||
319 | symbol namespace. The default | ||
320 | <a href="ext_ffi_api.html#ffi_C"><tt>ffi.C</tt></a> namespace is | ||
321 | automatically created when the FFI library is loaded. C library | ||
322 | namespaces for specific shared libraries may be created with the | ||
323 | <a href="ext_ffi_api.html#ffi_load"><tt>ffi.load()</tt></a> API | ||
324 | function. | ||
325 | </p> | ||
326 | <p> | ||
327 | Indexing a C library namespace object with a symbol name (a Lua | ||
328 | string) automatically binds it to the library. First the symbol type | ||
329 | is resolved — it must have been declared with | ||
330 | <a href="ext_ffi_api.html#ffi_cdef"><tt>ffi.cdef</tt></a>. Then the | ||
331 | symbol address is resolved by searching for the symbol name in the | ||
332 | associated shared libraries or the default symbol namespace. Finally, | ||
333 | the resulting binding between the symbol name, the symbol type and its | ||
334 | address is cached. Missing symbol declarations or nonexistent symbol | ||
335 | names cause an error. | ||
336 | </p> | ||
337 | <p> | ||
338 | This is what happens on a <b>read access</b> for the different kinds of | ||
339 | symbols: | ||
340 | </p> | ||
341 | <ul> | ||
342 | |||
343 | <li>External functions: a cdata object with the type of the function | ||
344 | and its address is returned.</li> | ||
345 | |||
346 | <li>External variables: the symbol address is dereferenced and the | ||
347 | loaded value is <a href="#convert_tolua">converted to a Lua object</a> | ||
348 | and returned.</li> | ||
349 | |||
350 | <li>Constant values (<tt>static const</tt> or <tt>enum</tt> | ||
351 | constants): the constant is <a href="#convert_tolua">converted to a | ||
352 | Lua object</a> and returned.</li> | ||
353 | |||
354 | </ul> | ||
355 | <p> | ||
356 | This is what happens on a <b>write access</b>: | ||
357 | </p> | ||
358 | <ul> | ||
359 | |||
360 | <li>External variables: the value to be written is | ||
361 | <a href="#convert_fromlua">converted to the C type</a> of the | ||
362 | variable and then stored at the symbol address.</li> | ||
363 | |||
364 | <li>Writing to constant variables or to any other symbol type causes | ||
365 | an error, like any other attempted write to a constant location.</li> | ||
366 | |||
367 | </ul> | ||
368 | <p> | ||
369 | C library namespaces themselves are garbage collected objects. If | ||
370 | the last reference to the namespace object is gone, the garbage | ||
371 | collector will eventually release the shared library reference and | ||
372 | remove all memory associated with the namespace. Since this may | ||
373 | trigger the removal of the shared library from the memory of the | ||
374 | running process, it's generally <em>not safe</em> to use function | ||
375 | cdata objects obtained from a library if the namespace object may be | ||
376 | unreferenced. | ||
377 | </p> | ||
378 | <p> | ||
379 | Performance notice: the JIT compiler specializes to the identity of | ||
380 | namespace objects and to the strings used to index it. This | ||
381 | effectively turns function cdata objects into constants. It's not | ||
382 | useful and actually counter-productive to explicitly cache these | ||
383 | function objects, e.g. <tt>local strlen = ffi.C.strlen</tt>. OTOH it | ||
384 | <em>is</em> useful to cache the namespace itself, e.g. <tt>local C = | ||
385 | ffi.C</tt>. | ||
386 | </p> | ||
387 | |||
388 | <h2 id="policy">No Hand-holding!</h2> | ||
389 | <p> | ||
390 | The FFI library has been designed as <b>a low-level library</b>. The | ||
391 | goal is to interface with C code and C data types with a | ||
392 | minimum of overhead. This means <b>you can do anything you can do | ||
393 | from C</b>: access all memory, overwrite anything in memory, call | ||
394 | machine code at any memory address and so on. | ||
395 | </p> | ||
396 | <p> | ||
397 | The FFI library provides <b>no memory safety</b>, unlike regular Lua | ||
398 | code. It will happily allow you to dereference a <tt>NULL</tt> | ||
399 | pointer, to access arrays out of bounds or to misdeclare | ||
400 | C functions. If you make a mistake, your application might crash, | ||
401 | just like equivalent C code would. | ||
402 | </p> | ||
403 | <p> | ||
404 | This behavior is inevitable, since the goal is to provide full | ||
405 | interoperability with C code. Adding extra safety measures, like | ||
406 | bounds checks, would be futile. There's no way to detect | ||
407 | misdeclarations of C functions, since shared libraries only | ||
408 | provide symbol names, but no type information. Likewise there's no way | ||
409 | to infer the valid range of indexes for a returned pointer. | ||
410 | </p> | ||
411 | <p> | ||
412 | Again: the FFI library is a low-level library. This implies it needs | ||
413 | to be used with care, but its flexibility and performance often | ||
414 | outweigh this concern. If you're a C or C++ developer, it'll be easy | ||
415 | to apply your existing knowledge. OTOH writing code for the FFI | ||
416 | library is not for the faint of heart and probably shouldn't be the | ||
417 | first exercise for someone with little experience in Lua, C or C++. | ||
418 | </p> | ||
419 | <p> | ||
420 | As a corollary of the above, the FFI library is <b>not safe for use by | ||
421 | untrusted Lua code</b>. If you're sandboxing untrusted Lua code, you | ||
422 | definitely don't want to give this code access to the FFI library or | ||
423 | to <em>any</em> cdata object (except 64 bit integers or complex | ||
424 | numbers). Any properly engineered Lua sandbox needs to provide safety | ||
425 | wrappers for many of the standard Lua library functions — | ||
426 | similar wrappers need to be written for high-level operations on FFI | ||
427 | data types, too. | ||
428 | </p> | ||
429 | |||
184 | <h2 id="status">Current Status</h2> | 430 | <h2 id="status">Current Status</h2> |
185 | <p> | 431 | <p> |
186 | The initial release of the FFI library has some limitations and is | 432 | The initial release of the FFI library has some limitations and is |
@@ -200,18 +446,15 @@ obscure constructs.</li> | |||
200 | <li><tt>static const</tt> declarations only work for integer types | 446 | <li><tt>static const</tt> declarations only work for integer types |
201 | up to 32 bits. Neither declaring string constants nor | 447 | up to 32 bits. Neither declaring string constants nor |
202 | floating-point constants is supported.</li> | 448 | floating-point constants is supported.</li> |
203 | <li>The <tt>long double</tt> C type is parsed correctly, but | ||
204 | there's no support for the related conversions, accesses or | ||
205 | arithmetic operations.</li> | ||
206 | <li>Packed <tt>struct</tt> bitfields that cross container boundaries | 449 | <li>Packed <tt>struct</tt> bitfields that cross container boundaries |
207 | are not implemented.</li> | 450 | are not implemented.</li> |
208 | <li>Native vector types may be defined with the GCC <tt>mode</tt> and | 451 | <li>Native vector types may be defined with the GCC <tt>mode</tt> or |
209 | <tt>vector_size</tt> attributes. But no operations other than loading, | 452 | <tt>vector_size</tt> attribute. But no operations other than loading, |
210 | storing and initializing them are supported, yet.</li> | 453 | storing and initializing them are supported, yet.</li> |
211 | <li>The <tt>volatile</tt> type qualifier is currently ignored by | 454 | <li>The <tt>volatile</tt> type qualifier is currently ignored by |
212 | compiled code.</li> | 455 | compiled code.</li> |
213 | <li><a href="ext_ffi_api.html#ffi_cdef">ffi.cdef</a> silently ignores | 456 | <li><a href="ext_ffi_api.html#ffi_cdef"><tt>ffi.cdef</tt></a> silently |
214 | all redeclarations.</li> | 457 | ignores all redeclarations.</li> |
215 | </ul> | 458 | </ul> |
216 | <p> | 459 | <p> |
217 | The JIT compiler already handles a large subset of all FFI operations. | 460 | The JIT compiler already handles a large subset of all FFI operations. |
@@ -238,6 +481,7 @@ two.</li> | |||
238 | value.</li> | 481 | value.</li> |
239 | <li>Calls to C functions with 64 bit arguments or return values | 482 | <li>Calls to C functions with 64 bit arguments or return values |
240 | on 32 bit CPUs.</li> | 483 | on 32 bit CPUs.</li> |
484 | <li>Accesses to external variables in C library namespaces.</li> | ||
241 | <li><tt>tostring()</tt> for cdata types.</li> | 485 | <li><tt>tostring()</tt> for cdata types.</li> |
242 | <li>The following <a href="ext_ffi_api.html">ffi.* API</a> functions: | 486 | <li>The following <a href="ext_ffi_api.html">ffi.* API</a> functions: |
243 | <tt>ffi.sizeof()</tt>, <tt>ffi.alignof()</tt>, <tt>ffi.offsetof()</tt>. | 487 | <tt>ffi.sizeof()</tt>, <tt>ffi.alignof()</tt>, <tt>ffi.offsetof()</tt>. |