| -rw-r--r-- | doc/ext_ffi_semantics.html | 292 |
1 files changed, 268 insertions, 24 deletions
diff --git a/doc/ext_ffi_semantics.html b/doc/ext_ffi_semantics.html
index 9b7cac70..f48c6406 100644
--- a/doc/ext_ffi_semantics.html
+++ b/doc/ext_ffi_semantics.html
| @@ -57,18 +57,159 @@ | |||
| 57 | </div> | 57 | </div> |
| 58 | <div id="main"> | 58 | <div id="main"> |
| 59 | <p> | 59 | <p> |
| 60 | TODO | 60 | This page describes the detailed semantics underlying the FFI library |
| 61 | and its interaction with both Lua and C code. | ||
| 62 | </p> | ||
| 63 | <p> | ||
| 64 | Given that the FFI library is designed to interface with C code | ||
| 65 | and that declarations can be written in plain C syntax, it | ||
| 66 | closely follows the C language semantics wherever possible. Some | ||
| 67 | concessions are needed for smoother interoperation with Lua language | ||
| 68 | semantics. But it should be straightforward to write applications | ||
| 69 | using the LuaJIT FFI for developers with a C or C++ background. | ||
| 61 | </p> | 70 | </p> |
| 62 | 71 | ||
| 63 | <h2 id="clang">C Language Support</h2> | 72 | <h2 id="clang">C Language Support</h2> |
| 64 | <p> | 73 | <p> |
| 65 | TODO | 74 | The FFI library has a built-in C parser with a minimal memory |
| 75 | footprint. It's used by the <a href="ext_ffi_api.html">ffi.* library | ||
| 76 | functions</a> to declare C types or external symbols. | ||
| 77 | </p> | ||
| 78 | <p> | ||
| 79 | Its only purpose is to parse C declarations, as found e.g. in | ||
| 80 | C header files. Although it does evaluate constant expressions, | ||
| 81 | it's <em>not</em> a C compiler. The body of <tt>inline</tt> | ||
| 82 | C function definitions is simply ignored. | ||
| 83 | </p> | ||
| 84 | <p> | ||
| 85 | Also, this is <em>not</em> a validating C parser. It expects and | ||
| 86 | accepts correctly formed C declarations, but it may choose to | ||
| 87 | ignore bad declarations or show rather generic error messages. If in | ||
| 88 | doubt, please check the input against your favorite C compiler. | ||
| 89 | </p> | ||
| 90 | <p> | ||
| 91 | The C parser complies with the <b>C99 language standard</b> plus | ||
| 92 | the following extensions: | ||
| 93 | </p> | ||
| 94 | <ul> | ||
| 95 | |||
| 96 | <li>C++-style comments (<tt>//</tt>).</li> | ||
| 97 | |||
| 98 | <li>The <tt>'\e'</tt> escape in character and string literals.</li> | ||
| 99 | |||
| 100 | <li>The <tt>long long</tt> 64 bit integer type.</li> | ||
| 101 | |||
| 102 | <li>The C99/C++ boolean type, declared with the keywords <tt>bool</tt> | ||
| 103 | or <tt>_Bool</tt>.</li> | ||
| 104 | |||
| 105 | <li>Complex numbers, declared with the keywords <tt>complex</tt> or | ||
| 106 | <tt>_Complex</tt>.</li> | ||
| 107 | |||
| 108 | <li>Two complex number types: <tt>complex</tt> (aka | ||
| 109 | <tt>complex double</tt>) and <tt>complex float</tt>.</li> | ||
| 110 | |||
| 111 | <li>Vector types, declared with the GCC <tt>mode</tt> or | ||
| 112 | <tt>vector_size</tt> attribute.</li> | ||
| 113 | |||
| 114 | <li>Unnamed ('transparent') <tt>struct</tt>/<tt>union</tt> fields | ||
| 115 | inside a <tt>struct</tt>/<tt>union</tt>.</li> | ||
| 116 | |||
| 117 | <li>Incomplete <tt>enum</tt> declarations, handled like incomplete | ||
| 118 | <tt>struct</tt> declarations.</li> | ||
| 119 | |||
| 120 | <li>Unnamed <tt>enum</tt> fields inside a | ||
| 121 | <tt>struct</tt>/<tt>union</tt>. This is similar to a scoped C++ | ||
| 122 | <tt>enum</tt>, except that declared constants are visible in the | ||
| 123 | global namespace, too.</li> | ||
| 124 | |||
| 125 | <li>C++-style scoped <tt>static const</tt> declarations inside a | ||
| 126 | <tt>struct</tt>/<tt>union</tt>.</li> | ||
| 127 | |||
| 128 | <li>Zero-length arrays (<tt>[0]</tt>), empty | ||
| 129 | <tt>struct</tt>/<tt>union</tt>, variable-length arrays (VLA, | ||
| 130 | <tt>[?]</tt>) and variable-length structs (VLS, with a trailing | ||
| 131 | VLA).</li> | ||
| 132 | |||
| 133 | <li>Alternate GCC keywords with '<tt>__</tt>', e.g. | ||
| 134 | <tt>__const__</tt>.</li> | ||
| 135 | |||
| 136 | <li>GCC <tt>__attribute__</tt> with the following attributes: | ||
| 137 | <tt>aligned</tt>, <tt>packed</tt>, <tt>mode</tt>, | ||
| 138 | <tt>vector_size</tt>, <tt>cdecl</tt>, <tt>fastcall</tt>, | ||
| 139 | <tt>stdcall</tt>.</li> | ||
| 140 | |||
| 141 | <li>The GCC <tt>__extension__</tt> keyword and the GCC | ||
| 142 | <tt>__alignof__</tt> operator.</li> | ||
| 143 | |||
| 144 | <li>GCC <tt>__asm__("symname")</tt> symbol name redirection for | ||
| 145 | function declarations.</li> | ||
| 146 | |||
| 147 | <li>MSVC keywords for fixed-length types: <tt>__int8</tt>, | ||
| 148 | <tt>__int16</tt>, <tt>__int32</tt> and <tt>__int64</tt>.</li> | ||
| 149 | |||
| 150 | <li>MSVC <tt>__cdecl</tt>, <tt>__fastcall</tt>, <tt>__stdcall</tt>, | ||
| 151 | <tt>__ptr32</tt>, <tt>__ptr64</tt>, <tt>__declspec(align(n))</tt> | ||
| 152 | and <tt>#pragma pack</tt>.</li> | ||
| 153 | |||
| 154 | <li>All other GCC/MSVC-specific attributes are ignored.</li> | ||
| 155 | |||
| 156 | </ul> | ||
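To make a few of the listed extensions concrete, here is a minimal C sketch that combines the 64 bit `long long` type, an unnamed union field, a zero-length trailing array, the `aligned` attribute and the `__alignof__` operator. The type `struct packet` and the helper functions are hypothetical names chosen for illustration. The sketch assumes GCC or Clang in their default GNU mode, and it has only been reasoned about as C, not run through the FFI parser.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type, used only for illustration. */
struct packet {
  long long id;                 /* 64 bit integer type */
  union {                       /* unnamed ('transparent') union field */
    int   as_int;
    float as_float;
  };
  unsigned char len;
  unsigned char data[0];        /* zero-length trailing array */
} __attribute__((aligned(16))); /* GCC attribute from the list above */

/* Fields of the unnamed union are accessed as if they belonged
   directly to the enclosing struct. */
static int packet_int(const struct packet *p)
{
  assert(p != NULL);
  return p->as_int;
}

/* __alignof__ reports the alignment requested by the attribute. */
static size_t packet_align(void)
{
  return __alignof__(struct packet);
}
```

The variable-length forms (`[?]`) are specific to the FFI parser and are deliberately left out, since a C compiler would reject them.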
| 157 | <p> | ||
| 158 | The following C types are pre-defined by the C parser (like | ||
| 159 | a <tt>typedef</tt>, except re-declarations will be ignored): | ||
| 66 | </p> | 160 | </p> |
| 161 | <ul> | ||
| 162 | |||
| 163 | <li>Vararg handling: <tt>va_list</tt>, <tt>__builtin_va_list</tt>, | ||
| 164 | <tt>__gnuc_va_list</tt>.</li> | ||
| 165 | |||
| 166 | <li>From <tt>&lt;stddef.h&gt;</tt>: <tt>ptrdiff_t</tt>, | ||
| 167 | <tt>size_t</tt>, <tt>wchar_t</tt>.</li> | ||
| 168 | |||
| 169 | <li>From <tt>&lt;stdint.h&gt;</tt>: <tt>int8_t</tt>, <tt>int16_t</tt>, | ||
| 170 | <tt>int32_t</tt>, <tt>int64_t</tt>, <tt>uint8_t</tt>, | ||
| 171 | <tt>uint16_t</tt>, <tt>uint32_t</tt>, <tt>uint64_t</tt>, | ||
| 172 | <tt>intptr_t</tt>, <tt>uintptr_t</tt>.</li> | ||
| 173 | |||
| 174 | </ul> | ||
| 175 | <p> | ||
| 176 | You're encouraged to use these types in preference to the | ||
| 177 | compiler-specific extensions or the target-dependent standard types. | ||
| 178 | E.g. <tt>char</tt> differs in signedness and <tt>long</tt> differs in | ||
| 179 | size, depending on the target architecture and platform ABI. | ||
| 180 | </p> | ||
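The difference between target-dependent and fixed-width types can be checked from plain C. The following sketch is illustrative only: `struct sample` and the helper names are hypothetical, and the stated layout is an assumption that holds on common ABIs rather than a guarantee.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if plain char is signed on this target, 0 otherwise. */
static int char_is_signed(void)
{
  return CHAR_MIN < 0;
}

/* Size of long in bytes; commonly 4 or 8, depending on the ABI. */
static size_t long_size(void)
{
  return sizeof(long);
}

/* Hypothetical record built only from fixed-width types. */
struct sample {
  int32_t  count;
  uint32_t flags;
  int64_t  timestamp;
};

/* Checks the layout assumed above: two 4 byte fields followed by
   an 8 byte field at offset 8. */
static void check_sample_layout(void)
{
  assert(offsetof(struct sample, flags) == 4);
  assert(offsetof(struct sample, timestamp) == 8);
}
```

A declaration written with `long` or plain `char` in place of these types could describe a different layout or value range on another target.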
| 181 | <p> | ||
| 182 | The following C features are <b>not</b> supported: | ||
| 183 | </p> | ||
| 184 | <ul> | ||
| 185 | |||
| 186 | <li>A declaration must always have a type specifier; it doesn't | ||
| 187 | default to an <tt>int</tt> type.</li> | ||
| 188 | |||
| 189 | <li>Old-style empty function declarations (K&R) are not allowed. | ||
| 190 | All C functions must have a proper prototype declaration. A | ||
| 191 | function declared without parameters (<tt>int foo();</tt>) is | ||
| 192 | treated as a function taking zero arguments, like in C++.</li> | ||
| 193 | |||
| 194 | <li>The <tt>long double</tt> C type is parsed correctly, but | ||
| 195 | there's no support for the related conversions, accesses or arithmetic | ||
| 196 | operations.</li> | ||
| 197 | |||
| 198 | <li>Wide character strings and character literals are not | ||
| 199 | supported.</li> | ||
| 200 | |||
| 201 | <li><a href="#status">See below</a> for features that are currently | ||
| 202 | not implemented.</li> | ||
| 203 | |||
| 204 | </ul> | ||
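The first two restrictions amount to always writing full prototypes with an explicit type specifier. A small sketch in plain C, with hypothetical function names, shows the style that satisfies both rules:

```c
#include <assert.h>

/* Every declaration names its return type and its parameters. */
int add(int a, int b);
int answer(void);   /* zero arguments, stated explicitly */

int add(int a, int b)
{
  return a + b;
}

int answer(void)
{
  return add(40, 2);
}

/* Small self-check that exercises both functions. */
static void self_check(void)
{
  assert(add(1, 2) == 3);
  assert(answer() == 42);
}
```

Writing `int answer(void);` rather than `int answer();` keeps the declaration unambiguous for a C compiler as well, where an empty parameter list traditionally means "unspecified arguments".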
| 67 | 205 | ||
| 68 | <h2 id="convert">C Type Conversion Rules</h2> | 206 | <h2 id="convert">C Type Conversion Rules</h2> |
| 69 | <p> | 207 | <p> |
| 70 | TODO | 208 | TODO |
| 71 | </p> | 209 | </p> |
| 210 | <h3 id="convert_tolua">Conversions from C types to Lua objects</h3> | ||
| 211 | <h3 id="convert_fromlua">Conversions from Lua objects to C types</h3> | ||
| 212 | <h3 id="convert_between">Conversions between C types</h3> | ||
| 72 | 213 | ||
| 73 | <h2 id="init">Initializers</h2> | 214 | <h2 id="init">Initializers</h2> |
| 74 | <p> | 215 | <p> |
| @@ -81,8 +222,8 @@ initializers and the C types involved: | |||
| 81 | <li>If no initializers are given, the object is filled with zero bytes.</li> | 222 | <li>If no initializers are given, the object is filled with zero bytes.</li> |
| 82 | 223 | ||
| 83 | <li>Scalar types (numbers and pointers) accept a single initializer. | 224 | <li>Scalar types (numbers and pointers) accept a single initializer. |
| 84 | The standard <a href="#convert">C type conversion rules</a> | 225 | The Lua object is <a href="#convert_fromlua">converted to the scalar |
| 85 | apply.</li> | 226 | C type</a>.</li> |
| 86 | 227 | ||
| 87 | <li>Valarrays (complex numbers and vectors) are treated like scalars | 228 | <li>Valarrays (complex numbers and vectors) are treated like scalars |
| 88 | when a single initializer is given. Otherwise they are treated like | 229 | when a single initializer is given. Otherwise they are treated like |
| @@ -111,16 +252,6 @@ initializer or a compatible aggregate, of course.</li> | |||
| 111 | 252 | ||
| 112 | </ul> | 253 | </ul> |
| 113 | 254 | ||
| 114 | <h2 id="clib">C Library Namespaces</h2> | ||
| 115 | <p> | ||
| 116 | A C library namespace is a special kind of object which allows | ||
| 117 | access to the symbols contained in libraries. Indexing it with a | ||
| 118 | symbol name (a Lua string) automatically binds it to the library. | ||
| 119 | </p> | ||
| 120 | <p> | ||
| 121 | TODO | ||
| 122 | </p> | ||
| 123 | |||
| 124 | <h2 id="ops">Operations on cdata Objects</h2> | 255 | <h2 id="ops">Operations on cdata Objects</h2> |
| 125 | <p> | 256 | <p> |
| 126 | TODO | 257 | TODO |
| @@ -158,9 +289,9 @@ Similar rules apply for Lua strings which are implicitly converted to | |||
| 158 | <tt>"const char *"</tt>: the string object itself must be | 289 | <tt>"const char *"</tt>: the string object itself must be |
| 159 | referenced somewhere or it'll be garbage collected eventually. The | 290 | referenced somewhere or it'll be garbage collected eventually. The |
| 160 | pointer will then point to stale data, which may have already been | 291 | pointer will then point to stale data, which may have already been |
| 161 | overwritten. Note that string literals are automatically kept alive as | 292 | overwritten. Note that <em>string literals</em> are automatically kept |
| 162 | long as the function containing them (actually its prototype) is not | 293 | alive as long as the function containing them (actually its prototype) |
| 163 | garbage collected. | 294 | is not garbage collected. |
| 164 | </p> | 295 | </p> |
| 165 | <p> | 296 | <p> |
| 166 | Objects which are passed as an argument to an external C function | 297 | Objects which are passed as an argument to an external C function |
| @@ -181,6 +312,121 @@ indistinguishable from pointers returned by C functions (which is one | |||
| 181 | of the reasons why the GC cannot follow them). | 312 | of the reasons why the GC cannot follow them). |
| 182 | </p> | 313 | </p> |
| 183 | 314 | ||
| 315 | <h2 id="clib">C Library Namespaces</h2> | ||
| 316 | <p> | ||
| 317 | A C library namespace is a special kind of object which allows | ||
| 318 | access to the symbols contained in shared libraries or the default | ||
| 319 | symbol namespace. The default | ||
| 320 | <a href="ext_ffi_api.html#ffi_C"><tt>ffi.C</tt></a> namespace is | ||
| 321 | automatically created when the FFI library is loaded. C library | ||
| 322 | namespaces for specific shared libraries may be created with the | ||
| 323 | <a href="ext_ffi_api.html#ffi_load"><tt>ffi.load()</tt></a> API | ||
| 324 | function. | ||
| 325 | </p> | ||
| 326 | <p> | ||
| 327 | Indexing a C library namespace object with a symbol name (a Lua | ||
| 328 | string) automatically binds it to the library. First the symbol type | ||
| 329 | is resolved — it must have been declared with | ||
| 330 | <a href="ext_ffi_api.html#ffi_cdef"><tt>ffi.cdef</tt></a>. Then the | ||
| 331 | symbol address is resolved by searching for the symbol name in the | ||
| 332 | associated shared libraries or the default symbol namespace. Finally, | ||
| 333 | the resulting binding between the symbol name, the symbol type and its | ||
| 334 | address is cached. Missing symbol declarations or nonexistent symbol | ||
| 335 | names cause an error. | ||
| 336 | </p> | ||
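The resolve-then-cache sequence described above has a rough analogue in the POSIX `dlopen()`/`dlsym()` interface. The sketch below is not how LuaJIT implements namespaces; `struct binding`, `bind_symbol` and `call_strlen` are hypothetical names, and the code assumes a dynamically linked POSIX system (older glibc versions may need `-ldl` when linking).

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

typedef size_t (*strlen_fn)(const char *);

/* Hypothetical one-entry cache standing in for a namespace binding. */
struct binding {
  const char *name;   /* symbol name used as the index */
  void       *addr;   /* resolved address, NULL until first use */
};

/* Resolve the symbol on first use, then reuse the cached address.
   Returns NULL if the symbol cannot be found. The handle from
   dlopen(NULL, ...) is intentionally never closed in this sketch. */
static void *bind_symbol(struct binding *b)
{
  if (b->addr == NULL) {
    void *self = dlopen(NULL, RTLD_LAZY);  /* default symbol namespace */
    if (self == NULL) return NULL;
    b->addr = dlsym(self, b->name);
  }
  return b->addr;
}

/* The function type is supplied by the caller; the library itself
   provides only a name and an address, so nothing checks the type. */
static size_t call_strlen(struct binding *b, const char *s)
{
  strlen_fn fn = (strlen_fn)bind_symbol(b);
  assert(fn != NULL);
  return fn(s);
}
```

Here a missing symbol shows up as a `NULL` address, whereas the namespace object described above raises an error instead.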
| 337 | <p> | ||
| 338 | This is what happens on a <b>read access</b> for the different kinds of | ||
| 339 | symbols: | ||
| 340 | </p> | ||
| 341 | <ul> | ||
| 342 | |||
| 343 | <li>External functions: a cdata object with the type of the function | ||
| 344 | and its address is returned.</li> | ||
| 345 | |||
| 346 | <li>External variables: the symbol address is dereferenced and the | ||
| 347 | loaded value is <a href="#convert_tolua">converted to a Lua object</a> | ||
| 348 | and returned.</li> | ||
| 349 | |||
| 350 | <li>Constant values (<tt>static const</tt> or <tt>enum</tt> | ||
| 351 | constants): the constant is <a href="#convert_tolua">converted to a | ||
| 352 | Lua object</a> and returned.</li> | ||
| 353 | |||
| 354 | </ul> | ||
| 355 | <p> | ||
| 356 | This is what happens on a <b>write access</b>: | ||
| 357 | </p> | ||
| 358 | <ul> | ||
| 359 | |||
| 360 | <li>External variables: the value to be written is | ||
| 361 | <a href="#convert_fromlua">converted to the C type</a> of the | ||
| 362 | variable and then stored at the symbol address.</li> | ||
| 363 | |||
| 364 | <li>Writing to constant variables or to any other symbol type causes | ||
| 365 | an error, like any other attempted write to a constant location.</li> | ||
| 366 | |||
| 367 | </ul> | ||
| 368 | <p> | ||
| 369 | C library namespaces themselves are garbage collected objects. If | ||
| 370 | the last reference to the namespace object is gone, the garbage | ||
| 371 | collector will eventually release the shared library reference and | ||
| 372 | remove all memory associated with the namespace. Since this may | ||
| 373 | trigger the removal of the shared library from the memory of the | ||
| 374 | running process, it's generally <em>not safe</em> to use function | ||
| 375 | cdata objects obtained from a library if the namespace object may be | ||
| 376 | unreferenced. | ||
| 377 | </p> | ||
| 378 | <p> | ||
| 379 | Performance notice: the JIT compiler specializes to the identity of | ||
| 380 | namespace objects and to the strings used to index them. This | ||
| 381 | effectively turns function cdata objects into constants. It's not | ||
| 382 | useful and actually counter-productive to explicitly cache these | ||
| 383 | function objects, e.g. <tt>local strlen = ffi.C.strlen</tt>. On the other hand, it | ||
| 384 | <em>is</em> useful to cache the namespace itself, e.g. <tt>local C = | ||
| 385 | ffi.C</tt>. | ||
| 386 | </p> | ||
| 387 | |||
| 388 | <h2 id="policy">No Hand-holding!</h2> | ||
| 389 | <p> | ||
| 390 | The FFI library has been designed as <b>a low-level library</b>. The | ||
| 391 | goal is to interface with C code and C data types with a | ||
| 392 | minimum of overhead. This means <b>you can do anything you can do | ||
| 393 | from C</b>: access all memory, overwrite anything in memory, call | ||
| 394 | machine code at any memory address and so on. | ||
| 395 | </p> | ||
| 396 | <p> | ||
| 397 | The FFI library provides <b>no memory safety</b>, unlike regular Lua | ||
| 398 | code. It will happily allow you to dereference a <tt>NULL</tt> | ||
| 399 | pointer, to access arrays out of bounds or to misdeclare | ||
| 400 | C functions. If you make a mistake, your application might crash, | ||
| 401 | just like equivalent C code would. | ||
| 402 | </p> | ||
| 403 | <p> | ||
| 404 | This behavior is inevitable, since the goal is to provide full | ||
| 405 | interoperability with C code. Adding extra safety measures, like | ||
| 406 | bounds checks, would be futile. There's no way to detect | ||
| 407 | misdeclarations of C functions, since shared libraries only | ||
| 408 | provide symbol names, but no type information. Likewise there's no way | ||
| 409 | to infer the valid range of indexes for a returned pointer. | ||
| 410 | </p> | ||
| 411 | <p> | ||
| 412 | Again: the FFI library is a low-level library. This implies it needs | ||
| 413 | to be used with care, but its flexibility and performance often | ||
| 414 | outweigh this concern. If you're a C or C++ developer, it'll be easy | ||
| 415 | to apply your existing knowledge. On the other hand, writing code for the FFI | ||
| 416 | library is not for the faint of heart and probably shouldn't be the | ||
| 417 | first exercise for someone with little experience in Lua, C or C++. | ||
| 418 | </p> | ||
| 419 | <p> | ||
| 420 | As a corollary of the above, the FFI library is <b>not safe for use by | ||
| 421 | untrusted Lua code</b>. If you're sandboxing untrusted Lua code, you | ||
| 422 | definitely don't want to give this code access to the FFI library or | ||
| 423 | to <em>any</em> cdata object (except 64 bit integers or complex | ||
| 424 | numbers). Any properly engineered Lua sandbox needs to provide safety | ||
| 425 | wrappers for many of the standard Lua library functions — | ||
| 426 | similar wrappers need to be written for high-level operations on FFI | ||
| 427 | data types, too. | ||
| 428 | </p> | ||
| 429 | |||
| 184 | <h2 id="status">Current Status</h2> | 430 | <h2 id="status">Current Status</h2> |
| 185 | <p> | 431 | <p> |
| 186 | The initial release of the FFI library has some limitations and is | 432 | The initial release of the FFI library has some limitations and is |
| @@ -200,18 +446,15 @@ obscure constructs.</li> | |||
| 200 | <li><tt>static const</tt> declarations only work for integer types | 446 | <li><tt>static const</tt> declarations only work for integer types |
| 201 | up to 32 bits. Neither declaring string constants nor | 447 | up to 32 bits. Neither declaring string constants nor |
| 202 | floating-point constants is supported.</li> | 448 | floating-point constants is supported.</li> |
| 203 | <li>The <tt>long double</tt> C type is parsed correctly, but | ||
| 204 | there's no support for the related conversions, accesses or | ||
| 205 | arithmetic operations.</li> | ||
| 206 | <li>Packed <tt>struct</tt> bitfields that cross container boundaries | 449 | <li>Packed <tt>struct</tt> bitfields that cross container boundaries |
| 207 | are not implemented.</li> | 450 | are not implemented.</li> |
| 208 | <li>Native vector types may be defined with the GCC <tt>mode</tt> and | 451 | <li>Native vector types may be defined with the GCC <tt>mode</tt> or |
| 209 | <tt>vector_size</tt> attributes. But no operations other than loading, | 452 | <tt>vector_size</tt> attribute. But no operations other than loading, |
| 210 | storing and initializing them are supported, yet.</li> | 453 | storing and initializing them are supported, yet.</li> |
| 211 | <li>The <tt>volatile</tt> type qualifier is currently ignored by | 454 | <li>The <tt>volatile</tt> type qualifier is currently ignored by |
| 212 | compiled code.</li> | 455 | compiled code.</li> |
| 213 | <li><a href="ext_ffi_api.html#ffi_cdef">ffi.cdef</a> silently ignores | 456 | <li><a href="ext_ffi_api.html#ffi_cdef"><tt>ffi.cdef</tt></a> silently |
| 214 | all redeclarations.</li> | 457 | ignores all redeclarations.</li> |
| 215 | </ul> | 458 | </ul> |
| 216 | <p> | 459 | <p> |
| 217 | The JIT compiler already handles a large subset of all FFI operations. | 460 | The JIT compiler already handles a large subset of all FFI operations. |
| @@ -238,6 +481,7 @@ two.</li> | |||
| 238 | value.</li> | 481 | value.</li> |
| 239 | <li>Calls to C functions with 64 bit arguments or return values | 482 | <li>Calls to C functions with 64 bit arguments or return values |
| 240 | on 32 bit CPUs.</li> | 483 | on 32 bit CPUs.</li> |
| 484 | <li>Accesses to external variables in C library namespaces.</li> | ||
| 241 | <li><tt>tostring()</tt> for cdata types.</li> | 485 | <li><tt>tostring()</tt> for cdata types.</li> |
| 242 | <li>The following <a href="ext_ffi_api.html">ffi.* API</a> functions: | 486 | <li>The following <a href="ext_ffi_api.html">ffi.* API</a> functions: |
| 243 | <tt>ffi.sizeof()</tt>, <tt>ffi.alignof()</tt>, <tt>ffi.offsetof()</tt>. | 487 | <tt>ffi.sizeof()</tt>, <tt>ffi.alignof()</tt>, <tt>ffi.offsetof()</tt>. |
