<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>FFI Semantics</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Author" content="Mike Pall">
<meta name="Copyright" content="Copyright (C) 2005-2011, Mike Pall">
<meta name="Language" content="en">
<link rel="stylesheet" type="text/css" href="bluequad.css" media="screen">
<link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print">
</head>
<body>
<div id="site">
<a href="http://luajit.org"><span>Lua<span id="logo">JIT</span></span></a>
</div>
<div id="head">
<h1>FFI Semantics</h1>
</div>
<div id="nav">
<ul><li>
<a href="luajit.html">LuaJIT</a>
<ul><li>
<a href="install.html">Installation</a>
</li><li>
<a href="running.html">Running</a>
</li></ul>
</li><li>
<a href="extensions.html">Extensions</a>
<ul><li>
<a href="ext_ffi.html">FFI Library</a>
<ul><li>
<a href="ext_ffi_tutorial.html">FFI Tutorial</a>
</li><li>
<a href="ext_ffi_api.html">ffi.* API</a>
</li><li>
<a href="ext_ffi_int64.html">64 bit Integers</a>
</li><li>
<a class="current" href="ext_ffi_semantics.html">FFI Semantics</a>
</li></ul>
</li><li>
<a href="ext_jit.html">jit.* Library</a>
</li><li>
<a href="ext_c_api.html">Lua/C API</a>
</li></ul>
</li><li>
<a href="status.html">Status</a>
<ul><li>
<a href="changes.html">Changes</a>
</li></ul>
</li><li>
<a href="faq.html">FAQ</a>
</li><li>
<a href="http://luajit.org/performance.html">Performance <span class="ext">»</span></a>
</li><li>
<a href="http://luajit.org/download.html">Download <span class="ext">»</span></a>
</li></ul>
</div>
<div id="main">
<p>
This page describes the detailed semantics underlying the FFI library
and its interaction with both Lua and C code.
</p>
<p>
Given that the FFI library is designed to interface with C code
and that declarations can be written in plain C syntax, it
closely follows the C language semantics wherever possible. Some
concessions are needed for smoother interoperation with Lua language
semantics. But developers with a C or C++ background should find it
straightforward to write applications using the LuaJIT FFI.
</p>
<h2 id="clang">C Language Support</h2>
<p>
The FFI library has a built-in C parser with a minimal memory
footprint. It's used by the <a href="ext_ffi_api.html">ffi.* library
functions</a> to declare C types or external symbols.
</p>
<p>
Its only purpose is to parse C declarations, as found e.g. in
C header files. Although it does evaluate constant expressions,
it's <em>not</em> a C compiler. The body of <tt>inline</tt>
C function definitions is simply ignored.
</p>
<p>
Also, this is <em>not</em> a validating C parser. It expects and
accepts correctly formed C declarations, but it may choose to
ignore bad declarations or show rather generic error messages. If in
doubt, please check the input against your favorite C compiler.
</p>
<p>
The C parser complies with the <b>C99 language standard</b> plus
the following extensions:
</p>
<ul>
<li>C++-style comments (<tt>//</tt>).</li>
<li>The <tt>'\e'</tt> escape in character and string literals.</li>
<li>The <tt>long long</tt> 64 bit integer type.</li>
<li>The C99/C++ boolean type, declared with the keywords <tt>bool</tt>
or <tt>_Bool</tt>.</li>
<li>Complex numbers, declared with the keywords <tt>complex</tt> or
<tt>_Complex</tt>.</li>
<li>Two complex number types: <tt>complex</tt> (aka
<tt>complex double</tt>) and <tt>complex float</tt>.</li>
<li>Vector types, declared with the GCC <tt>mode</tt> or
<tt>vector_size</tt> attribute.</li>
<li>Unnamed ('transparent') <tt>struct</tt>/<tt>union</tt> fields
inside a <tt>struct</tt>/<tt>union</tt>.</li>
<li>Incomplete <tt>enum</tt> declarations, handled like incomplete
<tt>struct</tt> declarations.</li>
<li>Unnamed <tt>enum</tt> fields inside a
<tt>struct</tt>/<tt>union</tt>. This is similar to a scoped C++
<tt>enum</tt>, except that declared constants are visible in the
global namespace, too.</li>
<li>C++-style scoped <tt>static const</tt> declarations inside a
<tt>struct</tt>/<tt>union</tt>.</li>
<li>Zero-length arrays (<tt>[0]</tt>), empty
<tt>struct</tt>/<tt>union</tt>, variable-length arrays (VLA,
<tt>[?]</tt>) and variable-length structs (VLS, with a trailing
VLA).</li>
<li>Alternate GCC keywords with '<tt>__</tt>', e.g.
<tt>__const__</tt>.</li>
<li>GCC <tt>__attribute__</tt> with the following attributes:
<tt>aligned</tt>, <tt>packed</tt>, <tt>mode</tt>,
<tt>vector_size</tt>, <tt>cdecl</tt>, <tt>fastcall</tt>,
<tt>stdcall</tt>.</li>
<li>The GCC <tt>__extension__</tt> keyword and the GCC
<tt>__alignof__</tt> operator.</li>
<li>GCC <tt>__asm__("symname")</tt> symbol name redirection for
function declarations.</li>
<li>MSVC keywords for fixed-length types: <tt>__int8</tt>,
<tt>__int16</tt>, <tt>__int32</tt> and <tt>__int64</tt>.</li>
<li>MSVC <tt>__cdecl</tt>, <tt>__fastcall</tt>, <tt>__stdcall</tt>,
<tt>__ptr32</tt>, <tt>__ptr64</tt>, <tt>__declspec(align(n))</tt>
and <tt>#pragma pack</tt>.</li>
<li>All other GCC/MSVC-specific attributes are ignored.</li>
</ul>
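<p>
A few of these extensions can be combined in one declaration. The
following is only a minimal sketch: the type names <tt>variant_t</tt>
and <tt>packet_t</tt> are made up for illustration and it assumes a
LuaJIT build with the FFI library enabled:
</p>
<pre class="code">
local ffi = require("ffi")

ffi.cdef[[
// C++-style comments are accepted by the parser.
typedef struct {
  int tag;
  union {               // unnamed ('transparent') union
    long long i;        // 64 bit integer type
    double d;
  };
} variant_t;

typedef struct {
  int len;
  unsigned char data[?];  // a trailing VLA makes this a VLS
} packet_t;
]]

local v = ffi.new("variant_t")
v.tag = 1
v.d = 1.5               -- fields of the unnamed union are accessed directly

local pkt = ffi.new("packet_t", 8)  -- 8 elements for the trailing VLA
pkt.len = 8
pkt.data[7] = 255
</pre>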
<p>
The following C types are pre-defined by the C parser (like
a <tt>typedef</tt>, except re-declarations will be ignored):
</p>
<ul>
<li>Vararg handling: <tt>va_list</tt>, <tt>__builtin_va_list</tt>,
<tt>__gnuc_va_list</tt>.</li>
<li>From <tt>&lt;stddef.h&gt;</tt>: <tt>ptrdiff_t</tt>,
<tt>size_t</tt>, <tt>wchar_t</tt>.</li>
<li>From <tt>&lt;stdint.h&gt;</tt>: <tt>int8_t</tt>, <tt>int16_t</tt>,
<tt>int32_t</tt>, <tt>int64_t</tt>, <tt>uint8_t</tt>,
<tt>uint16_t</tt>, <tt>uint32_t</tt>, <tt>uint64_t</tt>,
<tt>intptr_t</tt>, <tt>uintptr_t</tt>.</li>
</ul>
<p>
You're encouraged to use these types in preference to the
compiler-specific extensions or the target-dependent standard types.
E.g. <tt>char</tt> differs in signedness and <tt>long</tt> differs in
size, depending on the target architecture and platform ABI.
</p>
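<p>
The difference is easy to check. This short sketch only relies on
<tt>ffi.sizeof()</tt> and the pre-defined type names; the value shown
for <tt>long</tt> is deliberately left open, since it depends on the
target:
</p>
<pre class="code">
local ffi = require("ffi")

-- No typedefs or includes are needed: these names are pre-defined.
print(ffi.sizeof("int32_t"))   -- always 4
print(ffi.sizeof("int64_t"))   -- always 8
print(ffi.sizeof("long"))      -- 4 or 8, depending on the target

-- intptr_t is wide enough to hold a pointer.
print(ffi.sizeof("intptr_t") == ffi.sizeof("void *"))
</pre>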
<p>
The following C features are <b>not</b> supported:
</p>
<ul>
<li>A declaration must always have a type specifier; it doesn't
default to an <tt>int</tt> type.</li>
<li>Old-style empty function declarations (K&amp;R) are not allowed.
All C functions must have a proper prototype declaration. A
function declared without parameters (<tt>int foo();</tt>) is
treated as a function taking zero arguments, like in C++.</li>
<li>The <tt>long double</tt> C type is parsed correctly, but
there's no support for the related conversions, accesses or arithmetic
operations.</li>
<li>Wide character strings and character literals are not
supported.</li>
<li><a href="#status">See below</a> for features that are currently
not implemented.</li>
</ul>
<h2 id="convert">C Type Conversion Rules</h2>
<p>
TODO
</p>
<h3 id="convert_tolua">Conversions from C types to Lua objects</h3>
<h3 id="convert_fromlua">Conversions from Lua objects to C types</h3>
<h3 id="convert_between">Conversions between C types</h3>
<h2 id="init">Initializers</h2>
<p>
Creating a cdata object with <a href="ext_ffi_api.html#ffi_new">ffi.new()</a>
or the equivalent constructor syntax always initializes its contents,
too. Different rules apply, depending on the number of optional
initializers and the C types involved:
</p>
<ul>
<li>If no initializers are given, the object is filled with zero bytes.</li>
<li>Scalar types (numbers and pointers) accept a single initializer.
The Lua object is <a href="#convert_fromlua">converted to the scalar
C type</a>.</li>
<li>Valarrays (complex numbers and vectors) are treated like scalars
when a single initializer is given. Otherwise they are treated like
regular arrays.</li>
<li>Aggregate types (arrays and structs) accept either a single
compound initializer (Lua table or string) or a flat list of
initializers.</li>
<li>The elements of an array are initialized, starting at index zero.
If a single initializer is given for an array, it's repeated for all
remaining elements. This doesn't happen if two or more initializers
are given: all remaining uninitialized elements are filled with zero
bytes.</li>
<li>The fields of a <tt>struct</tt> are initialized in the order of
their declaration. Uninitialized fields are filled with zero
bytes.</li>
<li>Only the first field of a <tt>union</tt> can be initialized with a
flat initializer.</li>
<li>Elements or fields which are aggregates themselves are initialized
with a <em>single</em> initializer, but this may be a compound
initializer or a compatible aggregate, of course.</li>
</ul>
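<p>
The rules above can be sketched as follows. The struct
<tt>pt_t</tt> is a made-up example type, and the table initializer in
the last struct case assumes the positional table handling of current
LuaJIT versions:
</p>
<pre class="code">
local ffi = require("ffi")

ffi.cdef[[
typedef struct { int x, y; } pt_t;
]]

-- Arrays
local zero = ffi.new("int[4]")         -- no initializer: 0, 0, 0, 0
local same = ffi.new("int[4]", 7)      -- single initializer: 7, 7, 7, 7
local part = ffi.new("int[4]", 1, 2)   -- two initializers: 1, 2, 0, 0

-- Structs
local p1 = ffi.new("pt_t", 3)          -- x = 3, y = 0
local p2 = ffi.new("pt_t", 3, 4)       -- x = 3, y = 4
local p3 = ffi.new("pt_t", { 5, 6 })   -- Lua table as compound initializer

-- Scalars
local n = ffi.new("int", 42)
</pre>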
<h2 id="ops">Operations on cdata Objects</h2>
<p>
TODO
</p>
<h2 id="gc">Garbage Collection of cdata Objects</h2>
<p>
All explicitly (<tt>ffi.new()</tt>, <tt>ffi.cast()</tt> etc.) or
implicitly (accessors) created cdata objects are garbage collected.
You need to ensure that you retain valid references to cdata objects
somewhere on a Lua stack, in an upvalue or in a Lua table while they are
still in use. Once the last reference to a cdata object is gone, the
garbage collector will automatically free the memory used by it (at
the end of the next GC cycle).
</p>
<p>
Please note that pointers themselves are cdata objects, but they
are <b>not</b> followed by the garbage collector. So e.g. if you
assign a cdata array to a pointer, you must keep the cdata object
holding the array alive as long as the pointer is still in use:
</p>
<pre class="code">
ffi.cdef[[
typedef struct { int *a; } foo_t;
]]
local s = ffi.new("foo_t", ffi.new("int[10]")) -- <span style="color:#c00000;">WRONG!</span>
local a = ffi.new("int[10]") -- <span style="color:#00a000;">OK</span>
local s = ffi.new("foo_t", a)
-- Now do something with 's', but keep 'a' alive until you're done.
</pre>
<p>
Similar rules apply for Lua strings which are implicitly converted to
<tt>"const char *"</tt>: the string object itself must be
referenced somewhere or it'll be garbage collected eventually. The
pointer will then point to stale data, which may have already been
overwritten. Note that <em>string literals</em> are automatically kept
alive as long as the function containing them (actually its prototype)
is not garbage collected.
</p>
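<p>
A minimal sketch of the string case, using an explicit
<tt>ffi.cast()</tt> instead of an implicit conversion. The helper
<tt>make_name()</tt> is hypothetical and only serves to create a
string at runtime rather than a literal:
</p>
<pre class="code">
local ffi = require("ffi")

local function make_name(i)
  return "item-" .. i   -- created at runtime, not a string literal
end

-- WRONG: nothing references the string, so it may be collected and
-- 'bad' may end up pointing to stale data.
local bad = ffi.cast("const char *", make_name(2))

-- OK: 'name' keeps the string alive while 'p' is in use.
local name = make_name(1)
local p = ffi.cast("const char *", name)
local copy = ffi.string(p)   -- copies the data into a new Lua string
</pre>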
<p>
Objects which are passed as an argument to an external C function
are kept alive until the call returns. So it's generally safe to
create temporary cdata objects in argument lists. This is a common
idiom for passing specific C types to vararg functions:
</p>
<pre class="code">
ffi.cdef[[
int printf(const char *fmt, ...);
]]
ffi.C.printf("integer value: %d\n", ffi.new("int", x)) -- <span style="color:#00a000;">OK</span>
</pre>
<p>
Memory areas returned by C functions (e.g. from <tt>malloc()</tt>)
must be manually managed, of course. Pointers to cdata objects are
indistinguishable from pointers returned by C functions (which is one
of the reasons why the GC cannot follow them).
</p>
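<p>
For example, memory obtained from <tt>malloc()</tt> has to be released
with an explicit call to <tt>free()</tt>. This is only a sketch: it
assumes both functions can be resolved through the default
<tt>ffi.C</tt> namespace, and that comparing a pointer against
<tt>nil</tt> checks for <tt>NULL</tt>, as it does in current LuaJIT
versions:
</p>
<pre class="code">
local ffi = require("ffi")

ffi.cdef[[
void *malloc(size_t size);
void free(void *ptr);
]]

local n = 10
local raw = ffi.C.malloc(n * ffi.sizeof("int"))
if raw == nil then error("out of memory") end

local buf = ffi.cast("int *", raw)   -- the pointer cdata does not own the memory
for i = 0, n - 1 do
  buf[i] = i * i
end
local last = buf[n - 1]

ffi.C.free(buf)   -- the garbage collector will not do this for you
buf = nil         -- drop the pointers; the memory must not be used after free()
raw = nil
</pre>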
<h2 id="clib">C Library Namespaces</h2>
<p>
A C library namespace is a special kind of object which allows
access to the symbols contained in shared libraries or the default
symbol namespace. The default
<a href="ext_ffi_api.html#ffi_C"><tt>ffi.C</tt></a> namespace is
automatically created when the FFI library is loaded. C library
namespaces for specific shared libraries may be created with the
<a href="ext_ffi_api.html#ffi_load"><tt>ffi.load()</tt></a> API
function.
</p>
<p>
Indexing a C library namespace object with a symbol name (a Lua
string) automatically binds it to the library. First the symbol type
is resolved — it must have been declared with
<a href="ext_ffi_api.html#ffi_cdef"><tt>ffi.cdef</tt></a>. Then the
symbol address is resolved by searching for the symbol name in the
associated shared libraries or the default symbol namespace. Finally,
the resulting binding between the symbol name, the symbol type and its
address is cached. Missing symbol declarations or nonexistent symbol
names cause an error.
</p>
<p>
This is what happens on a <b>read access</b> for the different kinds of
symbols:
</p>
<ul>
<li>External functions: a cdata object with the type of the function
and its address is returned.</li>
<li>External variables: the symbol address is dereferenced and the
loaded value is <a href="#convert_tolua">converted to a Lua object</a>
and returned.</li>
<li>Constant values (<tt>static const</tt> or <tt>enum</tt>
constants): the constant is <a href="#convert_tolua">converted to a
Lua object</a> and returned.</li>
</ul>
<p>
This is what happens on a <b>write access</b>:
</p>
<ul>
<li>External variables: the value to be written is
<a href="#convert_fromlua">converted to the C type</a> of the
variable and then stored at the symbol address.</li>
<li>Writing to constant variables or to any other symbol type causes
an error, like any other attempted write to a constant location.</li>
</ul>
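<p>
The binding steps and the error cases can be observed from Lua. In
this sketch <tt>abs()</tt> is assumed to be available in the default
namespace, while <tt>hypothetical_undefined_symbol</tt> and
<tt>never_declared_name</tt> are made-up names that are expected not
to exist:
</p>
<pre class="code">
local ffi = require("ffi")

ffi.cdef[[
int abs(int x);
int hypothetical_undefined_symbol(void);  // declared, but does not exist
]]

local C = ffi.C

-- Read access to an external function returns a function cdata object.
local f = C.abs
print(type(f))     -- cdata
print(C.abs(-5))   -- 5

-- Declared, but the symbol name cannot be resolved: error.
local ok1 = pcall(function() return C.hypothetical_undefined_symbol end)
-- Never declared with ffi.cdef: error.
local ok2 = pcall(function() return C.never_declared_name end)
-- Write access to anything but an external variable: error.
local ok3 = pcall(function() C.abs = nil end)
</pre>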
<p>
C library namespaces themselves are garbage collected objects. If
the last reference to the namespace object is gone, the garbage
collector will eventually release the shared library reference and
remove all memory associated with the namespace. Since this may
trigger the removal of the shared library from the memory of the
running process, it's generally <em>not safe</em> to use function
cdata objects obtained from a library if the namespace object may be
unreferenced.
</p>
<p>
Performance notice: the JIT compiler specializes to the identity of
namespace objects and to the strings used to index them. This
effectively turns function cdata objects into constants. It's not
useful and actually counter-productive to explicitly cache these
function objects, e.g. <tt>local strlen = ffi.C.strlen</tt>. OTOH it
<em>is</em> useful to cache the namespace itself, e.g. <tt>local C =
ffi.C</tt>.
</p>
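<p>
In practice this leads to the following idiom. The function
<tt>count_letters()</tt> is a made-up example and <tt>isalpha()</tt>
is assumed to be available in the default namespace:
</p>
<pre class="code">
local ffi = require("ffi")

ffi.cdef[[
int isalpha(int c);
]]

local C = ffi.C   -- caching the namespace itself is useful

local function count_letters(s)
  local n = 0
  for i = 1, #s do
    -- Index the namespace inside the loop. There is no need for a
    -- 'local isalpha = ffi.C.isalpha' outside of it.
    if C.isalpha(s:byte(i)) ~= 0 then
      n = n + 1
    end
  end
  return n
end

print(count_letters("LuaJIT 2.0"))   -- 6
</pre>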
<h2 id="policy">No Hand-holding!</h2>
<p>
The FFI library has been designed as <b>a low-level library</b>. The
goal is to interface with C code and C data types with a
minimum of overhead. This means <b>you can do anything you can do
from C</b>: access all memory, overwrite anything in memory, call
machine code at any memory address and so on.
</p>
<p>
The FFI library provides <b>no memory safety</b>, unlike regular Lua
code. It will happily allow you to dereference a <tt>NULL</tt>
pointer, to access arrays out of bounds or to misdeclare
C functions. If you make a mistake, your application might crash,
just like equivalent C code would.
</p>
<p>
This behavior is inevitable, since the goal is to provide full
interoperability with C code. Adding extra safety measures, like
bounds checks, would be futile. There's no way to detect
misdeclarations of C functions, since shared libraries only
provide symbol names, but no type information. Likewise there's no way
to infer the valid range of indexes for a returned pointer.
</p>
<p>
Again: the FFI library is a low-level library. This implies it needs
to be used with care, but its flexibility and performance often
outweigh this concern. If you're a C or C++ developer, it'll be easy
to apply your existing knowledge. OTOH writing code for the FFI
library is not for the faint of heart and probably shouldn't be the
first exercise for someone with little experience in Lua, C or C++.
</p>
<p>
As a corollary of the above, the FFI library is <b>not safe for use by
untrusted Lua code</b>. If you're sandboxing untrusted Lua code, you
definitely don't want to give this code access to the FFI library or
to <em>any</em> cdata object (except 64 bit integers or complex
numbers). Any properly engineered Lua sandbox needs to provide safety
wrappers for many of the standard Lua library functions —
similar wrappers need to be written for high-level operations on FFI
data types, too.
</p>
<h2 id="status">Current Status</h2>
<p>
The initial release of the FFI library has some limitations and is
missing some features. Most of these will be fixed in future releases.
</p>
<p>
<a href="#clang">C language support</a> is
currently incomplete:
</p>
<ul>
<li>C declarations are not passed through a C pre-processor,
yet.</li>
<li>The C parser is able to evaluate most constant expressions
commonly found in C header files. However it doesn't handle the
full range of C expression semantics and may fail for some
obscure constructs.</li>
<li><tt>static const</tt> declarations only work for integer types
up to 32 bits. Neither declaring string constants nor
floating-point constants is supported.</li>
<li>Packed <tt>struct</tt> bitfields that cross container boundaries
are not implemented.</li>
<li>Native vector types may be defined with the GCC <tt>mode</tt> or
<tt>vector_size</tt> attribute. But no operations other than loading,
storing and initializing them are supported, yet.</li>
<li>The <tt>volatile</tt> type qualifier is currently ignored by
compiled code.</li>
<li><a href="ext_ffi_api.html#ffi_cdef"><tt>ffi.cdef</tt></a> silently
ignores all redeclarations.</li>
</ul>
<p>
The JIT compiler already handles a large subset of all FFI operations.
It automatically falls back to the interpreter for unimplemented
operations (you can check for this with the
<a href="running.html#opt_j"><tt>-jv</tt></a> command line option).
The following operations are currently not compiled and may exhibit
suboptimal performance, especially when used in inner loops:
</p>
<ul>
<li>Array/<tt>struct</tt> copies and bulk initializations.</li>
<li>Bitfield accesses and initializations.</li>
<li>Vector operations.</li>
<li>Lua tables as compound initializers.</li>
<li>Initialization of nested <tt>struct</tt>/<tt>union</tt> types.</li>
<li>Allocations of variable-length arrays or structs.</li>
<li>Allocations of C types with a size &gt; 64 bytes or an
alignment &gt; 8 bytes.</li>
<li>Conversions from <tt>lightuserdata</tt> to <tt>void *</tt>.</li>
<li>Pointer differences for element sizes that are not a power of
two.</li>
<li>Calls to non-cdecl or vararg C functions.</li>
<li>Calls to C functions with aggregates passed or returned by
value.</li>
<li>Calls to C functions with 64 bit arguments or return values
on 32 bit CPUs.</li>
<li>Accesses to external variables in C library namespaces.</li>
<li><tt>tostring()</tt> for cdata types.</li>
<li>The following <a href="ext_ffi_api.html">ffi.* API</a> functions:
<tt>ffi.sizeof()</tt>, <tt>ffi.alignof()</tt>, <tt>ffi.offsetof()</tt>.</li>
</ul>
<p>
Other missing features:
</p>
<ul>
<li>Bit operations for 64 bit types.</li>
<li>Arithmetic for <tt>complex</tt> numbers.</li>
<li>User-defined metamethods for C types.</li>
<li>Callbacks from C code to Lua functions.</li>
<li>Atomic handling of <tt>errno</tt>.</li>
<li>Passing structs by value to vararg C functions.</li>
<li><a href="extensions.html#exceptions">C++ exception interoperability</a>
does not extend to C functions called via the FFI.</li>
</ul>
<br class="flush">
</div>
<div id="foot">
<hr class="hide">
Copyright © 2005-2011 Mike Pall
<span class="noprint">
·
<a href="contact.html">Contact</a>
</span>
</div>
</body>
</html>