Diffstat (limited to 'doc/ext_buffer.html')
-rw-r--r-- | doc/ext_buffer.html | 695 |
1 files changed, 695 insertions, 0 deletions
diff --git a/doc/ext_buffer.html b/doc/ext_buffer.html new file mode 100644 index 00000000..2a82aa97 --- /dev/null +++ b/doc/ext_buffer.html | |||
@@ -0,0 +1,695 @@ | |||
1 | <!DOCTYPE html> | ||
2 | <html> | ||
3 | <head> | ||
4 | <title>String Buffer Library</title> | ||
5 | <meta charset="utf-8"> | ||
6 | <meta name="Copyright" content="Copyright (C) 2005-2022"> | ||
7 | <meta name="Language" content="en"> | ||
8 | <link rel="stylesheet" type="text/css" href="bluequad.css" media="screen"> | ||
9 | <link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print"> | ||
10 | <style type="text/css"> | ||
11 | .lib { | ||
12 | vertical-align: middle; | ||
13 | margin-left: 5px; | ||
14 | padding: 0 5px; | ||
15 | font-size: 60%; | ||
16 | border-radius: 5px; | ||
17 | background: #c5d5ff; | ||
18 | color: #000; | ||
19 | } | ||
20 | </style> | ||
21 | </head> | ||
22 | <body> | ||
23 | <div id="site"> | ||
24 | <a href="https://luajit.org"><span>Lua<span id="logo">JIT</span></span></a> | ||
25 | </div> | ||
26 | <div id="head"> | ||
27 | <h1>String Buffer Library</h1> | ||
28 | </div> | ||
29 | <div id="nav"> | ||
30 | <ul><li> | ||
31 | <a href="luajit.html">LuaJIT</a> | ||
32 | <ul><li> | ||
33 | <a href="https://luajit.org/download.html">Download <span class="ext">»</span></a> | ||
34 | </li><li> | ||
35 | <a href="install.html">Installation</a> | ||
36 | </li><li> | ||
37 | <a href="running.html">Running</a> | ||
38 | </li></ul> | ||
39 | </li><li> | ||
40 | <a href="extensions.html">Extensions</a> | ||
41 | <ul><li> | ||
42 | <a href="ext_ffi.html">FFI Library</a> | ||
43 | <ul><li> | ||
44 | <a href="ext_ffi_tutorial.html">FFI Tutorial</a> | ||
45 | </li><li> | ||
46 | <a href="ext_ffi_api.html">ffi.* API</a> | ||
47 | </li><li> | ||
48 | <a href="ext_ffi_semantics.html">FFI Semantics</a> | ||
49 | </li></ul> | ||
50 | </li><li> | ||
51 | <a class="current" href="ext_buffer.html">String Buffers</a> | ||
52 | </li><li> | ||
53 | <a href="ext_jit.html">jit.* Library</a> | ||
54 | </li><li> | ||
55 | <a href="ext_c_api.html">Lua/C API</a> | ||
56 | </li><li> | ||
57 | <a href="ext_profiler.html">Profiler</a> | ||
58 | </li></ul> | ||
59 | </li><li> | ||
60 | <a href="status.html">Status</a> | ||
61 | </li><li> | ||
62 | <a href="faq.html">FAQ</a> | ||
63 | </li><li> | ||
64 | <a href="https://luajit.org/list.html">Mailing List <span class="ext">»</span></a> | ||
65 | </li></ul> | ||
66 | </div> | ||
67 | <div id="main"> | ||
68 | <p> | ||
69 | The string buffer library allows <b>high-performance manipulation of | ||
70 | string-like data</b>. | ||
71 | </p> | ||
72 | <p> | ||
73 | Unlike Lua strings, which are constants, string buffers are | ||
74 | <b>mutable</b> sequences of 8-bit (binary-transparent) characters. Data | ||
75 | can be stored, formatted and encoded into a string buffer and later | ||
76 | converted, extracted or decoded. | ||
77 | </p> | ||
78 | <p> | ||
79 | The convenient string buffer API simplifies common string manipulation | ||
80 | tasks that would otherwise require creating many intermediate strings. | ||
81 | String buffers improve performance by eliminating redundant memory | ||
82 | copies, object creation, string interning and garbage collection | ||
83 | overhead. In conjunction with the FFI library, they allow zero-copy | ||
84 | operations. | ||
85 | </p> | ||
86 | <p> | ||
87 | The string buffer library also includes a high-performance | ||
88 | <a href="#serialize">serializer</a> for Lua objects. | ||
89 | </p> | ||
90 | |||
91 | <h2 id="wip" style="color:#ff0000">Work in Progress</h2> | ||
92 | <p> | ||
93 | <b style="color:#ff0000">This library is a work in progress. More | ||
94 | functionality will be added soon.</b> | ||
95 | </p> | ||
96 | |||
97 | <h2 id="use">Using the String Buffer Library</h2> | ||
98 | <p> | ||
99 | The string buffer library is built into LuaJIT by default, but it's not | ||
100 | loaded by default. Add this to the start of every Lua file that needs | ||
101 | one of its functions: | ||
102 | </p> | ||
103 | <pre class="code"> | ||
104 | local buffer = require("string.buffer") | ||
105 | </pre> | ||
106 | <p> | ||
107 | The convention for the syntax shown on this page is that <tt>buffer</tt> | ||
108 | refers to the buffer library and <tt>buf</tt> refers to an individual | ||
109 | buffer object. | ||
110 | </p> | ||
111 | <p> | ||
112 | Please note the difference between a Lua function call, e.g. | ||
113 | <tt>buffer.new()</tt> (with a dot) and a Lua method call, e.g. | ||
114 | <tt>buf:reset()</tt> (with a colon). | ||
115 | </p> | ||
116 | |||
117 | <h3 id="buffer_object">Buffer Objects</h3> | ||
118 | <p> | ||
119 | A buffer object is a garbage-collected Lua object. After creation with | ||
120 | <tt>buffer.new()</tt>, it can (and should) be reused for many operations. | ||
121 | When the last reference to a buffer object is gone, it will eventually | ||
122 | be freed by the garbage collector, along with the allocated buffer | ||
123 | space. | ||
124 | </p> | ||
125 | <p> | ||
126 | Buffers operate like a FIFO (first-in first-out) data structure. Data | ||
127 | can be appended (written) to the end of the buffer and consumed (read) | ||
128 | from the front of the buffer. These operations may be freely mixed. | ||
129 | </p> | ||
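<p>
As a minimal sketch of the FIFO behavior described above (the variable
names are only illustrative), writes and reads can be interleaved on the
same buffer:
</p>
<pre class="code">
local buffer = require("string.buffer")

local buf = buffer.new()
buf:put("hello ", "world")   -- Append to the end of the buffer.
local a = buf:get(5)         -- Consume "hello" from the front.
buf:put("!")                 -- Writes and reads may be freely mixed.
local b = buf:get()          -- Consume the rest: " world!".
</pre>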
130 | <p> | ||
131 | The buffer space that holds the characters is managed automatically | ||
132 | — it grows as needed and already consumed space is recycled. Use | ||
133 | <tt>buffer.new(size)</tt> and <tt>buf:free()</tt> if you need more | ||
134 | control. | ||
135 | </p> | ||
136 | <p> | ||
137 | The maximum size of a single buffer is the same as the maximum size of a | ||
138 | Lua string, which is slightly below two gigabytes. For huge data sizes, | ||
139 | neither strings nor buffers are the right data structure — use the | ||
140 | FFI library to directly map memory or files up to the virtual memory | ||
141 | limit of your OS. | ||
142 | </p> | ||
143 | |||
144 | <h3 id="buffer_overview">Buffer Method Overview</h3> | ||
145 | <ul> | ||
146 | <li> | ||
147 | The <tt>buf:put*()</tt>-like methods append (write) characters to the | ||
148 | end of the buffer. | ||
149 | </li> | ||
150 | <li> | ||
151 | The <tt>buf:get*()</tt>-like methods consume (read) characters from the | ||
152 | front of the buffer. | ||
153 | </li> | ||
154 | <li> | ||
155 | Other methods, like <tt>buf:tostring()</tt>, only read the buffer | ||
156 | contents, but don't change the buffer. | ||
157 | </li> | ||
158 | <li> | ||
159 | The <tt>buf:set()</tt> method allows zero-copy consumption of a string | ||
160 | or an FFI cdata object as a buffer. | ||
161 | </li> | ||
162 | <li> | ||
163 | The FFI-specific methods allow zero-copy read/write-style operations or | ||
164 | modifying the buffer contents in-place. Please check the | ||
165 | <a href="#ffi_caveats">FFI caveats</a> below, too. | ||
166 | </li> | ||
167 | <li> | ||
168 | Methods that don't need to return anything specific return the buffer | ||
169 | object itself as a convenience. This allows method chaining, e.g.: | ||
170 | <tt>buf:reset():encode(obj)</tt> or <tt>buf:skip(len):get()</tt> | ||
171 | </li> | ||
172 | </ul> | ||
173 | |||
174 | <h2 id="create">Buffer Creation and Management</h2> | ||
175 | |||
176 | <h3 id="buffer_new"><tt>local buf = buffer.new([size [,options]])<br> | ||
177 | local buf = buffer.new([options])</tt></h3> | ||
178 | <p> | ||
179 | Creates a new buffer object. | ||
180 | </p> | ||
181 | <p> | ||
182 | The optional <tt>size</tt> argument ensures a minimum initial buffer | ||
183 | size. This is strictly an optimization when the required buffer size is | ||
184 | known beforehand. The buffer space will grow as needed, in any case. | ||
185 | </p> | ||
186 | <p> | ||
187 | The optional table <tt>options</tt> sets various | ||
188 | <a href="#serialize_options">serialization options</a>. | ||
189 | </p> | ||
190 | |||
191 | <h3 id="buffer_reset"><tt>buf = buf:reset()</tt></h3> | ||
192 | <p> | ||
193 | Reset (empty) the buffer. The allocated buffer space is not freed and | ||
194 | may be reused. | ||
195 | </p> | ||
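<p>
A small sketch of reusing one buffer across iterations with
<tt>buf:reset()</tt>, instead of creating a new buffer each time. The
loop and the <tt>lines</tt> table are only illustrative:
</p>
<pre class="code">
local buffer = require("string.buffer")

local buf = buffer.new()
local lines = {}
for i = 1, 3 do
  buf:reset():put("item ", i)   -- Empty the buffer, then refill it.
  lines[i] = buf:tostring()     -- Copy out the contents, buffer is kept.
end
</pre>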
196 | |||
197 | <h3 id="buffer_free"><tt>buf = buf:free()</tt></h3> | ||
198 | <p> | ||
199 | The buffer space of the buffer object is freed. The object itself | ||
200 | remains intact, empty and may be reused. | ||
201 | </p> | ||
202 | <p> | ||
203 | Note: you normally don't need to use this method. The garbage collector | ||
204 | automatically frees the buffer space when the buffer object is | ||
205 | collected. Use this method if you need to free the associated memory | ||
206 | immediately. | ||
207 | </p> | ||
208 | |||
209 | <h2 id="write">Buffer Writers</h2> | ||
210 | |||
211 | <h3 id="buffer_put"><tt>buf = buf:put([str|num|obj] [,…])</tt></h3> | ||
212 | <p> | ||
213 | Appends a string <tt>str</tt>, a number <tt>num</tt> or any object | ||
214 | <tt>obj</tt> with a <tt>__tostring</tt> metamethod to the buffer. | ||
215 | Multiple arguments are appended in the given order. | ||
216 | </p> | ||
217 | <p> | ||
218 | Appending a buffer to a buffer is possible and short-circuited | ||
219 | internally, but it still involves a copy. Prefer combining the buffer | ||
220 | writes so that they use a single buffer. | ||
221 | </p> | ||
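<p>
For illustration, a sketch that appends a string, an object with a
<tt>__tostring</tt> metamethod and a number in one call. The
<tt>Point</tt> type is hypothetical and not part of the library:
</p>
<pre class="code">
local buffer = require("string.buffer")

-- Hypothetical type with a __tostring metamethod.
local Point = {}
Point.__index = Point
Point.__tostring = function(p)
  return "(" .. p.x .. "," .. p.y .. ")"
end

local p = setmetatable({ x = 1, y = 2 }, Point)
local buf = buffer.new()
buf:put("p = ", p, ", n = ", 42)   -- Arguments are appended in order.
local s = buf:tostring()
</pre>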
222 | |||
223 | <h3 id="buffer_putf"><tt>buf = buf:putf(format, …)</tt></h3> | ||
224 | <p> | ||
225 | Appends the formatted arguments to the buffer. The <tt>format</tt> | ||
226 | string supports the same options as <tt>string.format()</tt>. | ||
227 | </p> | ||
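<p>
A short sketch of <tt>buf:putf()</tt> with a few common
<tt>string.format()</tt> conversions (the values are arbitrary):
</p>
<pre class="code">
local buffer = require("string.buffer")

local buf = buffer.new()
-- Left-justified string, zero-padded integer, fixed-point number.
buf:putf("%-5s|%05d|%.2f", "id", 42, 3.14159)
local s = buf:get()
</pre>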
228 | |||
229 | <h3 id="buffer_putcdata"><tt>buf = buf:putcdata(cdata, len)</tt><span class="lib">FFI</span></h3> | ||
230 | <p> | ||
231 | Appends the given <tt>len</tt> number of bytes from the memory pointed | ||
232 | to by the FFI <tt>cdata</tt> object to the buffer. The object needs to | ||
233 | be convertible to a (constant) pointer. | ||
234 | </p> | ||
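<p>
A minimal sketch, assuming the FFI library is available: the array
<tt>arr</tt> is only an example of a cdata object that converts to a
pointer.
</p>
<pre class="code">
local ffi = require("ffi")
local buffer = require("string.buffer")

-- The bytes of "Hi!" followed by a zero byte.
local arr = ffi.new("uint8_t[4]", { 72, 105, 33, 0 })
local buf = buffer.new()
buf:putcdata(arr, 3)   -- Copy only the first three bytes.
</pre>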
235 | |||
236 | <h3 id="buffer_set"><tt>buf = buf:set(str)<br> | ||
237 | buf = buf:set(cdata, len)</tt><span class="lib">FFI</span></h3> | ||
238 | <p> | ||
239 | This method allows zero-copy consumption of a string or an FFI cdata | ||
240 | object as a buffer. It stores a reference to the passed string | ||
241 | <tt>str</tt> or the FFI <tt>cdata</tt> object in the buffer. Any buffer | ||
242 | space originally allocated is freed. This is <i>not</i> an append | ||
243 | operation, unlike the <tt>buf:put*()</tt> methods. | ||
244 | </p> | ||
245 | <p> | ||
246 | After calling this method, the buffer behaves as if | ||
247 | <tt>buf:free():put(str)</tt> or <tt>buf:free():putcdata(cdata, len)</tt> | ||
248 | had been called. However, the data is only referenced and not copied, as | ||
249 | long as the buffer is only consumed. | ||
250 | </p> | ||
251 | <p> | ||
252 | In case the buffer is written to later on, the referenced data is copied | ||
253 | and the object reference is removed (copy-on-write semantics). | ||
254 | </p> | ||
255 | <p> | ||
256 | The stored reference is an anchor for the garbage collector and keeps the | ||
257 | originally passed string or FFI cdata object alive. | ||
258 | </p> | ||
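<p>
As a sketch, <tt>buf:set()</tt> can be used to pick apart an existing
string with the reader methods, without first copying it into the
buffer. The <tt>key=value</tt> message is only an example:
</p>
<pre class="code">
local buffer = require("string.buffer")

local buf = buffer.new()
local msg = "key=value"
buf:set(msg)                -- Reference msg instead of copying it.
local key = buf:get(3)      -- Consume "key".
buf:skip(1)                 -- Drop the "=".
local value = buf:get()     -- Consume the rest: "value".
</pre>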
259 | |||
260 | <h3 id="buffer_reserve"><tt>ptr, len = buf:reserve(size)</tt><span class="lib">FFI</span><br> | ||
261 | <tt>buf = buf:commit(used)</tt><span class="lib">FFI</span></h3> | ||
262 | <p> | ||
263 | The <tt>reserve</tt> method reserves at least <tt>size</tt> bytes of | ||
264 | write space in the buffer. It returns a <tt>uint8_t *</tt> FFI | ||
265 | cdata pointer <tt>ptr</tt> that points to this space. | ||
266 | </p> | ||
267 | <p> | ||
268 | The available length in bytes is returned in <tt>len</tt>. This is at | ||
269 | least <tt>size</tt> bytes, but may be more to facilitate efficient | ||
270 | buffer growth. You can either make use of the additional space or ignore | ||
271 | <tt>len</tt> and only use <tt>size</tt> bytes. | ||
272 | </p> | ||
273 | <p> | ||
274 | The <tt>commit</tt> method appends the <tt>used</tt> bytes of the | ||
275 | previously returned write space to the buffer data. | ||
276 | </p> | ||
277 | <p> | ||
278 | This pair of methods allows zero-copy use of C read-style APIs: | ||
279 | </p> | ||
280 | <pre class="code"> | ||
281 | local MIN_SIZE = 65536 | ||
282 | repeat | ||
283 | local ptr, len = buf:reserve(MIN_SIZE) | ||
284 | local n = C.read(fd, ptr, len) | ||
285 | if n == 0 then break end -- EOF. | ||
286 | if n < 0 then error("read error") end | ||
287 | buf:commit(n) | ||
288 | until false | ||
289 | </pre> | ||
290 | <p> | ||
291 | The reserved write space is <i>not</i> initialized. At least the | ||
292 | <tt>used</tt> bytes <b>must</b> be written to before calling the | ||
293 | <tt>commit</tt> method. There's no need to call the <tt>commit</tt> | ||
294 | method if nothing is added to the buffer (e.g. on error). | ||
295 | </p> | ||
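<p>
A self-contained sketch of the same reserve/commit pattern that doesn't
need a file descriptor. It uses <tt>ffi.copy()</tt> as a stand-in for a C
function that fills the reserved space:
</p>
<pre class="code">
local ffi = require("ffi")
local buffer = require("string.buffer")

local buf = buffer.new()
local data = "payload"
local ptr, len = buf:reserve(#data)   -- len may be larger than requested.
ffi.copy(ptr, data, #data)            -- Write ptr[0] .. ptr[#data-1].
buf:commit(#data)                     -- Append only the bytes written.
</pre>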
296 | |||
297 | <h2 id="read">Buffer Readers</h2> | ||
298 | |||
299 | <h3 id="buffer_length"><tt>len = #buf</tt></h3> | ||
300 | <p> | ||
301 | Returns the current length of the buffer data in bytes. | ||
302 | </p> | ||
303 | |||
304 | <h3 id="buffer_concat"><tt>res = str|num|buf .. str|num|buf […]</tt></h3> | ||
305 | <p> | ||
306 | The Lua concatenation operator <tt>..</tt> also accepts buffers, just | ||
307 | like strings or numbers. It always returns a string and not a buffer. | ||
308 | </p> | ||
309 | <p> | ||
310 | Note that although this is supported for convenience, this thwarts one | ||
311 | of the main reasons to use buffers, which is to avoid string | ||
312 | allocations. Rewrite it with <tt>buf:put()</tt> and <tt>buf:get()</tt>. | ||
313 | </p> | ||
314 | <p> | ||
315 | Mixing this with unrelated objects that have a <tt>__concat</tt> | ||
316 | metamethod may not work, since these probably only expect strings. | ||
317 | </p> | ||
318 | |||
319 | <h3 id="buffer_skip"><tt>buf = buf:skip(len)</tt></h3> | ||
320 | <p> | ||
321 | Skips (consumes) <tt>len</tt> bytes from the buffer up to the current | ||
322 | length of the buffer data. | ||
323 | </p> | ||
324 | |||
325 | <h3 id="buffer_get"><tt>str, … = buf:get([len|nil] [,…])</tt></h3> | ||
326 | <p> | ||
327 | Consumes the buffer data and returns one or more strings. If called | ||
328 | without arguments, the whole buffer data is consumed. If called with a | ||
329 | number, up to <tt>len</tt> bytes are consumed. A <tt>nil</tt> argument | ||
330 | consumes the remaining buffer data (this only makes sense as the last | ||
331 | argument). Multiple arguments consume the buffer data in the given | ||
332 | order. | ||
333 | </p> | ||
334 | <p> | ||
335 | Note: a zero length or no remaining buffer data returns an empty string | ||
336 | and not <tt>nil</tt>. | ||
337 | </p> | ||
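<p>
A sketch of consuming several pieces with a single call (the request
line is just sample data):
</p>
<pre class="code">
local buffer = require("string.buffer")

local buf = buffer.new():put("GET /index.html")
-- Three bytes, then one byte, then everything that is left.
local verb, sep, path = buf:get(3, 1, nil)
-- The buffer is empty now, so this returns an empty string, not nil.
local empty = buf:get(10)
</pre>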
338 | |||
339 | <h3 id="buffer_tostring"><tt>str = buf:tostring()<br> | ||
340 | str = tostring(buf)</tt></h3> | ||
341 | <p> | ||
342 | Creates a string from the buffer data, but doesn't consume it. The | ||
343 | buffer remains unchanged. | ||
344 | </p> | ||
345 | <p> | ||
346 | Buffer objects also define a <tt>__tostring</tt> metamethod. This means | ||
347 | buffers can be passed to the global <tt>tostring()</tt> function and | ||
348 | many other functions that accept this in place of strings. The important | ||
349 | internal uses in functions like <tt>io.write()</tt> are short-circuited | ||
350 | to avoid the creation of an intermediate string object. | ||
351 | </p> | ||
352 | |||
353 | <h3 id="buffer_ref"><tt>ptr, len = buf:ref()</tt><span class="lib">FFI</span></h3> | ||
354 | <p> | ||
355 | Returns a <tt>uint8_t *</tt> FFI cdata pointer <tt>ptr</tt> that | ||
356 | points to the buffer data. The length of the buffer data in bytes is | ||
357 | returned in <tt>len</tt>. | ||
358 | </p> | ||
359 | <p> | ||
360 | The returned pointer can be directly passed to C functions that expect a | ||
361 | buffer and a length. You can also do bytewise reads | ||
362 | (<tt>local x = ptr[i]</tt>) or writes | ||
363 | (<tt>ptr[i] = 0x40</tt>) of the buffer data. | ||
364 | </p> | ||
365 | <p> | ||
366 | In conjunction with the <tt>skip</tt> method, this allows zero-copy use | ||
367 | of C write-style APIs: | ||
368 | </p> | ||
369 | <pre class="code"> | ||
370 | repeat | ||
371 | local ptr, len = buf:ref() | ||
372 | if len == 0 then break end | ||
373 | local n = C.write(fd, ptr, len) | ||
374 | if n < 0 then error("write error") end | ||
375 | buf:skip(n) | ||
376 | until n >= len | ||
377 | </pre> | ||
378 | <p> | ||
379 | Unlike Lua strings, buffer data is <i>not</i> implicitly | ||
380 | zero-terminated. It's not safe to pass <tt>ptr</tt> to C functions that | ||
381 | expect zero-terminated strings. If you're not using <tt>len</tt>, then | ||
382 | you're doing something wrong. | ||
383 | </p> | ||
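<p>
A minimal sketch of modifying buffer contents in-place through the
pointer returned by <tt>buf:ref()</tt>. It assumes plain lowercase ASCII
input and always bounds the loop by <tt>len</tt>:
</p>
<pre class="code">
local ffi = require("ffi")
local buffer = require("string.buffer")

local buf = buffer.new():put("hello")
local ptr, len = buf:ref()
for i = 0, len - 1 do      -- Valid indexes are 0 .. len-1.
  ptr[i] = ptr[i] - 32     -- ASCII lowercase to uppercase.
end
local s = buf:tostring()
</pre>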
384 | |||
385 | <h2 id="serialize">Serialization of Lua Objects</h2> | ||
386 | <p> | ||
387 | The following functions and methods allow <b>high-speed serialization</b> | ||
388 | (encoding) of a Lua object into a string and decoding it back to a Lua | ||
389 | object. This allows convenient storage and transport of <b>structured | ||
390 | data</b>. | ||
391 | </p> | ||
392 | <p> | ||
393 | The encoded data is in an <a href="#serialize_format">internal binary | ||
394 | format</a>. The data can be stored in files, binary-transparent | ||
395 | databases or transmitted to other LuaJIT instances across threads, | ||
396 | processes or networks. | ||
397 | </p> | ||
398 | <p> | ||
399 | Encoding speed can reach up to 1 Gigabyte/second on a modern desktop- or | ||
400 | server-class system, even when serializing many small objects. Decoding | ||
401 | speed is mostly constrained by object creation cost. | ||
402 | </p> | ||
403 | <p> | ||
404 | The serializer handles most Lua types, common FFI number types and | ||
405 | nested structures. Functions, thread objects, other FFI cdata and full | ||
406 | userdata cannot be serialized (yet). | ||
407 | </p> | ||
408 | <p> | ||
409 | The encoder serializes nested structures as trees. Multiple references | ||
410 | to a single object will be stored separately and create distinct objects | ||
411 | after decoding. Circular references cause an error. | ||
412 | </p> | ||
413 | |||
414 | <h3 id="serialize_methods">Serialization Functions and Methods</h3> | ||
415 | |||
416 | <h3 id="buffer_encode"><tt>str = buffer.encode(obj)<br> | ||
417 | buf = buf:encode(obj)</tt></h3> | ||
418 | <p> | ||
419 | Serializes (encodes) the Lua object <tt>obj</tt>. The stand-alone | ||
420 | function returns a string <tt>str</tt>. The buffer method appends the | ||
421 | encoding to the buffer. | ||
422 | </p> | ||
423 | <p> | ||
424 | <tt>obj</tt> can be any of the supported Lua types — it doesn't | ||
425 | need to be a Lua table. | ||
426 | </p> | ||
427 | <p> | ||
428 | This function may throw an error when attempting to serialize | ||
429 | unsupported object types, circular references or deeply nested tables. | ||
430 | </p> | ||
431 | |||
432 | <h3 id="buffer_decode"><tt>obj = buffer.decode(str)<br> | ||
433 | obj = buf:decode()</tt></h3> | ||
434 | <p> | ||
435 | The stand-alone function deserializes (decodes) the string | ||
436 | <tt>str</tt>, the buffer method deserializes one object from the | ||
437 | buffer. Both return a Lua object <tt>obj</tt>. | ||
438 | </p> | ||
439 | <p> | ||
440 | The returned object may be any of the supported Lua types — | ||
441 | even <tt>nil</tt>. | ||
442 | </p> | ||
443 | <p> | ||
444 | This function may throw an error when fed with malformed or incomplete | ||
445 | encoded data. The stand-alone function throws when there's left-over | ||
446 | data after decoding a single top-level object. The buffer method leaves | ||
447 | any left-over data in the buffer. | ||
448 | </p> | ||
449 | <p> | ||
450 | Attempting to deserialize an FFI type will throw an error if the FFI | ||
451 | library is not built-in or has not been loaded yet. | ||
452 | </p> | ||
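<p>
A round-trip sketch with the stand-alone functions. The table contents
are arbitrary. The last line relies on the rule stated above, that
left-over data makes the stand-alone decoder throw:
</p>
<pre class="code">
local buffer = require("string.buffer")

local obj = { name = "probe", tags = { "a", "b" }, ratio = 0.5 }
local str = buffer.encode(obj)
local copy = buffer.decode(str)   -- A new, distinct table.

-- Two concatenated encodings leave data behind after the first object.
local ok = pcall(buffer.decode, str .. str)
</pre>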
453 | |||
454 | <h3 id="serialize_options">Serialization Options</h3> | ||
455 | <p> | ||
456 | The <tt>options</tt> table passed to <tt>buffer.new()</tt> may contain | ||
457 | the following members (all optional): | ||
458 | </p> | ||
459 | <ul> | ||
460 | <li> | ||
461 | <tt>dict</tt> is a Lua table holding a <b>dictionary of strings</b> that | ||
462 | commonly occur as table keys of objects you are serializing. These keys | ||
463 | are compactly encoded as indexes during serialization. A well-chosen | ||
464 | dictionary saves space and improves serialization performance. | ||
465 | </li> | ||
466 | <li> | ||
467 | <tt>metatable</tt> is a Lua table holding a <b>dictionary of metatables</b> | ||
468 | for the table objects you are serializing. | ||
469 | </li> | ||
470 | </ul> | ||
471 | <p> | ||
472 | <tt>dict</tt> needs to be an array of strings and <tt>metatable</tt> needs | ||
473 | to be an array of tables. Both must start at index 1 and have no holes (no | ||
474 | <tt>nil</tt> in between). The tables are anchored in the buffer object and | ||
475 | internally modified into a two-way index (don't do this yourself, just pass | ||
476 | a plain array). The tables must not be modified after they have been passed | ||
477 | to <tt>buffer.new()</tt>. | ||
478 | </p> | ||
479 | <p> | ||
480 | The <tt>dict</tt> and <tt>metatable</tt> tables used by the encoder and | ||
481 | decoder must be the same. Put the most common entries at the front. Extend | ||
482 | at the end to ensure backwards-compatibility — older encodings can | ||
483 | then still be read. You may also set some indexes to <tt>false</tt> to | ||
484 | explicitly drop backwards-compatibility. Old encodings that use these | ||
485 | indexes will throw an error when decoded. | ||
486 | </p> | ||
487 | <p> | ||
488 | Metatables that are not found in the <tt>metatable</tt> dictionary are | ||
489 | ignored when encoding. Decoding returns a table with a <tt>nil</tt> | ||
490 | metatable. | ||
491 | </p> | ||
492 | <p> | ||
493 | Note: parsing and preparation of the options table is somewhat | ||
494 | expensive. Create a buffer object only once and recycle it for multiple | ||
495 | uses. Avoid mixing encoder and decoder buffers, since the | ||
496 | <tt>buf:set()</tt> method frees the already allocated buffer space: | ||
497 | </p> | ||
498 | <pre class="code"> | ||
499 | local options = { | ||
500 | dict = { "commonly", "used", "string", "keys" }, | ||
501 | } | ||
502 | local buf_enc = buffer.new(options) | ||
503 | local buf_dec = buffer.new(options) | ||
504 | |||
505 | local function encode(obj) | ||
506 | return buf_enc:reset():encode(obj):get() | ||
507 | end | ||
508 | |||
509 | local function decode(str) | ||
510 | return buf_dec:set(str):decode() | ||
511 | end | ||
512 | </pre> | ||
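<p>
The same pattern can be sketched with a <tt>metatable</tt> dictionary, so
that decoded tables get their metatable back. The <tt>Vec</tt> class is
hypothetical and only serves as an example:
</p>
<pre class="code">
local buffer = require("string.buffer")

-- Hypothetical class whose instances are serialized.
local Vec = {}
Vec.__index = Vec
function Vec:len2() return self.x * self.x + self.y * self.y end

local options = {
  dict = { "x", "y" },     -- Common table keys.
  metatable = { Vec },     -- Metatables to preserve.
}
local buf_enc = buffer.new(options)
local buf_dec = buffer.new(options)

local v1 = setmetatable({ x = 3, y = 4 }, Vec)
local str = buf_enc:reset():encode(v1):get()
local v2 = buf_dec:set(str):decode()
</pre>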
513 | |||
514 | <h3 id="serialize_stream">Streaming Serialization</h3> | ||
515 | <p> | ||
516 | In some contexts, it's desirable to do piecewise serialization of large | ||
517 | datasets, also known as <i>streaming</i>. | ||
518 | </p> | ||
519 | <p> | ||
520 | This serialization format can be safely concatenated and supports streaming. | ||
521 | Multiple encodings can simply be appended to a buffer and later decoded | ||
522 | individually: | ||
523 | </p> | ||
524 | <pre class="code"> | ||
525 | local buf = buffer.new() | ||
526 | buf:encode(obj1) | ||
527 | buf:encode(obj2) | ||
528 | local copy1 = buf:decode() | ||
529 | local copy2 = buf:decode() | ||
530 | </pre> | ||
531 | <p> | ||
532 | Here's how to iterate over a stream: | ||
533 | </p> | ||
534 | <pre class="code"> | ||
535 | while #buf ~= 0 do | ||
536 | local obj = buf:decode() | ||
537 | -- Do something with obj. | ||
538 | end | ||
539 | </pre> | ||
540 | <p> | ||
541 | Since the serialization format doesn't prepend a length to its encoding, | ||
542 | network applications may need to transmit the length, too. | ||
543 | </p> | ||
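<p>
One possible way to transmit the length is sketched here. The framing
(eight hex digits followed by the payload) and the helper names
<tt>frame</tt> and <tt>unframe</tt> are assumptions for illustration and
not part of the library:
</p>
<pre class="code">
local buffer = require("string.buffer")

-- Hypothetical framing: 8 hex digits of payload length, then the payload.
local function frame(obj)
  local payload = buffer.encode(obj)
  return string.format("%08x", #payload) .. payload
end

-- Consume one framed object from the front of a buffer.
local function unframe(buf)
  local n = tonumber(buf:get(8), 16)
  return buffer.decode(buf:get(n))
end

local rx = buffer.new()
rx:put(frame({ 1, 2 }), frame("x"))   -- Simulate two received messages.
local first = unframe(rx)
local second = unframe(rx)
</pre>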
544 | |||
545 | <h3 id="serialize_format">Serialization Format Specification</h3> | ||
546 | <p> | ||
547 | This serialization format is designed for <b>internal use</b> by LuaJIT | ||
548 | applications. Serialized data is upwards-compatible and portable across | ||
549 | all supported LuaJIT platforms. | ||
550 | </p> | ||
551 | <p> | ||
552 | It's an <b>8-bit binary format</b> and not human-readable. It uses e.g. | ||
553 | embedded zeroes and stores embedded Lua string objects unmodified, which | ||
554 | are 8-bit-clean, too. Encoded data can be safely concatenated for | ||
555 | streaming and later decoded one top-level object at a time. | ||
556 | </p> | ||
557 | <p> | ||
558 | The encoding is reasonably compact, but tuned for maximum performance, | ||
559 | not for minimum space usage. It compresses well with any of the common | ||
560 | byte-oriented data compression algorithms. | ||
561 | </p> | ||
562 | <p> | ||
563 | Although documented here for reference, this format is explicitly | ||
564 | <b>not</b> intended to be a 'public standard' for structured data | ||
565 | interchange across computer languages (like JSON or MessagePack). Please | ||
566 | do not use it as such. | ||
567 | </p> | ||
568 | <p> | ||
569 | The specification is given below as a context-free grammar with a | ||
570 | top-level <tt>object</tt> as the starting point. Alternatives are | ||
571 | separated by the <tt>|</tt> symbol and <tt>*</tt> indicates repeats. | ||
572 | Grouping is implicit or indicated by <tt>{…}</tt>. Terminals are | ||
573 | either plain hex numbers, encoded as bytes, or have a <tt>.format</tt> | ||
574 | suffix. | ||
575 | </p> | ||
576 | <pre> | ||
577 | object → nil | false | true | ||
578 | | null | lightud32 | lightud64 | ||
579 | | int | num | tab | tab_mt | ||
580 | | int64 | uint64 | complex | ||
581 | | string | ||
582 | |||
583 | nil → 0x00 | ||
584 | false → 0x01 | ||
585 | true → 0x02 | ||
586 | |||
587 | null → 0x03 // NULL lightuserdata | ||
588 | lightud32 → 0x04 data.I // 32 bit lightuserdata | ||
589 | lightud64 → 0x05 data.L // 64 bit lightuserdata | ||
590 | |||
591 | int → 0x06 int.I // int32_t | ||
592 | num → 0x07 double.L | ||
593 | |||
594 | tab → 0x08 // Empty table | ||
595 | | 0x09 h.U h*{object object} // Key/value hash | ||
596 | | 0x0a a.U a*object // 0-based array | ||
597 | | 0x0b a.U a*object h.U h*{object object} // Mixed | ||
598 | | 0x0c a.U (a-1)*object // 1-based array | ||
599 | | 0x0d a.U (a-1)*object h.U h*{object object} // Mixed | ||
600 | tab_mt → 0x0e (index-1).U tab // Metatable dict entry | ||
601 | |||
602 | int64 → 0x10 int.L // FFI int64_t | ||
603 | uint64 → 0x11 uint.L // FFI uint64_t | ||
604 | complex → 0x12 re.L im.L // FFI complex | ||
605 | |||
606 | string → (0x20+len).U len*char.B | ||
607 | | 0x0f (index-1).U // String dict entry | ||
608 | |||
609 | .B = 8 bit | ||
610 | .I = 32 bit little-endian | ||
611 | .L = 64 bit little-endian | ||
612 | .U = prefix-encoded 32 bit unsigned number n: | ||
613 | 0x00..0xdf → n.B | ||
614 | 0xe0..0x1fdf → (0xe0|(((n-0xe0)>>8)&0x1f)).B ((n-0xe0)&0xff).B | ||
615 | 0x1fe0.. → 0xff n.I | ||
616 | </pre> | ||
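<p>
To make the <tt>.U</tt> prefix encoding more concrete, here is a sketch
that builds the encoding of a plain string by hand from the grammar above
and compares it with the output of <tt>buffer.encode()</tt>. The helper
names <tt>enc_u</tt> and <tt>enc_string</tt> are made up for this
example:
</p>
<pre class="code">
local bit = require("bit")
local buffer = require("string.buffer")

-- Prefix-encode an unsigned number n according to the .U rule.
local function enc_u(n)
  if n >= 0x1fe0 then
    -- 0xff marker, then n as a 32 bit little-endian number.
    return string.char(0xff,
      bit.band(n, 0xff), bit.band(bit.rshift(n, 8), 0xff),
      bit.band(bit.rshift(n, 16), 0xff), bit.band(bit.rshift(n, 24), 0xff))
  elseif n >= 0xe0 then
    -- Two bytes: 0xe0 plus the high bits of n-0xe0, then its low byte.
    local m = n - 0xe0
    return string.char(bit.bor(0xe0, bit.band(bit.rshift(m, 8), 0x1f)),
                       bit.band(m, 0xff))
  end
  return string.char(n)   -- 0x00..0xdf fits into a single byte.
end

-- A string without a dictionary entry: (0x20+len).U len*char.B
local function enc_string(s)
  return enc_u(0x20 + #s) .. s
end

local short, long = "abc", string.rep("x", 300)
local same_short = enc_string(short) == buffer.encode(short)
local same_long = enc_string(long) == buffer.encode(long)
</pre>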
617 | |||
618 | <h2 id="error">Error handling</h2> | ||
619 | <p> | ||
620 | Many of the buffer methods can throw an error. Out-of-memory or usage | ||
621 | errors are best caught with an outer wrapper for larger parts of code. | ||
622 | There's not much one can do after that, anyway. | ||
623 | </p> | ||
624 | <p> | ||
625 | On the other hand, you may want to catch some errors individually. Buffer methods need | ||
626 | to receive the buffer object as the first argument. The Lua colon-syntax | ||
627 | <tt>obj:method()</tt> does that implicitly. But to wrap a method with | ||
628 | <tt>pcall()</tt>, the arguments need to be passed like this: | ||
629 | </p> | ||
630 | <pre class="code"> | ||
631 | local ok, err = pcall(buf.encode, buf, obj) | ||
632 | if not ok then | ||
633 | -- Handle error in err. | ||
634 | end | ||
635 | </pre> | ||
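<p>
As a concrete sketch, a table with a circular reference is one input
that makes the encoder throw. Whether a failed <tt>encode</tt> leaves
partial data in the buffer is not specified here, so this example
resets the buffer defensively:
</p>
<pre class="code">
local buffer = require("string.buffer")

local buf = buffer.new()
local t = {}
t.self = t   -- Circular reference, which cannot be serialized.

local ok, err = pcall(buf.encode, buf, t)
if not ok then
  buf:reset()   -- Assumption: start over with an empty buffer.
end
</pre>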
636 | |||
637 | <h2 id="ffi_caveats">FFI caveats</h2> | ||
638 | <p> | ||
639 | The string buffer library has been designed to work well together with | ||
640 | the FFI library. But due to the low-level nature of the FFI library, | ||
641 | some care needs to be taken: | ||
642 | </p> | ||
643 | <p> | ||
644 | First, please remember that FFI pointers are zero-indexed. The space | ||
645 | returned by <tt>buf:reserve()</tt> and <tt>buf:ref()</tt> starts at the | ||
646 | returned pointer and ends <tt>len</tt> bytes after it (exclusive). | ||
647 | </p> | ||
648 | <p> | ||
649 | I.e. the first valid index is <tt>ptr[0]</tt> and the last valid index | ||
650 | is <tt>ptr[len-1]</tt>. If the returned length is zero, there's no valid | ||
651 | index at all. The returned pointer may even be <tt>NULL</tt>. | ||
652 | </p> | ||
653 | <p> | ||
654 | The space pointed to by the returned pointer is only valid as long as | ||
655 | the buffer is not modified in any way (neither append, nor consume, nor | ||
656 | reset, etc.). The pointer is also not a GC anchor for the buffer object | ||
657 | itself. | ||
658 | </p> | ||
659 | <p> | ||
660 | Buffer data is only guaranteed to be byte-aligned. Casting the returned | ||
661 | pointer to a data type with higher alignment may cause unaligned | ||
662 | accesses. It depends on the CPU architecture whether this is allowed or | ||
663 | not (it's always OK on x86/x64 and mostly OK on other modern | ||
664 | architectures). | ||
665 | </p> | ||
666 | <p> | ||
667 | FFI pointers or references do not count as GC anchors for an underlying | ||
668 | object. E.g. an <tt>array</tt> allocated with <tt>ffi.new()</tt> is | ||
669 | anchored by <tt>buf:set(array, len)</tt>, but not by | ||
670 | <tt>buf:set(array+offset, len)</tt>. The addition of the offset | ||
671 | creates a new pointer, even when the offset is zero. In this case, you | ||
672 | need to make sure there's still a reference to the original array as | ||
673 | long as its contents are in use by the buffer. | ||
674 | </p> | ||
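<p>
A sketch of the offset case. The array size and offset are arbitrary.
The point is only that the original <tt>array</tt> stays referenced, and
is still used, after the buffer has finished reading from it:
</p>
<pre class="code">
local ffi = require("ffi")
local buffer = require("string.buffer")

local array = ffi.new("uint8_t[8]", { 1, 2, 3, 4, 5, 6, 7, 8 })
local buf = buffer.new()

-- array + 2 creates a new pointer, which does not anchor the array.
buf:set(array + 2, 4)
local s = buf:get()      -- Reads the bytes 3, 4, 5, 6.

-- A later use keeps 'array' referenced up to this point.
local first = array[0]
</pre>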
675 | <p> | ||
676 | Even though each LuaJIT VM instance is single-threaded (but you can | ||
677 | create multiple VMs), FFI data structures can be accessed concurrently. | ||
678 | Be careful when reading/writing FFI cdata from/to buffers to avoid | ||
679 | concurrent accesses or modifications. In particular, the memory | ||
680 | referenced by <tt>buf:set(cdata, len)</tt> must not be modified | ||
681 | while buffer readers are working on it. Shared, but read-only memory | ||
682 | mappings of files are OK, but only if the file does not change. | ||
683 | </p> | ||
684 | <br class="flush"> | ||
685 | </div> | ||
686 | <div id="foot"> | ||
687 | <hr class="hide"> | ||
688 | Copyright © 2005-2022 | ||
689 | <span class="noprint"> | ||
690 | · | ||
691 | <a href="contact.html">Contact</a> | ||
692 | </span> | ||
693 | </div> | ||
694 | </body> | ||
695 | </html> | ||