Diffstat
-rw-r--r-- | doc/ext_buffer.html | 457 |
1 file changed, 404 insertions, 53 deletions
diff --git a/doc/ext_buffer.html b/doc/ext_buffer.html
index 455c298d..94af757d 100644
--- a/doc/ext_buffer.html
+++ b/doc/ext_buffer.html
@@ -1,19 +1,30 @@
1 | <!DOCTYPE html> | 1 | <!DOCTYPE html> |
2 | <html> | 2 | <html> |
3 | <head> | 3 | <head> |
4 | <title>String Buffers</title> | 4 | <title>String Buffer Library</title> |
5 | <meta charset="utf-8"> | 5 | <meta charset="utf-8"> |
6 | <meta name="Copyright" content="Copyright (C) 2005-2021"> | 6 | <meta name="Copyright" content="Copyright (C) 2005-2021"> |
7 | <meta name="Language" content="en"> | 7 | <meta name="Language" content="en"> |
8 | <link rel="stylesheet" type="text/css" href="bluequad.css" media="screen"> | 8 | <link rel="stylesheet" type="text/css" href="bluequad.css" media="screen"> |
9 | <link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print"> | 9 | <link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print"> |
10 | <style type="text/css"> | ||
11 | .lib { | ||
12 | vertical-align: middle; | ||
13 | margin-left: 5px; | ||
14 | padding: 0 5px; | ||
15 | font-size: 60%; | ||
16 | border-radius: 5px; | ||
17 | background: #c5d5ff; | ||
18 | color: #000; | ||
19 | } | ||
20 | </style> | ||
10 | </head> | 21 | </head> |
11 | <body> | 22 | <body> |
12 | <div id="site"> | 23 | <div id="site"> |
13 | <a href="https://luajit.org"><span>Lua<span id="logo">JIT</span></span></a> | 24 | <a href="https://luajit.org"><span>Lua<span id="logo">JIT</span></span></a> |
14 | </div> | 25 | </div> |
15 | <div id="head"> | 26 | <div id="head"> |
16 | <h1>String Buffers</h1> | 27 | <h1>String Buffer Library</h1> |
17 | </div> | 28 | </div> |
18 | <div id="nav"> | 29 | <div id="nav"> |
19 | <ul><li> | 30 | <ul><li> |
@@ -57,31 +68,35 @@
57 | </div> | 68 | </div> |
58 | <div id="main"> | 69 | <div id="main"> |
59 | <p> | 70 | <p> |
60 | |||
61 | The string buffer library allows <b>high-performance manipulation of | 71 | The string buffer library allows <b>high-performance manipulation of |
62 | string-like data</b>. | 72 | string-like data</b>. |
63 | |||
64 | </p> | 73 | </p> |
65 | <p> | 74 | <p> |
66 | |||
67 | Unlike Lua strings, which are constants, string buffers are | 75 | Unlike Lua strings, which are constants, string buffers are |
68 | <b>mutable</b> sequences of 8-bit (binary-transparent) characters. Data | 76 | <b>mutable</b> sequences of 8-bit (binary-transparent) characters. Data |
69 | can be stored, formatted and encoded into a string buffer and later | 77 | can be stored, formatted and encoded into a string buffer and later |
70 | converted, decoded or extracted. | 78 | converted, extracted or decoded. |
71 | |||
72 | </p> | 79 | </p> |
73 | <p> | 80 | <p> |
74 | |||
75 | The convenient string buffer API simplifies common string manipulation | 81 | The convenient string buffer API simplifies common string manipulation |
76 | tasks that would otherwise require creating many intermediate strings. | 82 | tasks that would otherwise require creating many intermediate strings. |
77 | String buffers improve performance by eliminating redundant memory | 83 | String buffers improve performance by eliminating redundant memory |
78 | copies, object creation, string interning and garbage collection | 84 | copies, object creation, string interning and garbage collection |
79 | overhead. In conjunction with the FFI library, they allow zero-copy | 85 | overhead. In conjunction with the FFI library, they allow zero-copy |
80 | operations. | 86 | operations. |
87 | </p> | ||
88 | <p> | ||
89 | The string buffer library also includes a high-performance | ||
90 | <a href="#serialize">serializer</a> for Lua objects. | ||
91 | </p> | ||
81 | 92 | ||
93 | <h2 id="wip" style="color:#ff0000">Work in Progress</h2> | ||
94 | <p> | ||
95 | <b style="color:#ff0000">This library is a work in progress. More | ||
96 | functionality will be added soon.</b> | ||
82 | </p> | 97 | </p> |
83 | 98 | ||
84 | <h2 id="load">Using the String Buffer Library</h2> | 99 | <h2 id="use">Using the String Buffer Library</h2> |
85 | <p> | 100 | <p> |
86 | The string buffer library is built into LuaJIT by default, but it's not | 101 | The string buffer library is built into LuaJIT by default, but it's not |
87 | loaded by default. Add this to the start of every Lua file that needs | 102 | loaded by default. Add this to the start of every Lua file that needs |
@@ -90,137 +105,406 @@ one of its functions:
90 | <pre class="code"> | 105 | <pre class="code"> |
91 | local buffer = require("string.buffer") | 106 | local buffer = require("string.buffer") |
92 | </pre> | 107 | </pre> |
108 | <p> | ||
109 | The convention for the syntax shown on this page is that <tt>buffer</tt> | ||
110 | refers to the buffer library and <tt>buf</tt> refers to an individual | ||
111 | buffer object. | ||
112 | </p> | ||
113 | <p> | ||
114 | Please note the difference between a Lua function call, e.g. | ||
115 | <tt>buffer.new()</tt> (with a dot) and a Lua method call, e.g. | ||
116 | <tt>buf:reset()</tt> (with a colon). | ||
117 | </p> | ||
93 | 118 | ||
94 | <h2 id="wip" style="color:#ff0000">Work in Progress</h2> | 119 | <h3 id="buffer_object">Buffer Objects</h3> |
120 | <p> | ||
121 | A buffer object is a garbage-collected Lua object. After creation with | ||
122 | <tt>buffer.new()</tt>, it can (and should) be reused for many operations. | ||
123 | When the last reference to a buffer object is gone, it will eventually | ||
124 | be freed by the garbage collector, along with the allocated buffer | ||
125 | space. | ||
126 | </p> | ||
127 | <p> | ||
128 | Buffers operate like a FIFO (first-in first-out) data structure. Data | ||
129 | can be appended (written) to the end of the buffer and consumed (read) | ||
130 | from the front of the buffer. These operations can be freely mixed. | ||
131 | </p> | ||
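A minimal sketch of the FIFO behavior described above, assuming LuaJIT 2.1 with the `string.buffer` module available. Data appended with `put()` is read back from the front with `get()`, and writes and reads are interleaved on the same buffer.

```lua
local buffer = require("string.buffer")

local buf = buffer.new()
buf:put("first,")        -- append (write) at the end
buf:put("second,")
local a = buf:get(6)     -- consume (read) 6 bytes from the front: "first,"
buf:put("third")         -- appends and reads can be freely mixed
local rest = buf:get()   -- consume everything that is left: "second,third"
```

After the last `get()` the buffer is empty, but its allocated space is kept for reuse.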
132 | <p> | ||
133 | The buffer space that holds the characters is managed automatically | ||
134 | — it grows as needed and already consumed space is recycled. Use | ||
135 | <tt>buffer.new(size)</tt> and <tt>buf:free()</tt> if you need more | ||
136 | control. | ||
137 | </p> | ||
138 | <p> | ||
139 | The maximum size of a single buffer is the same as the maximum size of a | ||
140 | Lua string, which is slightly below two gigabytes. For huge data sizes, | ||
141 | neither strings nor buffers are the right data structure — use the | ||
142 | FFI library to directly map memory or files up to the virtual memory | ||
143 | limit of your OS. | ||
144 | </p> | ||
95 | 145 | ||
146 | <h3 id="buffer_overview">Buffer Method Overview</h3> | ||
147 | <ul> | ||
148 | <li> | ||
149 | The <tt>buf:put*()</tt>-like methods append (write) characters to the | ||
150 | end of the buffer. | ||
151 | </li> | ||
152 | <li> | ||
153 | The <tt>buf:get*()</tt>-like methods consume (read) characters from the | ||
154 | front of the buffer. | ||
155 | </li> | ||
156 | <li> | ||
157 | Other methods, like <tt>buf:tostring()</tt>, only read the buffer | ||
158 | contents and don't change the buffer. | ||
159 | </li> | ||
160 | <li> | ||
161 | The <tt>buf:set()</tt> method allows zero-copy consumption of a string | ||
162 | or an FFI cdata object as a buffer. | ||
163 | </li> | ||
164 | <li> | ||
165 | The FFI-specific methods allow zero-copy read/write-style operations or | ||
166 | modifying the buffer contents in-place. Please check the | ||
167 | <a href="#ffi_caveats">FFI caveats</a> below, too. | ||
168 | </li> | ||
169 | <li> | ||
170 | Methods that don't need to return anything specific return the buffer | ||
171 | object itself as a convenience. This allows method chaining, e.g.: | ||
172 | <tt>buf:reset():encode(obj)</tt> or <tt>buf:skip(len):get()</tt> | ||
173 | </li> | ||
174 | </ul> | ||
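A short sketch of method chaining, assuming LuaJIT 2.1 with `string.buffer`. It relies only on the listed behavior that methods without a specific result return the buffer itself; `get()` and `tostring()` end a chain because they return strings.

```lua
local buffer = require("string.buffer")

local buf = buffer.new()
-- put() returns the buffer itself, so writes can be chained.
local s = buf:put("a"):put("b"):put("c"):tostring()  -- "abc", buffer unchanged
-- reset() and skip() also return the buffer; get() ends the chain with a string.
local t = buf:reset():put("xyz"):skip(1):get()       -- "yz"
```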
175 | |||
176 | <h2 id="create">Buffer Creation and Management</h2> | ||
177 | |||
178 | <h3 id="buffer_new"><tt>local buf = buffer.new([size])</tt></h3> | ||
179 | <p> | ||
180 | Creates a new buffer object. | ||
181 | </p> | ||
96 | <p> | 182 | <p> |
183 | The optional <tt>size</tt> argument ensures a minimum initial buffer | ||
184 | size. This is strictly an optimization for cases where the required | ||
185 | buffer size is known beforehand. | ||
186 | </p> | ||
97 | 187 | ||
98 | <b style="color:#ff0000">This library is a work in progress. More | 188 | <h3 id="buffer_reset"><tt>buf = buf:reset()</tt></h3> |
99 | functions will be added soon.</b> | 189 | <p> |
190 | Resets (empties) the buffer. The allocated buffer space is not freed and | ||
191 | may be reused. | ||
192 | </p> | ||
100 | 193 | ||
194 | <h3 id="buffer_free"><tt>buf = buf:free()</tt></h3> | ||
195 | <p> | ||
196 | The buffer space of the buffer object is freed. The object itself | ||
197 | remains intact and empty, and it may be reused. | ||
198 | </p> | ||
199 | <p> | ||
200 | Note: you normally don't need to use this method. The garbage collector | ||
201 | automatically frees the buffer space when the buffer object is | ||
202 | collected. Use this method if you need to free the associated memory | ||
203 | immediately. | ||
101 | </p> | 204 | </p> |
102 | 205 | ||
103 | <h2 id="serialize">Serialization of Lua Objects</h2> | 206 | <h2 id="write">Buffer Writers</h2> |
207 | |||
208 | <h3 id="buffer_put"><tt>buf = buf:put([str|num|obj] [, ...])</tt></h3> | ||
209 | <p> | ||
210 | Appends a string <tt>str</tt>, a number <tt>num</tt> or any object | ||
211 | <tt>obj</tt> with a <tt>__tostring</tt> metamethod to the buffer. | ||
212 | Multiple arguments are appended in the given order. | ||
213 | </p> | ||
214 | <p> | ||
215 | Appending a buffer to a buffer is possible and short-circuited | ||
216 | internally, but it still involves a copy. It's better to combine the | ||
217 | buffer writes to use a single buffer. | ||
218 | </p> | ||
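A sketch of `put()` with mixed argument types, assuming LuaJIT 2.1 with `string.buffer`. The `Point` type and the `point()` constructor are hypothetical, made up only to show an object with a `__tostring` metamethod; the second buffer `other` illustrates the buffer-to-buffer append mentioned above.

```lua
local buffer = require("string.buffer")

-- Hypothetical object type: put() converts it via its __tostring metamethod.
local Point = {}
Point.__tostring = function(p)
  return "(" .. p.x .. "," .. p.y .. ")"
end
local function point(x, y)
  return setmetatable({ x = x, y = y }, Point)
end

local buf = buffer.new()
-- Strings, numbers and objects with __tostring, appended in argument order.
buf:put("p=", point(1, 2), " n=", 42)

local other = buffer.new():put("!")
buf:put(other)  -- buffer-to-buffer append is allowed, but still copies the data
local s = buf:tostring()  -- "p=(1,2) n=42!"
```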
219 | |||
220 | <h3 id="buffer_putf"><tt>buf = buf:putf(format, ...)</tt></h3> | ||
221 | <p> | ||
222 | Appends the formatted arguments to the buffer. The <tt>format</tt> | ||
223 | string supports the same options as <tt>string.format()</tt>. | ||
224 | </p> | ||
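A sketch of building formatted output with `putf()` instead of concatenating `string.format()` results, assuming LuaJIT 2.1 with `string.buffer`. The `names` list is illustrative data.

```lua
local buffer = require("string.buffer")

local buf = buffer.new()
local names = { "alpha", "beta" }
for i, name in ipairs(names) do
  -- Same format options as string.format(), written straight into the buffer.
  buf:putf("%d:%-6s|%5.2f\n", i, name, i / 4)
end
local report = buf:get()  -- one string, created only at the end
```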
225 | |||
226 | <h3 id="buffer_putcdata"><tt>buf = buf:putcdata(cdata, len)</tt><span class="lib">FFI</span></h3> | ||
104 | <p> | 227 | <p> |
228 | Appends the given <tt>len</tt> number of bytes from the memory pointed | ||
229 | to by the FFI <tt>cdata</tt> object to the buffer. The object needs to | ||
230 | be convertible to a (constant) pointer. | ||
231 | </p> | ||
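A sketch of `putcdata()`, assuming LuaJIT 2.1 with `string.buffer` and the FFI library. The C array `bytes` is a hypothetical source; any cdata convertible to a pointer should work the same way.

```lua
local ffi = require("ffi")
local buffer = require("string.buffer")

-- A C array holding the bytes of "Hello".
local bytes = ffi.new("uint8_t[5]", { 72, 101, 108, 108, 111 })

local buf = buffer.new()
buf:put("[")
buf:putcdata(bytes, 5)  -- copies 5 bytes from the array into the buffer
buf:put("]")
local s = buf:tostring()  -- "[Hello]"
```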
232 | |||
233 | <h3 id="buffer_set"><tt>buf = buf:set(str)<br> | ||
234 | buf = buf:set(cdata, len)</tt><span class="lib">FFI</span></h3> | ||
235 | <p> | ||
236 | This method allows zero-copy consumption of a string or an FFI cdata | ||
237 | object as a buffer. It stores a reference to the passed string | ||
238 | <tt>str</tt> or the FFI <tt>cdata</tt> object in the buffer. Any buffer | ||
239 | space originally allocated is freed. This is <i>not</i> an append | ||
240 | operation, unlike the <tt>buf:put*()</tt> methods. | ||
241 | </p> | ||
242 | <p> | ||
243 | After calling this method, the buffer behaves as if | ||
244 | <tt>buf:free():put(str)</tt> or <tt>buf:free():putcdata(cdata, len)</tt> | ||
245 | had been called. However, the data is only referenced and not copied, as | ||
246 | long as the buffer is only consumed. | ||
247 | </p> | ||
248 | <p> | ||
249 | In case the buffer is written to later on, the referenced data is copied | ||
250 | and the object reference is removed (copy-on-write semantics). | ||
251 | </p> | ||
252 | <p> | ||
253 | The stored reference is an anchor for the garbage collector and keeps the | ||
254 | originally passed string or FFI cdata object alive. | ||
255 | </p> | ||
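A sketch of zero-copy parsing with `set()`, assuming LuaJIT 2.1 with `string.buffer`. The input `line` is made up for illustration. Reads consume the referenced string without copying it; the final `put()` triggers the copy-on-write behavior described above.

```lua
local buffer = require("string.buffer")

local line = "key=value;rest"
local buf = buffer.new()
buf:set(line)             -- reference the string, no copy
local key = buf:get(3)    -- "key"
buf:skip(1)               -- drop "="
local value = buf:get(5)  -- "value"
-- Writing after set(): the remaining ";rest" is copied, then "!" is appended.
buf:put("!")
local tail = buf:get()    -- ";rest!"
```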
256 | |||
257 | <h3 id="buffer_reserve"><tt>ptr, len = buf:reserve(size)</tt><span class="lib">FFI</span><br> | ||
258 | <tt>buf = buf:commit(used)</tt><span class="lib">FFI</span></h3> | ||
259 | <p> | ||
260 | The <tt>reserve</tt> method reserves at least <tt>size</tt> bytes of | ||
261 | write space in the buffer. It returns an <tt>uint8_t *</tt> FFI | ||
262 | cdata pointer <tt>ptr</tt> that points to this space. | ||
263 | </p> | ||
264 | <p> | ||
265 | The available length in bytes is returned in <tt>len</tt>. This is at | ||
266 | least <tt>size</tt> bytes, but may be more to facilitate efficient | ||
267 | buffer growth. You can either make use of the additional space or ignore | ||
268 | <tt>len</tt> and only use <tt>size</tt> bytes. | ||
269 | </p> | ||
270 | <p> | ||
271 | The <tt>commit</tt> method appends the <tt>used</tt> bytes of the | ||
272 | previously returned write space to the buffer data. | ||
273 | </p> | ||
274 | <p> | ||
275 | This pair of methods allows zero-copy use of C read-style APIs: | ||
276 | </p> | ||
277 | <pre class="code"> | ||
278 | local MIN_SIZE = 65536 | ||
279 | repeat | ||
280 | local ptr, len = buf:reserve(MIN_SIZE) | ||
281 | local n = C.read(fd, ptr, len) | ||
282 | if n == 0 then break end -- EOF. | ||
283 | if n < 0 then error("read error") end | ||
284 | buf:commit(n) | ||
285 | until false | ||
286 | </pre> | ||
287 | <p> | ||
288 | The reserved write space is <i>not</i> initialized. At least the | ||
289 | <tt>used</tt> bytes <b>must</b> be written to before calling the | ||
290 | <tt>commit</tt> method. There's no need to call the <tt>commit</tt> | ||
291 | method if nothing is added to the buffer (e.g. on error). | ||
292 | </p> | ||
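A self-contained sketch of the `reserve()`/`commit()` pair, assuming LuaJIT 2.1 with `string.buffer` and the FFI library. Here `ffi.copy()` stands in for a C read-style API, so the example runs without declaring `read()`. The `payload` string is illustrative.

```lua
local ffi = require("ffi")
local buffer = require("string.buffer")

local buf = buffer.new()
local payload = "zero-copy"
-- Reserve at least #payload bytes; len may be larger than requested.
local ptr, len = buf:reserve(#payload)
-- Fill the (uninitialized) reserved space before committing it.
ffi.copy(ptr, payload, #payload)
-- Only the committed bytes become part of the buffer data.
buf:commit(#payload)
```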
293 | |||
294 | <h2 id="read">Buffer Readers</h2> | ||
295 | |||
296 | <h3 id="buffer_length"><tt>len = #buf</tt></h3> | ||
297 | <p> | ||
298 | Returns the current length of the buffer data in bytes. | ||
299 | </p> | ||
300 | |||
301 | <h3 id="buffer_concat"><tt>res = str|num|buf .. str|num|buf [...]</tt></h3> | ||
302 | <p> | ||
303 | The Lua concatenation operator <tt>..</tt> also accepts buffers, just | ||
304 | like strings or numbers. It always returns a string and not a buffer. | ||
305 | </p> | ||
306 | <p> | ||
307 | Note that although this is supported for convenience, it thwarts one | ||
308 | of the main reasons to use buffers, which is to avoid string | ||
309 | allocations. Rewrite such code with <tt>buf:put()</tt> and <tt>buf:get()</tt>. | ||
310 | </p> | ||
311 | <p> | ||
312 | Mixing this with unrelated objects that have a <tt>__concat</tt> | ||
313 | metamethod may not work, since these probably only expect strings. | ||
314 | </p> | ||
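A sketch contrasting `..` with an in-place append, assuming LuaJIT 2.1 with `string.buffer`. Both produce the same text, but the concatenation allocates a new string each time it runs.

```lua
local buffer = require("string.buffer")

local buf = buffer.new():put("abc")
-- Works, but allocates a brand-new string on every concatenation.
local s = buf .. "!"
-- Preferred: append in place and convert once, when the result is needed.
buf:put("!")
local t = buf:get()
```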
315 | |||
316 | <h3 id="buffer_skip"><tt>buf = buf:skip(len)</tt></h3> | ||
317 | <p> | ||
318 | Skips (consumes) <tt>len</tt> bytes from the buffer up to the current | ||
319 | length of the buffer data. | ||
320 | </p> | ||
321 | |||
322 | <h3 id="buffer_get"><tt>str, ... = buf:get([len|nil] [,...])</tt></h3> | ||
323 | <p> | ||
324 | Consumes the buffer data and returns one or more strings. If called | ||
325 | without arguments, the whole buffer data is consumed. If called with a | ||
326 | number, up to <tt>len</tt> bytes are consumed. A <tt>nil</tt> argument | ||
327 | consumes the remaining buffer space (this only makes sense as the last | ||
328 | argument). Multiple arguments consume the buffer data in the given | ||
329 | order. | ||
330 | </p> | ||
331 | <p> | ||
332 | Note: a zero length or no remaining buffer data returns an empty string | ||
333 | and not <tt>nil</tt>. | ||
334 | </p> | ||
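A sketch of consuming fixed-width fields with a single multi-argument `get()`, assuming LuaJIT 2.1 with `string.buffer`. The date string is illustrative data; the trailing `nil` takes whatever remains.

```lua
local buffer = require("string.buffer")

local buf = buffer.new():put("2021-06-01")
-- Fields are consumed in argument order; nil takes the rest.
local year, _, month, _, day = buf:get(4, 1, 2, 1, nil)
-- With no data left, get() returns an empty string, not nil.
local empty = buf:get()
```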
105 | 335 | ||
336 | <h3 id="buffer_tostring"><tt>str = buf:tostring()<br> | ||
337 | str = tostring(buf)</tt></h3> | ||
338 | <p> | ||
339 | Creates a string from the buffer data, but doesn't consume it. The | ||
340 | buffer remains unchanged. | ||
341 | </p> | ||
342 | <p> | ||
343 | Buffer objects also define a <tt>__tostring</tt> metamethod. This means | ||
344 | buffers can be passed to the global <tt>tostring()</tt> function and | ||
345 | many other functions that accept this in place of strings. The important | ||
346 | internal uses in functions like <tt>io.write()</tt> are short-circuited | ||
347 | to avoid the creation of an intermediate string object. | ||
348 | </p> | ||
349 | |||
350 | <h3 id="buffer_ref"><tt>ptr, len = buf:ref()</tt><span class="lib">FFI</span></h3> | ||
351 | <p> | ||
352 | Returns an <tt>uint8_t *</tt> FFI cdata pointer <tt>ptr</tt> that | ||
353 | points to the buffer data. The length of the buffer data in bytes is | ||
354 | returned in <tt>len</tt>. | ||
355 | </p> | ||
356 | <p> | ||
357 | The returned pointer can be directly passed to C functions that expect a | ||
358 | buffer and a length. You can also do bytewise reads | ||
359 | (<tt>local x = ptr[i]</tt>) or writes | ||
360 | (<tt>ptr[i] = 0x40</tt>) of the buffer data. | ||
361 | </p> | ||
362 | <p> | ||
363 | In conjunction with the <tt>skip</tt> method, this allows zero-copy use | ||
364 | of C write-style APIs: | ||
365 | </p> | ||
366 | <pre class="code"> | ||
367 | repeat | ||
368 | local ptr, len = buf:ref() | ||
369 | if len == 0 then break end | ||
370 | local n = C.write(fd, ptr, len) | ||
371 | if n < 0 then error("write error") end | ||
372 | buf:skip(n) | ||
373 | until n >= len | ||
374 | </pre> | ||
375 | <p> | ||
376 | Unlike Lua strings, buffer data is <i>not</i> implicitly | ||
377 | zero-terminated. It's not safe to pass <tt>ptr</tt> to C functions that | ||
378 | expect zero-terminated strings. If you're not using <tt>len</tt>, then | ||
379 | you're doing something wrong. | ||
380 | </p> | ||
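A sketch of reading buffer bytes in place through `ref()`, assuming LuaJIT 2.1 with `string.buffer` and the FFI library. It counts newline bytes without creating a string, honoring the returned `len` and zero-based indexing.

```lua
local ffi = require("ffi")  -- ref() returns FFI cdata; load the FFI library explicitly
local buffer = require("string.buffer")

local buf = buffer.new():put("a\nb\nc")
local ptr, len = buf:ref()
local lines = 0
for i = 0, len - 1 do                           -- valid indexes: 0 .. len-1
  if ptr[i] == 10 then lines = lines + 1 end    -- 10 is the byte value of "\n"
end
```

`ref()` only reads, so the buffer contents are unchanged afterwards.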
381 | |||
382 | <h2 id="serialize">Serialization of Lua Objects</h2> | ||
383 | <p> | ||
106 | The following functions and methods allow <b>high-speed serialization</b> | 384 | The following functions and methods allow <b>high-speed serialization</b> |
107 | (encoding) of a Lua object into a string and decoding it back to a Lua | 385 | (encoding) of a Lua object into a string and decoding it back to a Lua |
108 | object. This allows convenient storage and transport of <b>structured | 386 | object. This allows convenient storage and transport of <b>structured |
109 | data</b>. | 387 | data</b>. |
110 | |||
111 | </p> | 388 | </p> |
112 | <p> | 389 | <p> |
113 | |||
114 | The encoded data is in an <a href="#serialize_format">internal binary | 390 | The encoded data is in an <a href="#serialize_format">internal binary |
115 | format</a>. The data can be stored in files, binary-transparent | 391 | format</a>. The data can be stored in files, binary-transparent |
116 | databases or transmitted to other LuaJIT instances across threads, | 392 | databases or transmitted to other LuaJIT instances across threads, |
117 | processes or networks. | 393 | processes or networks. |
118 | |||
119 | </p> | 394 | </p> |
120 | <p> | 395 | <p> |
121 | |||
122 | Encoding speed can reach up to 1 Gigabyte/second on a modern desktop- or | 396 | Encoding speed can reach up to 1 Gigabyte/second on a modern desktop- or |
123 | server-class system, even when serializing many small objects. Decoding | 397 | server-class system, even when serializing many small objects. Decoding |
124 | speed is mostly constrained by object creation cost. | 398 | speed is mostly constrained by object creation cost. |
125 | |||
126 | </p> | 399 | </p> |
127 | <p> | 400 | <p> |
128 | |||
129 | The serializer handles most Lua types, common FFI number types and | 401 | The serializer handles most Lua types, common FFI number types and |
130 | nested structures. Functions, thread objects, other FFI cdata, full | 402 | nested structures. Functions, thread objects, other FFI cdata, full |
131 | userdata and associated metatables cannot be serialized (yet). | 403 | userdata and associated metatables cannot be serialized (yet). |
132 | |||
133 | </p> | 404 | </p> |
134 | <p> | 405 | <p> |
135 | |||
136 | The encoder serializes nested structures as trees. Multiple references | 406 | The encoder serializes nested structures as trees. Multiple references |
137 | to a single object will be stored separately and create distinct objects | 407 | to a single object will be stored separately and create distinct objects |
138 | after decoding. Circular references cause an error. | 408 | after decoding. Circular references cause an error. |
139 | |||
140 | |||
141 | </p> | 409 | </p> |
142 | 410 | ||
143 | <h3 id="buffer_encode"><tt>str = buffer.encode(obj)</tt></h3> | 411 | <h3 id="serialize_methods">Serialization Functions and Methods</h3> |
144 | <p> | ||
145 | |||
146 | Serializes (encodes) the Lua object <tt>obj</tt> into the string | ||
147 | <tt>str</tt>. | ||
148 | 412 | ||
413 | <h3 id="buffer_encode"><tt>str = buffer.encode(obj)<br> | ||
414 | buf = buf:encode(obj)</tt></h3> | ||
415 | <p> | ||
416 | Serializes (encodes) the Lua object <tt>obj</tt>. The stand-alone | ||
417 | function returns a string <tt>str</tt>. The buffer method appends the | ||
418 | encoding to the buffer. | ||
149 | </p> | 419 | </p> |
150 | <p> | 420 | <p> |
151 | |||
152 | <tt>obj</tt> can be any of the supported Lua types — it doesn't | 421 | <tt>obj</tt> can be any of the supported Lua types — it doesn't |
153 | need to be a Lua table. | 422 | need to be a Lua table. |
154 | |||
155 | </p> | 423 | </p> |
156 | <p> | 424 | <p> |
157 | |||
158 | This function may throw an error when attempting to serialize | 425 | This function may throw an error when attempting to serialize |
159 | unsupported object types, circular references or deeply nested tables. | 426 | unsupported object types, circular references or deeply nested tables. |
160 | |||
161 | </p> | 427 | </p> |
162 | 428 | ||
163 | <h3 id="buffer_decode"><tt>obj = buffer.decode(str)</tt></h3> | 429 | <h3 id="buffer_decode"><tt>obj = buffer.decode(str)<br> |
430 | obj = buf:decode()</tt></h3> | ||
164 | <p> | 431 | <p> |
165 | 432 | The stand-alone function de-serializes (decodes) the string | |
166 | De-serializes (decodes) the string <tt>str</tt> into the Lua object | 433 | <tt>str</tt>, the buffer method de-serializes one object from the |
167 | <tt>obj</tt>. | 434 | buffer. Both return a Lua object <tt>obj</tt>. |
168 | |||
169 | </p> | 435 | </p> |
170 | <p> | 436 | <p> |
171 | |||
172 | The returned object may be any of the supported Lua types — | 437 | The returned object may be any of the supported Lua types — |
173 | even <tt>nil</tt>. | 438 | even <tt>nil</tt>. |
174 | |||
175 | </p> | 439 | </p> |
176 | <p> | 440 | <p> |
177 | |||
178 | This function may throw an error when fed with malformed or incomplete | 441 | This function may throw an error when fed with malformed or incomplete |
179 | encoded data. The standalone function throws when there's left-over data | 442 | encoded data. The stand-alone function throws when there's left-over |
180 | after decoding a single top-level object. | 443 | data after decoding a single top-level object. The buffer method leaves |
181 | 444 | any left-over data in the buffer. | |
182 | </p> | 445 | </p> |
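A sketch of the two decoding variants, assuming LuaJIT 2.1 with `string.buffer`. The table `obj` is illustrative data. The stand-alone decoder should reject two concatenated encodings because of the left-over data, while the buffer method decodes one object and leaves the rest.

```lua
local buffer = require("string.buffer")

local obj = { name = "point", coords = { 1.5, -2, 3 }, visible = true }

-- Stand-alone functions: one object per string.
local str = buffer.encode(obj)
local copy = buffer.decode(str)   -- a new table with the same contents

-- Two concatenated encodings: the stand-alone decoder throws on left-over data.
local ok = pcall(buffer.decode, str .. str)

-- The buffer method decodes one object and keeps the rest in the buffer.
local buf = buffer.new():put(str .. str)
local first = buf:decode()
```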
183 | 446 | ||
184 | <h2 id="serialize_format">Serialization Format Specification</h2> | 447 | <h3 id="serialize_stream">Streaming Serialization</h3> |
448 | <p> | ||
449 | In some contexts, it's desirable to do piecewise serialization of large | ||
450 | datasets, also known as <i>streaming</i>. | ||
451 | </p> | ||
452 | <p> | ||
453 | This serialization format can be safely concatenated and supports streaming. | ||
454 | Multiple encodings can simply be appended to a buffer and later decoded | ||
455 | individually: | ||
456 | </p> | ||
457 | <pre class="code"> | ||
458 | local buf = buffer.new() | ||
459 | buf:encode(obj1) | ||
460 | buf:encode(obj2) | ||
461 | local copy1 = buf:decode() | ||
462 | local copy2 = buf:decode() | ||
463 | </pre> | ||
185 | <p> | 464 | <p> |
465 | Here's how to iterate over a stream: | ||
466 | </p> | ||
467 | <pre class="code"> | ||
468 | while #buf ~= 0 do | ||
469 | local obj = buf:decode() | ||
470 | -- Do something with obj. | ||
471 | end | ||
472 | </pre> | ||
473 | <p> | ||
474 | Since the serialization format doesn't prepend a length to its encoding, | ||
475 | network applications may need to transmit the length, too. | ||
476 | </p> | ||
186 | 477 | ||
478 | <h3 id="serialize_format">Serialization Format Specification</h3> | ||
479 | <p> | ||
187 | This serialization format is designed for <b>internal use</b> by LuaJIT | 480 | This serialization format is designed for <b>internal use</b> by LuaJIT |
188 | applications. Serialized data is upwards-compatible and portable across | 481 | applications. Serialized data is upwards-compatible and portable across |
189 | all supported LuaJIT platforms. | 482 | all supported LuaJIT platforms. |
190 | |||
191 | </p> | 483 | </p> |
192 | <p> | 484 | <p> |
193 | |||
194 | It's an <b>8-bit binary format</b> and not human-readable. It uses e.g. | 485 | It's an <b>8-bit binary format</b> and not human-readable. It uses e.g. |
195 | embedded zeroes and stores embedded Lua string objects unmodified, which | 486 | embedded zeroes and stores embedded Lua string objects unmodified, which |
196 | are 8-bit-clean, too. Encoded data can be safely concatenated for | 487 | are 8-bit-clean, too. Encoded data can be safely concatenated for |
197 | streaming and later decoded one top-level object at a time. | 488 | streaming and later decoded one top-level object at a time. |
198 | |||
199 | </p> | 489 | </p> |
200 | <p> | 490 | <p> |
201 | |||
202 | The encoding is reasonably compact, but tuned for maximum performance, | 491 | The encoding is reasonably compact, but tuned for maximum performance, |
203 | not for minimum space usage. It compresses well with any of the common | 492 | not for minimum space usage. It compresses well with any of the common |
204 | byte-oriented data compression algorithms. | 493 | byte-oriented data compression algorithms. |
205 | |||
206 | </p> | 494 | </p> |
207 | <p> | 495 | <p> |
208 | |||
209 | Although documented here for reference, this format is explicitly | 496 | Although documented here for reference, this format is explicitly |
210 | <b>not</b> intended to be a 'public standard' for structured data | 497 | <b>not</b> intended to be a 'public standard' for structured data |
211 | interchange across computer languages (like JSON or MessagePack). Please | 498 | interchange across computer languages (like JSON or MessagePack). Please |
212 | do not use it as such. | 499 | do not use it as such. |
213 | |||
214 | </p> | 500 | </p> |
215 | <p> | 501 | <p> |
216 | |||
217 | The specification is given below as a context-free grammar with a | 502 | The specification is given below as a context-free grammar with a |
218 | top-level <tt>object</tt> as the starting point. Alternatives are | 503 | top-level <tt>object</tt> as the starting point. Alternatives are |
219 | separated by the <tt>|</tt> symbol and <tt>*</tt> indicates repeats. | 504 | separated by the <tt>|</tt> symbol and <tt>*</tt> indicates repeats. |
220 | Grouping is implicit or indicated by <tt>{…}</tt>. Terminals are | 505 | Grouping is implicit or indicated by <tt>{…}</tt>. Terminals are |
221 | either plain hex numbers, encoded as bytes, or have a <tt>.format</tt> | 506 | either plain hex numbers, encoded as bytes, or have a <tt>.format</tt> |
222 | suffix. | 507 | suffix. |
223 | |||
224 | </p> | 508 | </p> |
225 | <pre> | 509 | <pre> |
226 | object → nil | false | true | 510 | object → nil | false | true |
@@ -261,6 +545,73 @@ string → (0x20+len).U len*char.B
261 | 0xe0..0x1fdf → (0xe0|(((n-0xe0)>>8)&0x1f)).B ((n-0xe0)&0xff).B | 545 | 0xe0..0x1fdf → (0xe0|(((n-0xe0)>>8)&0x1f)).B ((n-0xe0)&0xff).B |
262 | 0x1fe0.. → 0xff n.I | 546 | 0x1fe0.. → 0xff n.I |
263 | </pre> | 547 | </pre> |
548 | |||
549 | <h2 id="error">Error Handling</h2> | ||
550 | <p> | ||
551 | Many of the buffer methods can throw an error. Out-of-memory or usage | ||
552 | errors are best caught with an outer wrapper for larger parts of code. | ||
553 | There's not much one can do after that, anyway. | ||
554 | </p> | ||
555 | <p> | ||
556 | On the other hand, you may want to catch some errors individually. Buffer methods need | ||
557 | to receive the buffer object as the first argument. The Lua colon-syntax | ||
558 | <tt>obj:method()</tt> does that implicitly. But to wrap a method with | ||
559 | <tt>pcall()</tt>, the arguments need to be passed like this: | ||
560 | </p> | ||
561 | <pre class="code"> | ||
562 | local ok, err = pcall(buf.encode, buf, obj) | ||
563 | if not ok then | ||
564 | -- Handle error in err. | ||
565 | end | ||
566 | </pre> | ||
567 | |||
568 | <h2 id="ffi_caveats">FFI Caveats</h2> | ||
569 | <p> | ||
570 | The string buffer library has been designed to work well together with | ||
571 | the FFI library. But due to the low-level nature of the FFI library, | ||
572 | some care needs to be taken: | ||
573 | </p> | ||
574 | <p> | ||
575 | First, please remember that FFI pointers are zero-indexed. The space | ||
576 | returned by <tt>buf:reserve()</tt> and <tt>buf:ref()</tt> starts at the | ||
577 | returned pointer and ends just before <tt>ptr+len</tt>. | ||
578 | </p> | ||
579 | <p> | ||
580 | I.e. the first valid index is <tt>ptr[0]</tt> and the last valid index | ||
581 | is <tt>ptr[len-1]</tt>. If the returned length is zero, there's no valid | ||
582 | index at all. The returned pointer may even be <tt>NULL</tt>. | ||
583 | </p> | ||
584 | <p> | ||
585 | The space pointed to by the returned pointer is only valid as long as | ||
586 | the buffer is not modified in any way (neither append, nor consume, nor | ||
587 | reset, etc.). The pointer is also not a GC anchor for the buffer object | ||
588 | itself. | ||
589 | </p> | ||
590 | <p> | ||
591 | Buffer data is only guaranteed to be byte-aligned. Casting the returned | ||
592 | pointer to a data type with higher alignment may cause unaligned | ||
593 | accesses. It depends on the CPU architecture whether this is allowed or | ||
594 | not (it's always OK on x86/x64 and mostly OK on other modern | ||
595 | architectures). | ||
596 | </p> | ||
597 | <p> | ||
598 | FFI pointers or references do not count as GC anchors for an underlying | ||
599 | object. E.g. an <tt>array</tt> allocated with <tt>ffi.new()</tt> is | ||
600 | anchored by <tt>buf:set(array, len)</tt>, but not by | ||
601 | <tt>buf:set(array+offset, len)</tt>. The addition of the offset | ||
602 | creates a new pointer, even when the offset is zero. In this case, you | ||
603 | need to make sure there's still a reference to the original array as | ||
604 | long as its contents are in use by the buffer. | ||
605 | </p> | ||
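A sketch of the anchoring pitfall above, assuming LuaJIT 2.1 with `string.buffer` and the FFI library. The 8-byte `array` is illustrative. `array + 4` is a new pointer cdata that does not keep `array` alive, so the local variable `array` must stay referenced while the buffer reads the data.

```lua
local ffi = require("ffi")
local buffer = require("string.buffer")

local array = ffi.new("uint8_t[?]", 8)
ffi.copy(array, "headbody", 8)

local buf = buffer.new()
-- array + 4 is a new pointer cdata; it is NOT a GC anchor for array.
-- The local variable array keeps the memory alive while buf reads it.
buf:set(array + 4, 4)
local body = buf:get()  -- "body"
```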
606 | <p> | ||
607 | Even though each LuaJIT VM instance is single-threaded (but you can | ||
608 | create multiple VMs), FFI data structures can be accessed concurrently. | ||
609 | Be careful when reading/writing FFI cdata from/to buffers to avoid | ||
610 | concurrent accesses or modifications. In particular, the memory | ||
611 | referenced by <tt>buf:set(cdata, len)</tt> must not be modified | ||
612 | while buffer readers are working on it. Shared, but read-only memory | ||
613 | mappings of files are OK, but only if the file does not change. | ||
614 | </p> | ||
264 | <br class="flush"> | 615 | <br class="flush"> |
265 | </div> | 616 | </div> |
266 | <div id="foot"> | 617 | <div id="foot"> |