aboutsummaryrefslogtreecommitdiff
path: root/doc/ext_buffer.html
blob: 455c298d624367381dd18daa4ec0c39a861f29c5 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
<!DOCTYPE html>
<html>
<head>
<title>String Buffers</title>
<meta charset="utf-8">
<meta name="Copyright" content="Copyright (C) 2005-2021">
<meta name="Language" content="en">
<link rel="stylesheet" type="text/css" href="bluequad.css" media="screen">
<link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print">
</head>
<body>
<div id="site">
<a href="https://luajit.org"><span>Lua<span id="logo">JIT</span></span></a>
</div>
<div id="head">
<h1>String Buffers</h1>
</div>
<div id="nav">
<ul><li>
<a href="luajit.html">LuaJIT</a>
<ul><li>
<a href="https://luajit.org/download.html">Download <span class="ext">&raquo;</span></a>
</li><li>
<a href="install.html">Installation</a>
</li><li>
<a href="running.html">Running</a>
</li></ul>
</li><li>
<a href="extensions.html">Extensions</a>
<ul><li>
<a href="ext_ffi.html">FFI Library</a>
<ul><li>
<a href="ext_ffi_tutorial.html">FFI Tutorial</a>
</li><li>
<a href="ext_ffi_api.html">ffi.* API</a>
</li><li>
<a href="ext_ffi_semantics.html">FFI Semantics</a>
</li></ul>
</li><li>
<a class="current" href="ext_buffer.html">String Buffers</a>
</li><li>
<a href="ext_jit.html">jit.* Library</a>
</li><li>
<a href="ext_c_api.html">Lua/C API</a>
</li><li>
<a href="ext_profiler.html">Profiler</a>
</li></ul>
</li><li>
<a href="status.html">Status</a>
</li><li>
<a href="faq.html">FAQ</a>
</li><li>
<a href="http://wiki.luajit.org/">Wiki <span class="ext">&raquo;</span></a>
</li><li>
<a href="https://luajit.org/list.html">Mailing List <span class="ext">&raquo;</span></a>
</li></ul>
</div>
<div id="main">
<p>

The string buffer library allows <b>high-performance manipulation of
string-like data</b>.

</p>
<p>

Unlike Lua strings, which are constants, string buffers are
<b>mutable</b> sequences of 8-bit (binary-transparent) characters. Data
can be stored, formatted and encoded into a string buffer and later
converted, decoded or extracted.

</p>
<p>

The convenient string buffer API simplifies common string manipulation
tasks, that would otherwise require creating many intermediate strings.
String buffers improve performance by eliminating redundant memory
copies, object creation, string interning and garbage collection
overhead. In conjunction with the FFI library, they allow zero-copy
operations.

</p>

<h2 id="load">Using the String Buffer Library</h2>
<p>
The string buffer library is built into LuaJIT by default, but it's not
loaded by default. Add this to the start of every Lua file that needs
one of its functions:
</p>
<pre class="code">
local buffer = require("string.buffer")
</pre>

<h2 id="wip" style="color:#ff0000">Work in Progress</h2>

<p>

<b style="color:#ff0000">This library is a work in progress. More
functions will be added soon.</b>

</p>

<h2 id="serialize">Serialization of Lua Objects</h2>
<p>

The following functions and methods allow <b>high-speed serialization</b>
(encoding) of a Lua object into a string and decoding it back to a Lua
object. This allows convenient storage and transport of <b>structured
data</b>.

</p>
<p>

The encoded data is in an <a href="#serialize_format">internal binary
format</a>. The data can be stored in files, binary-transparent
databases or transmitted to other LuaJIT instances across threads,
processes or networks.

</p>
<p>

Encoding speed can reach up to 1 Gigabyte/second on a modern desktop- or
server-class system, even when serializing many small objects. Decoding
speed is mostly constrained by object creation cost.

</p>
<p>

The serializer handles most Lua types, common FFI number types and
nested structures. Functions, thread objects, other FFI cdata, full
userdata and associated metatables cannot be serialized (yet).

</p>
<p>

The encoder serializes nested structures as trees. Multiple references
to a single object will be stored separately and create distinct objects
after decoding. Circular references cause an error.


</p>

<h3 id="buffer_encode"><tt>str = buffer.encode(obj)</tt></h3>
<p>

Serializes (encodes) the Lua object <tt>obj</tt> into the string
<tt>str</tt>.

</p>
<p>

<tt>obj</tt> can be any of the supported Lua types &mdash; it doesn't
need to be a Lua table.

</p>
<p>

This function may throw an error when attempting to serialize
unsupported object types, circular references or deeply nested tables.

</p>

<h3 id="buffer_decode"><tt>obj = buffer.decode(str)</tt></h3>
<p>

De-serializes (decodes) the string <tt>str</tt> into the Lua object
<tt>obj</tt>.

</p>
<p>

The returned object may be any of the supported Lua types &mdash;
even <tt>nil</tt>.

</p>
<p>

This function may throw an error when fed with malformed or incomplete
encoded data. The standalone function throws when there's left-over data
after decoding a single top-level object.

</p>

<h2 id="serialize_format">Serialization Format Specification</h2>
<p>

This serialization format is designed for <b>internal use</b> by LuaJIT
applications. Serialized data is upwards-compatible and portable across
all supported LuaJIT platforms.

</p>
<p>

It's an <b>8-bit binary format</b> and not human-readable. It uses e.g.
embedded zeroes and stores embedded Lua string objects unmodified, which
are 8-bit-clean, too. Encoded data can be safely concatenated for
streaming and later decoded one top-level object at a time.

</p>
<p>

The encoding is reasonably compact, but tuned for maximum performance,
not for minimum space usage. It compresses well with any of the common
byte-oriented data compression algorithms.

</p>
<p>

Although documented here for reference, this format is explicitly
<b>not</b> intended to be a 'public standard' for structured data
interchange across computer languages (like JSON or MessagePack). Please
do not use it as such.

</p>
<p>

The specification is given below as a context-free grammar with a
top-level <tt>object</tt> as the starting point. Alternatives are
separated by the <tt>|</tt> symbol and <tt>*</tt> indicates repeats.
Grouping is implicit or indicated by <tt>{…}</tt>. Terminals are
either plain hex numbers, encoded as bytes, or have a <tt>.format</tt>
suffix.

</p>
<pre>
object    → nil | false | true
          | null | lightud32 | lightud64
          | int | num | tab
          | int64 | uint64 | complex
          | string

nil       → 0x00
false     → 0x01
true      → 0x02

null      → 0x03                            // NULL lightuserdata
lightud32 → 0x04 data.I                   // 32 bit lightuserdata
lightud64 → 0x05 data.L                   // 64 bit lightuserdata

int       → 0x06 int.I                                 // int32_t
num       → 0x07 double.L

tab       → 0x08                                   // Empty table
          | 0x09 h.U h*{object object}          // Key/value hash
          | 0x0a a.U a*object                    // 0-based array
          | 0x0b a.U a*object h.U h*{object object}      // Mixed
          | 0x0c a.U (a-1)*object                // 1-based array
          | 0x0d a.U (a-1)*object h.U h*{object object}  // Mixed

int64     → 0x10 int.L                             // FFI int64_t
uint64    → 0x11 uint.L                           // FFI uint64_t
complex   → 0x12 re.L im.L                         // FFI complex

string    → (0x20+len).U len*char.B

.B = 8 bit
.I = 32 bit little-endian
.L = 64 bit little-endian
.U = prefix-encoded 32 bit unsigned number n:
     0x00..0xdf   → n.B
     0xe0..0x1fdf → (0xe0|(((n-0xe0)>>8)&0x1f)).B ((n-0xe0)&0xff).B
   0x1fe0..       → 0xff n.I
</pre>
<br class="flush">
</div>
<div id="foot">
<hr class="hide">
Copyright &copy; 2005-2021
<span class="noprint">
&middot;
<a href="contact.html">Contact</a>
</span>
</div>
</body>
</html>