From 4c6b669c419f313306b9e6ee43be4ad5f6d73ec6 Mon Sep 17 00:00:00 2001
From: Mike Pall
Date: Thu, 25 Mar 2021 02:21:31 +0100
Subject: String buffers, part 1: object serialization.

Sponsored by fmad.io.
---
 doc/ext_buffer.html | 275 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 275 insertions(+)
 create mode 100644 doc/ext_buffer.html

diff --git a/doc/ext_buffer.html b/doc/ext_buffer.html
new file mode 100644
index 00000000..455c298d
--- /dev/null
+++ b/doc/ext_buffer.html
@@ -0,0 +1,275 @@
+
+
+
+String Buffers
+
+
+
+
+
+
+
+
+Lua +
+ + +
+

+ +The string buffer library allows high-performance manipulation of +string-like data. + +

+

+ +Unlike Lua strings, which are constants, string buffers are +mutable sequences of 8-bit (binary-transparent) characters. Data +can be stored, formatted and encoded into a string buffer and later +converted, decoded or extracted. + +

+

+ +The convenient string buffer API simplifies common string manipulation +tasks that would otherwise require creating many intermediate strings. +String buffers improve performance by eliminating redundant memory +copies, object creation, string interning and garbage collection +overhead. In conjunction with the FFI library, they allow zero-copy +operations. + +

+ +

Using the String Buffer Library

+

+The string buffer library is built into LuaJIT by default, but it's not +loaded by default. Add this to the start of every Lua file that needs +one of its functions: +

+
+local buffer = require("string.buffer")
+
+ +

Work in Progress

+ +

+ +This library is a work in progress. More +functions will be added soon. + +

+ +

Serialization of Lua Objects

+

+ +The following functions and methods allow high-speed serialization +(encoding) of a Lua object into a string and decoding it back to a Lua +object. This allows convenient storage and transport of structured +data. + +

+

+ +The encoded data is in an internal binary +format. The data can be stored in files or binary-transparent +databases, or transmitted to other LuaJIT instances across threads, +processes or networks. + +

+

+ +Encoding speed can reach up to 1 Gigabyte/second on a modern desktop- or +server-class system, even when serializing many small objects. Decoding +speed is mostly constrained by object creation cost. + +

+

+ +The serializer handles most Lua types, common FFI number types and +nested structures. Functions, thread objects, other FFI cdata, full +userdata and associated metatables cannot be serialized (yet). + +

+

+ +The encoder serializes nested structures as trees. Multiple references +to a single object will be stored separately and create distinct objects +after decoding. Circular references cause an error. + + +
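The tree semantics described above can be illustrated with a short sketch. It assumes a LuaJIT build that ships the string buffer library; the variable names are illustrative only, and the exact error message for the circular case is not specified here, so the sketch only checks that an error is raised.

```lua
-- Assumes a LuaJIT build that includes the string.buffer library.
local buffer = require("string.buffer")

-- Two fields that reference the same table.
local shared = { 1, 2, 3 }
local original = { first = shared, second = shared }

-- After a round trip, each reference has become its own table.
local copy = buffer.decode(buffer.encode(original))
local distinct = copy.first ~= copy.second

-- A table that contains itself cannot be encoded as a tree.
local cycle = {}
cycle.self = cycle
local ok, err = pcall(buffer.encode, cycle)
```

If identity between references matters to the application, it has to be restored by the caller after decoding, for example by storing an index or key instead of the object itself.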

+ +

str = buffer.encode(obj)

+

+ +Serializes (encodes) the Lua object obj into the string +str. + +

+

+ +obj can be any of the supported Lua types — it doesn't +need to be a Lua table. + +

+

+ +This function may throw an error when attempting to serialize +unsupported object types, circular references or deeply nested tables. + +
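A minimal usage sketch for buffer.encode follows, assuming a LuaJIT build that ships the string buffer library. The wrapper try_encode is a hypothetical helper, not part of the library; it uses pcall to turn a thrown error into a nil plus error value.

```lua
-- Assumes a LuaJIT build that includes the string.buffer library.
local buffer = require("string.buffer")

-- obj does not need to be a table: numbers and strings work, too.
local s1 = buffer.encode(42)
local s2 = buffer.encode("hello")
local s3 = buffer.encode({ name = "x", list = { 1, 2, 3 } })

-- Hypothetical helper: returns the encoded string, or nil plus the error.
local function try_encode(obj)
  local ok, result = pcall(buffer.encode, obj)
  if ok then return result end
  return nil, result
end

-- Functions are not supported by the serializer, so this reports an error.
local encoded, err = try_encode(print)
```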

+ +

obj = buffer.decode(str)

+

+ +De-serializes (decodes) the string str into the Lua object +obj. + +

+

+ +The returned object may be any of the supported Lua types — +even nil. + +

+

+ +This function may throw an error when given malformed or incomplete +encoded data. The standalone function also throws when there is left-over data +after decoding a single top-level object. + +
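The error cases named above can be sketched as follows, assuming a LuaJIT build that ships the string buffer library. The wrapper try_decode is a hypothetical helper, not part of the library. A status flag is returned separately because nil is a valid decoded value.

```lua
-- Assumes a LuaJIT build that includes the string.buffer library.
local buffer = require("string.buffer")

-- Hypothetical helper: returns true plus the object, or false plus the error.
local function try_decode(str)
  local ok, result = pcall(buffer.decode, str)
  return ok, result
end

-- A plain round trip succeeds.
local ok1, value = try_decode(buffer.encode({ answer = 42 }))

-- Two concatenated top-level objects leave data behind after the first one.
local two = buffer.encode("a") .. buffer.encode("b")
local ok2 = try_decode(two)

-- Cutting off the last byte makes the encoded data incomplete.
local whole = buffer.encode("some longer string")
local ok3 = try_decode(whole:sub(1, #whole - 1))
```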

+ +

Serialization Format Specification

+

+ +This serialization format is designed for internal use by LuaJIT +applications. Serialized data is upwards-compatible and portable across +all supported LuaJIT platforms. + +

+

+ +It's an 8-bit binary format and not human-readable. For example, it uses +embedded zeroes and stores embedded Lua string objects unmodified, which +are 8-bit-clean, too. Encoded data can be safely concatenated for +streaming and later decoded one top-level object at a time. + +

+

+ +The encoding is reasonably compact, but tuned for maximum performance, +not for minimum space usage. It compresses well with any of the common +byte-oriented data compression algorithms. + +

+

+ +Although documented here for reference, this format is explicitly +not intended to be a 'public standard' for structured data +interchange across computer languages (like JSON or MessagePack). Please +do not use it as such. + +

+

+ +The specification is given below as a context-free grammar with a +top-level object as the starting point. Alternatives are +separated by the | symbol and * indicates repeats. +Grouping is implicit or indicated by {…}. Terminals are +either plain hex numbers, encoded as bytes, or have a .format +suffix. + +

+
+object    → nil | false | true
+          | null | lightud32 | lightud64
+          | int | num | tab
+          | int64 | uint64 | complex
+          | string
+
+nil       → 0x00
+false     → 0x01
+true      → 0x02
+
+null      → 0x03                            // NULL lightuserdata
+lightud32 → 0x04 data.I                   // 32 bit lightuserdata
+lightud64 → 0x05 data.L                   // 64 bit lightuserdata
+
+int       → 0x06 int.I                                 // int32_t
+num       → 0x07 double.L
+
+tab       → 0x08                                   // Empty table
+          | 0x09 h.U h*{object object}          // Key/value hash
+          | 0x0a a.U a*object                    // 0-based array
+          | 0x0b a.U a*object h.U h*{object object}      // Mixed
+          | 0x0c a.U (a-1)*object                // 1-based array
+          | 0x0d a.U (a-1)*object h.U h*{object object}  // Mixed
+
+int64     → 0x10 int.L                             // FFI int64_t
+uint64    → 0x11 uint.L                           // FFI uint64_t
+complex   → 0x12 re.L im.L                         // FFI complex
+
+string    → (0x20+len).U len*char.B
+
+.B = 8 bit
+.I = 32 bit little-endian
+.L = 64 bit little-endian
+.U = prefix-encoded 32 bit unsigned number n:
+     0x00..0xdf   → n.B
+     0xe0..0x1fdf → (0xe0|(((n-0xe0)>>8)&0x1f)).B ((n-0xe0)&0xff).B
+   0x1fe0..       → 0xff n.I
+
+
+
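The .U prefix encoding and the string rule from the grammar can be traced by hand with a small pure-Lua sketch. The names encode_u, decode_u and encode_string are hypothetical helpers and not part of the string buffer library; the code only mirrors the grammar above and uses plain arithmetic instead of bit operations, so it does not depend on LuaJIT.

```lua
-- Hypothetical helpers that mirror the grammar; not part of string.buffer.

-- Encode n (0 .. 2^32-1) with the .U prefix encoding.
local function encode_u(n)
  assert(n >= 0 and n <= 0xffffffff and n % 1 == 0,
         "expected a 32 bit unsigned integer")
  if n < 0xe0 then
    return string.char(n)                 -- 1 byte: n itself
  elseif n < 0x1fe0 then
    local m = n - 0xe0
    -- 2 bytes: 0xe0 plus the high 5 bits of m, then the low 8 bits of m.
    return string.char(0xe0 + math.floor(m / 256), m % 256)
  else
    -- 5 bytes: 0xff marker, then n as 32 bit little-endian.
    return string.char(0xff,
      n % 256,
      math.floor(n / 256) % 256,
      math.floor(n / 65536) % 256,
      math.floor(n / 16777216) % 256)
  end
end

-- Decode a .U number from s at pos; returns the number and the next position.
local function decode_u(s, pos)
  pos = pos or 1
  local b = s:byte(pos)
  if b < 0xe0 then
    return b, pos + 1
  elseif b < 0xff then
    return (b - 0xe0) * 256 + s:byte(pos + 1) + 0xe0, pos + 2
  else
    local b0, b1, b2, b3 = s:byte(pos + 1, pos + 4)
    return b0 + b1 * 256 + b2 * 65536 + b3 * 16777216, pos + 5
  end
end

-- string → (0x20+len).U len*char.B
local function encode_string(s)
  return encode_u(0x20 + #s) .. s
end

local abc = encode_string("abc")             -- bytes 0x23 0x61 0x62 0x63
local long = encode_string(("x"):rep(192))   -- 0x20+192 = 0xe0: 2-byte prefix
```

Because the string tag is folded into the length prefix, strings shorter than 192 bytes cost a single byte of overhead under this rule.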
+ + +