author | Igor Pavlov <87184205+ip7z@users.noreply.github.com> | 2021-12-27 00:00:00 +0000 |
---|---|---|
committer | Igor Pavlov <87184205+ip7z@users.noreply.github.com> | 2022-03-18 15:35:13 +0500 |
commit | f19f813537c7aea1c20749c914e756b54a9c3cf5 (patch) | |
tree | 816ba62ca7c0fa19f2eb46d9e9d6f7dd7c3a744d /DOC/lzma.txt | |
parent | 98e06a519b63b81986abe76d28887f6984a7732b (diff) | |
21.07
Diffstat (limited to 'DOC/lzma.txt')
-rw-r--r-- | DOC/lzma.txt | 328 |
1 files changed, 328 insertions, 0 deletions
diff --git a/DOC/lzma.txt b/DOC/lzma.txt
new file mode 100644
index 0000000..a65988f
--- /dev/null
+++ b/DOC/lzma.txt
@@ -0,0 +1,328 @@ | |||
1 | LZMA compression | ||
2 | ---------------- | ||
3 | Version: 9.35 | ||
4 | |||
5 | This file describes LZMA encoding and decoding functions written in C language. | ||
6 | |||
7 | LZMA is an improved version of the well-known LZ77 compression algorithm. | ||
8 | It was designed to maximize the compression ratio while keeping | ||
9 | decompression fast and the memory requirements for decompression | ||
10 | low. | ||
11 | |||
12 | Note: see also the LZMA Specification (lzma-specification.txt in the LZMA SDK). | ||
13 | |||
14 | You can also look at the source code for LZMA encoding and decoding: | ||
15 | C/Util/Lzma/LzmaUtil.c | ||
16 | |||
17 | |||
18 | LZMA compressed file format | ||
19 | --------------------------- | ||
20 | Offset Size Description | ||
21 |   0     1   Special LZMA properties (lc, lp, pb in encoded form) | ||
22 |   1     4   Dictionary size (little endian) | ||
23 |   5     8   Uncompressed size (little endian). -1 means unknown size | ||
24 |  13         Compressed data | ||
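
The header layout above can be read with a few lines of C. This is only a
sketch: DemoLzmaHeader and DemoParseLzmaHeader are hypothetical names, not
SDK functions. It assumes the properties byte is encoded as
(pb * 5 + lp) * 9 + lc, as described in the LZMA Specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded header fields (not an SDK type). */
typedef struct
{
  unsigned lc, lp, pb;
  uint32_t dictSize;
  uint64_t unpackSize;  /* all bits set (-1) means the size is unknown */
} DemoLzmaHeader;

/* Parses the 13-byte header. Returns 0 on success,
   -1 if the properties byte is out of range. */
int DemoParseLzmaHeader(const unsigned char *h, DemoLzmaHeader *out)
{
  unsigned d;
  int i;
  assert(h != NULL && out != NULL);

  /* Offset 0: lc, lp, pb packed into one byte. */
  d = h[0];
  if (d >= 9 * 5 * 5)
    return -1;
  out->lc = d % 9;
  d /= 9;
  out->lp = d % 5;
  out->pb = d / 5;

  /* Offset 1: dictionary size, 4 bytes, little endian. */
  out->dictSize = 0;
  for (i = 0; i < 4; i++)
    out->dictSize |= (uint32_t)h[1 + i] << (8 * i);

  /* Offset 5: uncompressed size, 8 bytes, little endian. */
  out->unpackSize = 0;
  for (i = 0; i < 8; i++)
    out->unpackSize |= (uint64_t)h[5 + i] << (8 * i);
  return 0;
}
```

Compressed data starts at offset 13, right after the parsed fields.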
25 | |||
26 | |||
27 | |||
28 | ANSI-C LZMA Decoder | ||
29 | ~~~~~~~~~~~~~~~~~~~ | ||
30 | |||
31 | Please note that the interfaces for the ANSI-C code were changed in LZMA SDK 4.58. | ||
32 | If you want to use the old interfaces, you can download a previous version of the LZMA SDK | ||
33 | from the sourceforge.net site. | ||
34 | |||
35 | To use ANSI-C LZMA Decoder you need the following files: | ||
36 | 1) LzmaDec.h + LzmaDec.c + 7zTypes.h + Precomp.h + Compiler.h | ||
37 | |||
38 | See the example code: | ||
39 | C/Util/Lzma/LzmaUtil.c | ||
40 | |||
41 | |||
42 | Memory requirements for LZMA decoding | ||
43 | ------------------------------------- | ||
44 | |||
45 | Stack usage of LZMA decoding function for local variables is not | ||
46 | larger than 200-400 bytes. | ||
47 | |||
48 | The LZMA decoder uses a dictionary buffer and an internal state structure. | ||
49 | The internal state structure consumes | ||
50 | state_size = (4 + 1.5 * 2^(lc + lp)) KB | ||
51 | With the default settings (lc=3, lp=0), state_size = 16 KB. | ||
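
The state_size estimate above can be evaluated in integer arithmetic, since
1.5 KB is 1536 bytes. A minimal sketch, assuming the documented approximation
is all that is needed. DemoLzmaStateSize is a hypothetical helper, and the
limits lc <= 8 and lp <= 4 are taken from the LZMA Specification.

```c
#include <assert.h>
#include <stddef.h>

/* Approximate decoder state size in bytes:
   (4 + 1.5 * 2^(lc + lp)) KB = 4096 + (1536 << (lc + lp)). */
size_t DemoLzmaStateSize(unsigned lc, unsigned lp)
{
  assert(lc <= 8 && lp <= 4);
  return (size_t)4096 + ((size_t)1536 << (lc + lp));
}
```

For the default lc=3, lp=0 this gives 4096 + 1536 * 8 = 16384 bytes, which
matches the 16 KB figure.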
52 | |||
53 | |||
54 | How To decompress data | ||
55 | ---------------------- | ||
56 | |||
57 | LZMA Decoder (ANSI-C version) now supports 2 interfaces: | ||
58 | 1) Single-call Decompressing | ||
59 | 2) Multi-call State Decompressing (zlib-like interface) | ||
60 | |||
61 | You must provide an external allocator: | ||
62 | Example: | ||
63 | void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); } | ||
64 | void SzFree(void *p, void *address) { p = p; free(address); } | ||
65 | ISzAlloc alloc = { SzAlloc, SzFree }; | ||
66 | |||
67 | The p = p; statement is there only to suppress unused-parameter compiler warnings. | ||
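
Some newer compilers warn about the self-assignment p = p; itself. A cast to
void is a common alternative. The sketch below uses DemoAlloc, a local
stand-in with the same two-callback shape as the ISzAlloc example above (the
real type comes from 7zTypes.h), so that it compiles on its own. All Demo*
names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the two-callback allocator table shown above. */
typedef struct
{
  void *(*Alloc)(void *p, size_t size);
  void (*Free)(void *p, void *address);
} DemoAlloc;

/* (void)p marks the parameter as intentionally unused. */
void *DemoSzAlloc(void *p, size_t size) { (void)p; return malloc(size); }
void DemoSzFree(void *p, void *address) { (void)p; free(address); }

DemoAlloc g_DemoAlloc = { DemoSzAlloc, DemoSzFree };

/* Allocates and releases one block through the table, passing the table
   itself as p. Returns 1 on success, 0 if the allocation failed. */
int DemoAllocRoundTrip(DemoAlloc *a, size_t size)
{
  void *block;
  assert(a != NULL && a->Alloc != NULL && a->Free != NULL);
  block = a->Alloc(a, size);
  if (block == NULL)
    return 0;
  a->Free(a, block);
  return 1;
}
```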
68 | |||
69 | |||
70 | Single-call Decompressing | ||
71 | ------------------------- | ||
72 | When to use: RAM->RAM decompressing | ||
73 | Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h | ||
74 | Compile defines: no defines | ||
75 | Memory Requirements: | ||
76 | - Input buffer: compressed size | ||
77 | - Output buffer: uncompressed size | ||
78 | - LZMA Internal Structures: state_size (16 KB for default settings) | ||
79 | |||
80 | Interface: | ||
81 | int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen, | ||
82 | const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode, | ||
83 | ELzmaStatus *status, ISzAlloc *alloc); | ||
84 | In: | ||
85 | dest - output data | ||
86 | destLen - output data size | ||
87 | src - input data | ||
88 | srcLen - input data size | ||
89 | propData - LZMA properties (5 bytes) | ||
90 | propSize - size of propData buffer (5 bytes) | ||
91 | finishMode - It has meaning only if the decoding reaches output limit (*destLen). | ||
92 | LZMA_FINISH_ANY - Decode just destLen bytes. | ||
93 | LZMA_FINISH_END - Stream must be finished after (*destLen). | ||
94 | You can use LZMA_FINISH_END, when you know that | ||
95 | current output buffer covers last bytes of stream. | ||
96 | alloc - Memory allocator. | ||
97 | |||
98 | Out: | ||
99 | destLen - processed output size | ||
100 | srcLen - processed input size | ||
101 | |||
102 | Output: | ||
103 | SZ_OK | ||
104 | status: | ||
105 | LZMA_STATUS_FINISHED_WITH_MARK | ||
106 | LZMA_STATUS_NOT_FINISHED | ||
107 | LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK | ||
108 | SZ_ERROR_DATA - Data error | ||
109 | SZ_ERROR_MEM - Memory allocation error | ||
110 | SZ_ERROR_UNSUPPORTED - Unsupported properties | ||
111 | SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src). | ||
112 | |||
113 | If the LZMA decoder sees the end marker before reaching the output limit, it returns SZ_OK, | ||
114 | and the output value of destLen will be less than the output buffer size limit. | ||
115 | |||
116 | You can use multiple checks to test data integrity after full decompression: | ||
117 | 1) Check Result and "status" variable. | ||
118 | 2) Check that output(destLen) = uncompressedSize, if you know real uncompressedSize. | ||
119 | 3) Check that output(srcLen) = compressedSize, if you know real compressedSize. | ||
120 | You must use the correct finish mode in that case. | ||
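
The three checks above can be folded into one helper. This is a sketch under
stated assumptions: the DEMO_* constants and DemoDecodeLooksComplete are
local stand-ins so the block compiles without the SDK. Real code would
compare against SZ_OK and the ELzmaStatus values from LzmaDec.h.

```c
#include <stddef.h>

/* Local stand-ins for illustration only (not the SDK definitions). */
enum { DEMO_SZ_OK = 0 };
typedef enum
{
  DEMO_STATUS_FINISHED_WITH_MARK,
  DEMO_STATUS_NOT_FINISHED,
  DEMO_STATUS_MAYBE_FINISHED_WITHOUT_MARK
} DemoStatus;

/* Returns 1 if all three integrity checks pass, 0 otherwise.
   destLenOut and srcLenOut are the values written back by the decoder. */
int DemoDecodeLooksComplete(int res, DemoStatus status,
    size_t destLenOut, size_t unpackSize,
    size_t srcLenOut, size_t packSize)
{
  if (res != DEMO_SZ_OK)
    return 0;                          /* check 1: result code */
  if (status == DEMO_STATUS_NOT_FINISHED)
    return 0;                          /* check 1: status variable */
  if (destLenOut != unpackSize)
    return 0;                          /* check 2: all output produced */
  if (srcLenOut != packSize)
    return 0;                          /* check 3: all input consumed */
  return 1;
}
```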
121 | |||
122 | |||
123 | Multi-call State Decompressing (zlib-like interface) | ||
124 | ---------------------------------------------------- | ||
125 | |||
126 | When to use: file->file decompressing | ||
127 | Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h | ||
128 | |||
129 | Memory Requirements: | ||
130 | - Buffer for input stream: any size (for example, 16 KB) | ||
131 | - Buffer for output stream: any size (for example, 16 KB) | ||
132 | - LZMA Internal Structures: state_size (16 KB for default settings) | ||
133 | - LZMA dictionary (dictionary size is encoded in LZMA properties header) | ||
134 | |||
135 | 1) Read LZMA properties (5 bytes) and uncompressed size (8 bytes, little-endian) to header: | ||
136 | unsigned char header[LZMA_PROPS_SIZE + 8]; | ||
137 | ReadFile(inFile, header, sizeof(header)); | ||
138 | |||
139 | 2) Allocate CLzmaDec structures (state + dictionary) using LZMA properties | ||
140 | |||
141 | CLzmaDec state; | ||
142 | LzmaDec_Construct(&state); | ||
143 | res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc); | ||
144 | if (res != SZ_OK) | ||
145 | return res; | ||
146 | |||
147 | 3) Init the LzmaDec structure before each new LZMA stream, then call LzmaDec_DecodeToBuf in a loop: | ||
148 | |||
149 | LzmaDec_Init(&state); | ||
150 | for (;;) | ||
151 | { | ||
152 | ... | ||
153 | int res = LzmaDec_DecodeToBuf(CLzmaDec *p, Byte *dest, SizeT *destLen, | ||
154 | const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode, ELzmaStatus *status); | ||
155 | ... | ||
156 | } | ||
157 | |||
158 | |||
159 | 4) Free all allocated structures | ||
160 | LzmaDec_Free(&state, &g_Alloc); | ||
161 | |||
162 | See the example code: | ||
163 | C/Util/Lzma/LzmaUtil.c | ||
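
The buffer handling around the decode loop in step 3 can be sketched without
the SDK. In the code below, DemoStepFn is a hypothetical stand-in for a
call with the in/out length convention of LzmaDec_DecodeToBuf: on entry
*srcLen and *destLen are the available sizes, on return they hold the
processed sizes. DemoCopyStep is a trivial copying "decoder" used only to
exercise the loop. Real code would also pass a finish mode and stop on the
returned status instead of on "no progress".

```c
#include <stddef.h>
#include <string.h>

enum { DEMO_IN_BUF_SIZE = 16, DEMO_OUT_BUF_SIZE = 8 };

typedef int (*DemoStepFn)(void *state, unsigned char *dest, size_t *destLen,
    const unsigned char *src, size_t *srcLen);

/* Memory-backed stand-ins for the input and output files. */
typedef struct { const unsigned char *data; size_t size, pos; } DemoReader;
typedef struct { unsigned char *data; size_t cap, pos; } DemoWriter;

size_t DemoRead(DemoReader *r, unsigned char *buf, size_t n)
{
  size_t left = r->size - r->pos;
  if (n > left)
    n = left;
  memcpy(buf, r->data + r->pos, n);
  r->pos += n;
  return n;
}

int DemoWrite(DemoWriter *w, const unsigned char *buf, size_t n)
{
  if (n > w->cap - w->pos)
    return -1;
  memcpy(w->data + w->pos, buf, n);
  w->pos += n;
  return 0;
}

/* Stand-in step: copies as many bytes as fit in both buffers. */
int DemoCopyStep(void *state, unsigned char *dest, size_t *destLen,
    const unsigned char *src, size_t *srcLen)
{
  size_t n = *srcLen < *destLen ? *srcLen : *destLen;
  (void)state;
  memcpy(dest, src, n);
  *destLen = n;
  *srcLen = n;
  return 0;
}

/* Refill the input buffer when it is empty, run one step, flush the output,
   and stop when a step neither consumes nor produces anything. */
int DemoStream(DemoStepFn step, void *state, DemoReader *in, DemoWriter *out)
{
  unsigned char inBuf[DEMO_IN_BUF_SIZE], outBuf[DEMO_OUT_BUF_SIZE];
  size_t inPos = 0, inSize = 0;
  for (;;)
  {
    size_t srcLen, destLen = sizeof(outBuf);
    int res;
    if (inPos == inSize)
    {
      inSize = DemoRead(in, inBuf, sizeof(inBuf));
      inPos = 0;
    }
    srcLen = inSize - inPos;
    res = step(state, outBuf, &destLen, inBuf + inPos, &srcLen);
    if (res != 0)
      return res;
    inPos += srcLen;
    if (DemoWrite(out, outBuf, destLen) != 0)
      return -1;
    if (srcLen == 0 && destLen == 0)
      return 0;
  }
}
```

The input and output buffers are deliberately different sizes here, to show
that one filled input buffer may take several steps to drain.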
164 | |||
165 | |||
166 | How To compress data | ||
167 | -------------------- | ||
168 | |||
169 | Compile files: | ||
170 | 7zTypes.h | ||
171 | Threads.h | ||
172 | LzmaEnc.h | ||
173 | LzmaEnc.c | ||
174 | LzFind.h | ||
175 | LzFind.c | ||
176 | LzFindMt.h | ||
177 | LzFindMt.c | ||
178 | LzHash.h | ||
179 | |||
180 | Memory Requirements: | ||
181 | - (dictSize * 11.5 + 6 MB) + state_size | ||
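
As a rough illustration of the estimate above, the formula can be evaluated
in integer arithmetic. DemoLzmaEncMemory is a hypothetical helper, not an
SDK function. It simply restates (dictSize * 11.5 + 6 MB) + state_size in
bytes, rounding the half-byte down for odd dictionary sizes.

```c
#include <stdint.h>

/* Approximate encoder memory in bytes:
   dictSize * 11.5 + 6 MB + stateSize. */
uint64_t DemoLzmaEncMemory(uint32_t dictSize, uint64_t stateSize)
{
  uint64_t d = dictSize;
  return d * 11 + d / 2 + ((uint64_t)6 << 20) + stateSize;
}
```

For a 16 MB dictionary and a 16 KB state this gives 184 MB + 6 MB + 16 KB,
or about 190 MB.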
182 | |||
183 | Lzma Encoder can use two memory allocators: | ||
184 | 1) alloc - for small arrays. | ||
185 | 2) allocBig - for big arrays. | ||
186 | |||
187 | For example, you can use Large RAM Pages (2 MB) in the allocBig allocator for | ||
188 | better compression speed. Note that Windows has a poor implementation of | ||
189 | Large RAM Pages. | ||
190 | It's OK to use the same allocator for alloc and allocBig. | ||
191 | |||
192 | |||
193 | Single-call Compression with callbacks | ||
194 | -------------------------------------- | ||
195 | |||
196 | See the example code: | ||
197 | C/Util/Lzma/LzmaUtil.c | ||
198 | |||
199 | When to use: file->file compressing | ||
200 | |||
201 | 1) You must implement callback structures for interfaces: | ||
202 | ISeqInStream | ||
203 | ISeqOutStream | ||
204 | ICompressProgress | ||
205 | ISzAlloc | ||
206 | |||
207 | static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); } | ||
208 | static void SzFree(void *p, void *address) { p = p; MyFree(address); } | ||
209 | static ISzAlloc g_Alloc = { SzAlloc, SzFree }; | ||
210 | |||
211 | CFileSeqInStream inStream; | ||
212 | CFileSeqOutStream outStream; | ||
213 | |||
214 | inStream.funcTable.Read = MyRead; | ||
215 | inStream.file = inFile; | ||
216 | outStream.funcTable.Write = MyWrite; | ||
217 | outStream.file = outFile; | ||
218 | |||
219 | |||
220 | 2) Create CLzmaEncHandle object: | ||
221 | |||
222 | CLzmaEncHandle enc; | ||
223 | |||
224 | enc = LzmaEnc_Create(&g_Alloc); | ||
225 | if (enc == 0) | ||
226 | return SZ_ERROR_MEM; | ||
227 | |||
228 | |||
229 | 3) Initialize CLzmaEncProps properties: | ||
230 | |||
231 | LzmaEncProps_Init(&props); | ||
232 | |||
233 | Then you can change some properties in that structure. | ||
234 | |||
235 | 4) Send LZMA properties to LZMA Encoder | ||
236 | |||
237 | res = LzmaEnc_SetProps(enc, &props); | ||
238 | |||
239 | 5) Write encoded properties to header | ||
240 | |||
241 | Byte header[LZMA_PROPS_SIZE + 8]; | ||
242 | size_t headerSize = LZMA_PROPS_SIZE; | ||
243 | UInt64 fileSize; | ||
244 | int i; | ||
245 | |||
246 | res = LzmaEnc_WriteProperties(enc, header, &headerSize); | ||
247 | fileSize = MyGetFileLength(inFile); | ||
248 | for (i = 0; i < 8; i++) | ||
249 | header[headerSize++] = (Byte)(fileSize >> (8 * i)); | ||
250 | MyWriteFileAndCheck(outFile, header, headerSize); | ||
251 | |||
252 | 6) Call encoding function: | ||
253 | res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable, | ||
254 | NULL, &g_Alloc, &g_Alloc); | ||
255 | |||
256 | 7) Destroy LZMA Encoder Object | ||
257 | LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc); | ||
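
The size-writing loop from step 5 can be isolated into a small helper. This
is a sketch: DemoAppendSize64LE is a hypothetical name, and the header
buffer is assumed to already hold the 5 property bytes, so the size field
starts at offset 5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Appends fileSize as 8 little-endian bytes at header[*pos]
   and advances *pos by 8. */
void DemoAppendSize64LE(unsigned char *header, size_t *pos, uint64_t fileSize)
{
  int i;
  assert(header != NULL && pos != NULL);
  for (i = 0; i < 8; i++)
    header[(*pos)++] = (unsigned char)(fileSize >> (8 * i));
}
```

Passing a value with all bits set writes eight 0xFF bytes, which is the
"-1 means unknown size" case of the file format.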
258 | |||
259 | |||
260 | If a callback function returns an error code, LzmaEnc_Encode returns either that code | ||
261 | or a code such as SZ_ERROR_READ, SZ_ERROR_WRITE or SZ_ERROR_PROGRESS. | ||
262 | |||
263 | |||
264 | Single-call RAM->RAM Compression | ||
265 | -------------------------------- | ||
266 | |||
267 | Single-call RAM->RAM Compression is similar to Compression with callbacks, | ||
268 | but you provide pointers to buffers instead of pointers to stream callbacks: | ||
269 | |||
270 | SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen, | ||
271 | const CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark, | ||
272 | ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig); | ||
273 | |||
274 | Return code: | ||
275 | SZ_OK - OK | ||
276 | SZ_ERROR_MEM - Memory allocation error | ||
277 | SZ_ERROR_PARAM - Incorrect parameter | ||
278 | SZ_ERROR_OUTPUT_EOF - output buffer overflow | ||
279 | SZ_ERROR_THREAD - errors in multithreading functions (only for Mt version) | ||
280 | |||
281 | |||
282 | |||
283 | Defines | ||
284 | ------- | ||
285 | |||
286 | _LZMA_SIZE_OPT - Enable some optimizations in LZMA Decoder to get smaller executable code. | ||
287 | |||
288 | _LZMA_PROB32 - It can increase the speed on some 32-bit CPUs, but memory usage for | ||
289 | some structures will be doubled in that case. | ||
290 | |||
291 | _LZMA_UINT32_IS_ULONG - Define it if int is 16-bit on your compiler and long is 32-bit. | ||
292 | |||
293 | _LZMA_NO_SYSTEM_SIZE_T - Define it if you don't want to use size_t type. | ||
294 | |||
295 | |||
296 | _7ZIP_PPMD_SUPPPORT - Define it if you want to support the PPMD method in the ANSI-C .7z decoder. | ||
297 | |||
298 | |||
299 | C++ LZMA Encoder/Decoder | ||
300 | ~~~~~~~~~~~~~~~~~~~~~~~~ | ||
301 | The C++ LZMA code uses COM-like interfaces, so if you want to use it, | ||
302 | it helps to study the basics of COM/OLE. | ||
303 | The C++ LZMA code is just a wrapper over the ANSI-C code. | ||
304 | |||
305 | |||
306 | C++ Notes | ||
307 | ~~~~~~~~~ | ||
308 | If you use some C++ code folders in 7-Zip (for example, C++ code for .7z handling), | ||
309 | you must make sure that you handle the "new" operator correctly. | ||
310 | 7-Zip can be compiled with MSVC 6.0, which doesn't throw an exception from the "new" operator. | ||
311 | So 7-Zip uses "CPP\Common\NewHandler.cpp", which redefines the "new" operator: | ||
312 | void * operator new(size_t size) | ||
313 | { | ||
314 | void *p = ::malloc(size); | ||
315 | if (p == 0) | ||
316 | throw CNewException(); | ||
317 | return p; | ||
318 | } | ||
319 | If you use a version of MSVC that throws an exception from the "new" operator, you can compile without | ||
320 | "NewHandler.cpp", so the standard exception will be used. Some 7-Zip code | ||
321 | catches any exception in internal code and converts it to an HRESULT code, | ||
322 | so you don't need to catch CNewException if you call the COM interfaces of 7-Zip. | ||
323 | |||
324 | --- | ||
325 | |||
326 | http://www.7-zip.org | ||
327 | http://www.7-zip.org/sdk.html | ||
328 | http://www.7-zip.org/support.html | ||