| author | Mark Adler <madler@alumni.caltech.edu> | 2021-12-31 16:57:07 -0800 |
|---|---|---|
| committer | Mark Adler <madler@alumni.caltech.edu> | 2021-12-31 16:57:07 -0800 |
| commit | 8678871f18f4dd51101a9db1e37791f975969079 (patch) | |
| tree | 4db677c163317d56fefa7f52aaa440271fe4c7eb /doc | |
| parent | c3f3043f7aa80750245f8166a338c4877020b589 (diff) | |
Replace black/white with allow/block. (theresa-m)
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/txtvsbin.txt | 12 |
1 files changed, 6 insertions, 6 deletions
diff --git a/doc/txtvsbin.txt b/doc/txtvsbin.txt
index 3d0f063..2a901ea 100644
--- a/doc/txtvsbin.txt
+++ b/doc/txtvsbin.txt
| @@ -38,15 +38,15 @@ The Algorithm | |||
| 38 | 38 | ||
| 39 | The algorithm works by dividing the set of bytecodes [0..255] into three | 39 | The algorithm works by dividing the set of bytecodes [0..255] into three |
| 40 | categories: | 40 | categories: |
| 41 | - The white list of textual bytecodes: | 41 | - The allow list of textual bytecodes: |
| 42 | 9 (TAB), 10 (LF), 13 (CR), 32 (SPACE) to 255. | 42 | 9 (TAB), 10 (LF), 13 (CR), 32 (SPACE) to 255. |
| 43 | - The gray list of tolerated bytecodes: | 43 | - The gray list of tolerated bytecodes: |
| 44 | 7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB), 27 (ESC). | 44 | 7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB), 27 (ESC). |
| 45 | - The black list of undesired, non-textual bytecodes: | 45 | - The block list of undesired, non-textual bytecodes: |
| 46 | 0 (NUL) to 6, 14 to 31. | 46 | 0 (NUL) to 6, 14 to 31. |
| 47 | 47 | ||
| 48 | If a file contains at least one byte that belongs to the white list and | 48 | If a file contains at least one byte that belongs to the allow list and |
| 49 | no byte that belongs to the black list, then the file is categorized as | 49 | no byte that belongs to the block list, then the file is categorized as |
| 50 | plain text; otherwise, it is categorized as binary. (The boundary case, | 50 | plain text; otherwise, it is categorized as binary. (The boundary case, |
| 51 | when the file is empty, automatically falls into the latter category.) | 51 | when the file is empty, automatically falls into the latter category.) |
| 52 | 52 | ||
| @@ -84,9 +84,9 @@ consistent results, regardless what alphabet encoding is being used. | |||
| 84 | results on a text encoded, say, using ISO-8859-16 versus UTF-8.) | 84 | results on a text encoded, say, using ISO-8859-16 versus UTF-8.) |
| 85 | 85 | ||
| 86 | There is an extra category of plain text files that are "polluted" with | 86 | There is an extra category of plain text files that are "polluted" with |
| 87 | one or more black-listed codes, either by mistake or by peculiar design | 87 | one or more block-listed codes, either by mistake or by peculiar design |
| 88 | considerations. In such cases, a scheme that tolerates a small fraction | 88 | considerations. In such cases, a scheme that tolerates a small fraction |
| 89 | of black-listed codes would provide an increased recall (i.e. more true | 89 | of block-listed codes would provide an increased recall (i.e. more true |
| 90 | positives). This, however, incurs a reduced precision overall, since | 90 | positives). This, however, incurs a reduced precision overall, since |
| 91 | false positives are more likely to appear in binary files that contain | 91 | false positives are more likely to appear in binary files that contain |
| 92 | large chunks of textual data. Furthermore, "polluted" plain text should | 92 | large chunks of textual data. Furthermore, "polluted" plain text should |
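
For reference, the classification rule quoted in the first hunk can be sketched in C roughly as follows. This is only an illustrative sketch written from the prose in doc/txtvsbin.txt, not the code zlib itself uses; the names byte_class, classify_byte and is_plain_text are hypothetical. The block-list range "14 to 31" as written overlaps the gray-list codes 26 and 27, and the sketch assumes the gray list takes precedence for those two values.

```c
#include <stddef.h>

/* Hypothetical category names: the document only describes three lists. */
enum byte_class { BYTE_ALLOW, BYTE_GRAY, BYTE_BLOCK };

/* Classify one bytecode according to the three lists in the document. */
static enum byte_class classify_byte(unsigned char b)
{
    /* Allow list: 9 (TAB), 10 (LF), 13 (CR), 32 (SPACE) to 255. */
    if (b == 9 || b == 10 || b == 13 || b >= 32)
        return BYTE_ALLOW;

    /* Gray list: 7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB), 27 (ESC).
     * Assumption: checked before the block list, so 26 and 27 are
     * treated as tolerated even though they fall inside "14 to 31". */
    if (b == 7 || b == 8 || b == 11 || b == 12 || b == 26 || b == 27)
        return BYTE_GRAY;

    /* Block list: everything left, i.e. 0 to 6 and the rest of 14 to 31. */
    return BYTE_BLOCK;
}

/* Return 1 if the buffer is categorized as plain text, 0 if binary.
 * Plain text means: at least one allow-list byte and no block-list byte.
 * An empty buffer therefore falls into the binary category. */
static int is_plain_text(const unsigned char *buf, size_t len)
{
    int seen_allow = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        switch (classify_byte(buf[i])) {
        case BYTE_BLOCK:
            return 0;           /* one block-list byte is enough */
        case BYTE_ALLOW:
            seen_allow = 1;
            break;
        case BYTE_GRAY:
            break;              /* tolerated, but does not count as text */
        }
    }
    return seen_allow;
}
```

Under this rule a buffer holding only gray-list bytes (for example a lone ESC) is still categorized as binary, because it contains no allow-list byte.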
