author | Mark Adler <madler@alumni.caltech.edu> | 2021-12-31 16:57:07 -0800 |
---|---|---|
committer | Mark Adler <madler@alumni.caltech.edu> | 2021-12-31 16:57:07 -0800 |
commit | 8678871f18f4dd51101a9db1e37791f975969079 (patch) | |
tree | 4db677c163317d56fefa7f52aaa440271fe4c7eb /doc | |
parent | c3f3043f7aa80750245f8166a338c4877020b589 (diff) | |
Replace black/white with allow/block. (theresa-m)
Diffstat (limited to 'doc')
-rw-r--r-- | doc/txtvsbin.txt | 12 |
1 files changed, 6 insertions, 6 deletions
diff --git a/doc/txtvsbin.txt b/doc/txtvsbin.txt
index 3d0f063..2a901ea 100644
--- a/doc/txtvsbin.txt
+++ b/doc/txtvsbin.txt
@@ -38,15 +38,15 @@ The Algorithm
 
 The algorithm works by dividing the set of bytecodes [0..255] into three
 categories:
-- The white list of textual bytecodes:
+- The allow list of textual bytecodes:
 9 (TAB), 10 (LF), 13 (CR), 32 (SPACE) to 255.
 - The gray list of tolerated bytecodes:
 7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB), 27 (ESC).
-- The black list of undesired, non-textual bytecodes:
+- The block list of undesired, non-textual bytecodes:
 0 (NUL) to 6, 14 to 31.
 
-If a file contains at least one byte that belongs to the white list and
-no byte that belongs to the black list, then the file is categorized as
+If a file contains at least one byte that belongs to the allow list and
+no byte that belongs to the block list, then the file is categorized as
 plain text; otherwise, it is categorized as binary. (The boundary case,
 when the file is empty, automatically falls into the latter category.)
 
@@ -84,9 +84,9 @@ consistent results, regardless what alphabet encoding is being used.
 results on a text encoded, say, using ISO-8859-16 versus UTF-8.)
 
 There is an extra category of plain text files that are "polluted" with
-one or more black-listed codes, either by mistake or by peculiar design
+one or more block-listed codes, either by mistake or by peculiar design
 considerations. In such cases, a scheme that tolerates a small fraction
-of black-listed codes would provide an increased recall (i.e. more true
+of block-listed codes would provide an increased recall (i.e. more true
 positives). This, however, incurs a reduced precision overall, since
 false positives are more likely to appear in binary files that contain
 large chunks of textual data. Furthermore, "polluted" plain text should
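
For readers of this change, the classification rule quoted in the first hunk can be sketched in a few lines. This is only an illustration of the rule as the text states it, not zlib's implementation: the names ALLOW, GRAY, BLOCK and classify are hypothetical, and the sketch assumes that the gray-list codes 26 and 27 are excluded from the "14 to 31" block range, since the text describes three separate categories.

```python
# Hypothetical sketch of the rule described in doc/txtvsbin.txt.
# All names here are illustrative; none of them come from zlib.

# Allow list: TAB, LF, CR, and 32 (SPACE) to 255.
ALLOW = frozenset({9, 10, 13}) | frozenset(range(32, 256))

# Gray list: BEL, BS, VT, FF, SUB, ESC (tolerated, but not evidence of text).
GRAY = frozenset({7, 8, 11, 12, 26, 27})

# Block list: 0 to 6 and 14 to 31. Assumption: the gray codes 26 and 27
# are removed from that range so the three categories do not overlap.
BLOCK = (frozenset(range(0, 7)) | frozenset(range(14, 32))) - GRAY


def classify(data: bytes) -> str:
    """Return "text" or "binary" for the given file contents."""
    seen = set(data)  # distinct byte values present in the file
    if seen & BLOCK:
        return "binary"  # any block-listed byte rules out plain text
    if seen & ALLOW:
        return "text"  # at least one allow-listed byte, none block-listed
    return "binary"  # empty file, or gray-listed bytes only


if __name__ == "__main__":
    print(classify(b"hello, world\n"))
    print(classify(b"PK\x03\x04"))
```

Under this sketch an empty input is classified as binary, which matches the boundary case noted in the text, and a file containing only gray-listed bytes is also binary because it has no allow-listed byte.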