author    Mark Adler <madler@alumni.caltech.edu>  2021-12-31 16:57:07 -0800
committer Mark Adler <madler@alumni.caltech.edu>  2021-12-31 16:57:07 -0800
commit    8678871f18f4dd51101a9db1e37791f975969079
tree      4db677c163317d56fefa7f52aaa440271fe4c7eb /doc
parent    c3f3043f7aa80750245f8166a338c4877020b589
Replace black/white with allow/block. (theresa-m)
Diffstat (limited to 'doc')
-rw-r--r-- doc/txtvsbin.txt 12
1 files changed, 6 insertions, 6 deletions
diff --git a/doc/txtvsbin.txt b/doc/txtvsbin.txt
index 3d0f063..2a901ea 100644
--- a/doc/txtvsbin.txt
+++ b/doc/txtvsbin.txt
@@ -38,15 +38,15 @@ The Algorithm
 
 The algorithm works by dividing the set of bytecodes [0..255] into three
 categories:
-- The white list of textual bytecodes:
+- The allow list of textual bytecodes:
   9 (TAB), 10 (LF), 13 (CR), 32 (SPACE) to 255.
 - The gray list of tolerated bytecodes:
   7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB), 27 (ESC).
-- The black list of undesired, non-textual bytecodes:
+- The block list of undesired, non-textual bytecodes:
   0 (NUL) to 6, 14 to 31.
 
-If a file contains at least one byte that belongs to the white list and
-no byte that belongs to the black list, then the file is categorized as
+If a file contains at least one byte that belongs to the allow list and
+no byte that belongs to the block list, then the file is categorized as
 plain text; otherwise, it is categorized as binary. (The boundary case,
 when the file is empty, automatically falls into the latter category.)
 
@@ -84,9 +84,9 @@ consistent results, regardless what alphabet encoding is being used.
 results on a text encoded, say, using ISO-8859-16 versus UTF-8.)
 
 There is an extra category of plain text files that are "polluted" with
-one or more black-listed codes, either by mistake or by peculiar design
+one or more block-listed codes, either by mistake or by peculiar design
 considerations. In such cases, a scheme that tolerates a small fraction
-of black-listed codes would provide an increased recall (i.e. more true
+of block-listed codes would provide an increased recall (i.e. more true
 positives). This, however, incurs a reduced precision overall, since
 false positives are more likely to appear in binary files that contain
 large chunks of textual data. Furthermore, "polluted" plain text should
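
As a reading aid for the passage touched by this patch, the three-list
classification it describes could be sketched in C roughly as follows. This
is a minimal sketch and not zlib's own code: the names byte_class,
data_type, classify_byte and classify_buffer are hypothetical, and the
sketch assumes the whole input is available in a single buffer. Because the
documented block range "14 to 31" overlaps the gray codes 26 and 27, the
sketch tests the gray list before falling through to the block list.

```c
#include <stddef.h>

/* Hypothetical names for illustration; not taken from zlib's sources. */
enum byte_class { ALLOW, GRAY, BLOCK };
enum data_type  { DATA_BINARY, DATA_TEXT };

/* Map one byte value to its list, using the ranges given in the text. */
static enum byte_class classify_byte(unsigned char c)
{
    /* Allow list: TAB, LF, CR, and 32 (SPACE) to 255. */
    if (c == 9 || c == 10 || c == 13 || c >= 32)
        return ALLOW;
    /* Gray list: BEL, BS, VT, FF, SUB, ESC. */
    if (c == 7 || c == 8 || c == 11 || c == 12 || c == 26 || c == 27)
        return GRAY;
    /* Everything left: 0 to 6 and 14 to 31, except gray codes 26 and 27. */
    return BLOCK;
}

/*
 * Plain text requires at least one allow-list byte and no block-list
 * byte. Gray-list bytes are tolerated but do not count as evidence of
 * text, so an empty or gray-only buffer is reported as binary.
 */
static enum data_type classify_buffer(const unsigned char *buf, size_t len)
{
    int seen_allow = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        switch (classify_byte(buf[i])) {
        case BLOCK:
            return DATA_BINARY;   /* one block-list byte decides it */
        case ALLOW:
            seen_allow = 1;
            break;
        case GRAY:
            break;                /* tolerated, otherwise ignored */
        }
    }
    return seen_allow ? DATA_TEXT : DATA_BINARY;
}
```

Under these assumptions, a buffer holding "hello\n" would be classified as
text, while an empty buffer or one containing a NUL byte would be
classified as binary. Returning early on the first block-list byte is a
choice made for the sketch; a single pass without early exit gives the same
answer.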