author     cvs2git convertor <vieuxtech@gmail.com>   2007-10-11 21:16:29 +0000
committer  cvs2git convertor <vieuxtech@gmail.com>   2007-10-11 21:16:29 +0000
commit     81ebe649f0134a66137d3266fcd2e1b93d9e0ba3 (patch)
tree       ce80392f901dea37824a4cd54b8d9e08f769cac3
parent     52ac60af8132ae7e42151d3012a9607d7cadaf95 (diff)
This commit was manufactured by cvs2svn to create tag 'luasocket-2-0-2' (v2.0.2).
Sprout from master 2007-10-11 21:16:28 UTC Diego Nehab <diego@tecgraf.puc-rio.br> 'Tested each sample.'
Cherrypick from master 2007-05-31 22:27:40 UTC Diego Nehab <diego@tecgraf.puc-rio.br> 'Before sending to Roberto.':
    gem/ltn012.tex
    gem/makefile

 gem/ltn012.tex | 355
 gem/makefile   |   9
 2 files changed, 165 insertions(+), 199 deletions(-)
diff --git a/gem/ltn012.tex b/gem/ltn012.tex
index 8eccd46..0f81b86 100644
--- a/gem/ltn012.tex
+++ b/gem/ltn012.tex
@@ -6,10 +6,7 @@
 \DefineVerbatimEnvironment{mime}{Verbatim}{fontsize=\small,commandchars=\$\#\%}
 \newcommand{\stick}[1]{\vbox{\setlength{\parskip}{0pt}#1}}
 \newcommand{\bl}{\ensuremath{\mathtt{\backslash}}}
-\newcommand{\CR}{\texttt{CR}}
-\newcommand{\LF}{\texttt{LF}}
-\newcommand{\CRLF}{\texttt{CR~LF}}
-\newcommand{\nil}{\texttt{nil}}
+
 
 \title{Filters, sources, sinks, and pumps\\
 {\large or Functional programming for the rest of us}}
@@ -20,31 +17,30 @@
 \maketitle
 
 \begin{abstract}
 Certain data processing operations can be implemented in the
-form of filters. A filter is a function that can process
-data received in consecutive invocations, returning partial
-results each time it is called. Examples of operations that
-can be implemented as filters include the end-of-line
-normalization for text, Base64 and Quoted-Printable transfer
-content encodings, the breaking of text into lines, SMTP
-dot-stuffing, and there are many others. Filters become
-even more powerful when we allow them to be chained together
-to create composite filters. In this context, filters can be
-seen as the internal links in a chain of data transformations.
-Sources and sinks are the corresponding end points in these
-chains. A source is a function that produces data, chunk by
-chunk, and a sink is a function that takes data, chunk by
-chunk. Finally, pumps are procedures that actively drive
-data from a source to a sink, and indirectly through all
-intervening filters. In this article, we describe the design of an
-elegant interface for filters, sources, sinks, chains, and
-pumps, and we illustrate each step with concrete examples.
+form of filters. A filter is a function that can process data
+received in consecutive function calls, returning partial
+results after each invocation. Examples of operations that can be
+implemented as filters include the end-of-line normalization
+for text, Base64 and Quoted-Printable transfer content
+encodings, the breaking of text into lines, SMTP dot-stuffing,
+and there are many others. Filters become even
+more powerful when we allow them to be chained together to
+create composite filters. In this context, filters can be seen
+as the middle links in a chain of data transformations. Sources and sinks
+are the corresponding end points of these chains. A source
+is a function that produces data, chunk by chunk, and a sink
+is a function that takes data, chunk by chunk. In this
+article, we describe the design of an elegant interface for filters,
+sources, sinks, and chaining, and illustrate each step
+with concrete examples.
 \end{abstract}
 
+
 \section{Introduction}
 
 Within the realm of networking applications, we are often
 required to apply transformations to streams of data. Examples
 include the end-of-line normalization for text, Base64 and
 Quoted-Printable transfer content encodings, breaking text
 into lines with a maximum number of columns, SMTP
@@ -54,10 +50,11 @@ transfer coding, and the list goes on.
 Many complex tasks require a combination of two or more such
 transformations, and therefore a general mechanism for
 promoting reuse is desirable. In the process of designing
-\texttt{LuaSocket~2.0}, we repeatedly faced this problem.
-The solution we reached proved to be very general and
-convenient. It is based on the concepts of filters, sources,
-sinks, and pumps, which we introduce below.
+\texttt{LuaSocket~2.0}, David Burgess and I were forced to deal with
+this problem. The solution we reached proved to be very
+general and convenient. It is based on the concepts of
+filters, sources, sinks, and pumps, which we introduce
+below.
 
 \emph{Filters} are functions that can be repeatedly invoked
 with chunks of input, successively returning processed
@@ -65,33 +62,34 @@ chunks of output. More importantly, the result of
 concatenating all the output chunks must be the same as the
 result of applying the filter to the concatenation of all
 input chunks. In fancier language, filters \emph{commute}
-with the concatenation operator. More importantly, filters
-must handle input data correctly no matter how the stream
-has been split into chunks.
+with the concatenation operator. As a result, chunk
+boundaries are irrelevant: filters correctly handle input
+data no matter how it is split.
 
-A \emph{chain} is a function that transparently combines the
-effect of one or more filters. The interface of a chain is
-indistinguishable from the interface of its component
-filters. This allows a chained filter to be used wherever
-an atomic filter is accepted. In particular, chains can be
+A \emph{chain} transparently combines the effect of one or
+more filters. The interface of a chain is
+indistinguishable from the interface of its components.
+This allows a chained filter to be used wherever an atomic
+filter is expected. In particular, chains can be
 themselves chained to create arbitrarily complex operations.
 
 Filters can be seen as internal nodes in a network through
 which data will flow, potentially being transformed many
-times along the way. Chains connect these nodes together.
-The initial and final nodes of the network are
-\emph{sources} and \emph{sinks}, respectively. Less
-abstractly, a source is a function that produces new data
-every time it is invoked. Conversely, sinks are functions
-that give a final destination to the data they receive.
-Naturally, sources and sinks can also be chained with
-filters to produce filtered sources and sinks.
+times along its way. Chains connect these nodes together.
+To complete the picture, we need \emph{sources} and
+\emph{sinks}. These are the initial and final nodes of the
+network, respectively. Less abstractly, a source is a
+function that produces new data every time it is called.
+Conversely, sinks are functions that give a final
+destination to the data they receive. Naturally, sources
+and sinks can also be chained with filters to produce
+filtered sources and sinks.
 
 Finally, filters, chains, sources, and sinks are all passive
 entities: they must be repeatedly invoked in order for
 anything to happen. \emph{Pumps} provide the driving force
 that pushes data through the network, from a source to a
-sink, and indirectly through all intervening filters.
+sink.
 
 In the following sections, we start with a simplified
 interface, which we later refine. The evolution we present
@@ -101,28 +99,27 @@ concepts within our application domain.
 
 \subsection{A simple example}
 
-The end-of-line normalization of text is a good
+Let us use the end-of-line normalization of text as an
 example to motivate our initial filter interface.
 Assume we are given text in an unknown end-of-line
 convention (including possibly mixed conventions) out of the
-commonly found Unix (\LF), Mac OS (\CR), and
-DOS (\CRLF) conventions. We would like to be able to
-use the folowing code to normalize the end-of-line markers:
+commonly found Unix (LF), Mac OS (CR), and DOS (CRLF)
+conventions. We would like to be able to write code like the
+following:
 \begin{quote}
 \begin{lua}
 @stick#
-local CRLF = "\013\010"
-local input = source.chain(source.file(io.stdin), normalize(CRLF))
+local input = source.chain(source.file(io.stdin), normalize("\r\n"))
 local output = sink.file(io.stdout)
 pump.all(input, output)
 %
 \end{lua}
 \end{quote}
 
 This program should read data from the standard input stream
-and normalize the end-of-line markers to the canonic
-\CRLF\ marker, as defined by the MIME standard.
-Finally, the normalized text should be sent to the standard output
+and normalize the end-of-line markers to the canonic CRLF
+marker, as defined by the MIME standard. Finally, the
+normalized text should be sent to the standard output
 stream. We use a \emph{file source} that produces data from
 standard input, and chain it with a filter that normalizes
 the data. The pump then repeatedly obtains data from the
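The example in the hunk above relies on `pump.all`, whose implementation does not appear in this diff. As a standalone illustration of the conventions the text describes (a source yields chunks until `nil`, a sink accepts chunks until it is handed `nil`), here is a hedged sketch; the toy `table_source` and `table_sink` are invented for illustration, and LuaSocket's actual `pump.all` may differ in detail:

```lua
-- A plausible pump, following the conventions the article states.
local pump = {}

function pump.all(src, snk)
  while true do
    local chunk, src_err = src()
    local ret, snk_err = snk(chunk, src_err)
    if not chunk then return not src_err, src_err end
    if not ret then return false, snk_err end
  end
end

-- toy source: yields the elements of a table, one per call, then nil
local function table_source(t)
  local i = 0
  return function()
    i = i + 1
    return t[i]
  end
end

-- toy sink: stores everything it receives
local received = {}
local function table_sink(chunk)
  if chunk then received[#received + 1] = chunk end
  return true
end

pump.all(table_source({"one", "two"}), table_sink)
print(table.concat(received, " "))
```

Note that the sink is also invoked with the final `nil` chunk, so it gets a chance to flush or finalize before the pump returns.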
@@ -130,28 +127,27 @@ source, and passes it to the \emph{file sink}, which sends
 it to the standard output.
 
 In the code above, the \texttt{normalize} \emph{factory} is a
-function that creates our normalization filter, which
-replaces any end-of-line marker with the canonic marker.
-The initial filter interface is
+function that creates our normalization filter. This filter
+will replace any end-of-line marker with the canonic
+`\verb|\r\n|' marker. The initial filter interface is
 trivial: a filter function receives a chunk of input data,
 and returns a chunk of processed data. When there are no
 more input data left, the caller notifies the filter by invoking
-it with a \nil\ chunk. The filter responds by returning
-the final chunk of processed data (which could of course be
-the empty string).
+it with a \texttt{nil} chunk. The filter responds by returning
+the final chunk of processed data.
 
 Although the interface is extremely simple, the
 implementation is not so obvious. A normalization filter
 respecting this interface needs to keep some kind of context
 between calls. This is because a chunk boundary may lie between
-the \CR\ and \LF\ characters marking the end of a single line. This
+the CR and LF characters marking the end of a line. This
 need for contextual storage motivates the use of
 factories: each time the factory is invoked, it returns a
 filter with its own context so that we can have several
 independent filters being used at the same time. For
 efficiency reasons, we must avoid the obvious solution of
 concatenating all the input into the context before
-producing any output chunks.
+producing any output.
@@ -171,10 +167,10 @@ end-of-line normalization filters:
 \begin{quote}
 \begin{lua}
 @stick#
-function filter.cycle(lowlevel, context, extra)
+function filter.cycle(low, ctx, extra)
   return function(chunk)
     local ret
-    ret, context = lowlevel(context, chunk, extra)
+    ret, ctx = low(ctx, chunk, extra)
     return ret
   end
 end
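The `cycle` factory in the hunk above can be exercised on its own. In this self-contained sketch the `count` low-level filter is invented purely for illustration (it is not part of LuaSocket); it shows how the context is threaded through consecutive calls by the closure:

```lua
-- The cycle factory, reproduced from the hunk above.
local filter = {}

function filter.cycle(low, ctx, extra)
  return function(chunk)
    local ret
    ret, ctx = low(ctx, chunk, extra)
    return ret
  end
end

-- Toy low-level filter: the context is the number of bytes seen so
-- far, and each output chunk is prefixed with that running total.
local function count(ctx, chunk, extra)
  if not chunk then return "", 0 end  -- end of data: reset the context
  ctx = ctx + #chunk
  return (extra or "") .. ctx .. ":" .. chunk, ctx
end

local f = filter.cycle(count, 0, "@")
print(f("ab"))
print(f("cde"))
```

Because `ctx` lives in the closure, two filters made by the same factory keep independent state, which is exactly the point the article makes about factories.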
@@ -182,30 +178,27 @@ end
 
 @stick#
 function normalize(marker)
-  return filter.cycle(eol, 0, marker)
+  return cycle(eol, 0, marker)
 end
 %
 \end{lua}
 \end{quote}
 
 The \texttt{normalize} factory simply calls a more generic
-factory, the \texttt{cycle}~factory, passing the low-level
-filter~\texttt{eol}. The \texttt{cycle}~factory receives a
+factory, the \texttt{cycle} factory. This factory receives a
 low-level filter, an initial context, and an extra
 parameter, and returns a new high-level filter. Each time
 the high-level filter is passed a new chunk, it invokes the
 low-level filter with the previous context, the new chunk,
 and the extra argument. It is the low-level filter that
 does all the work, producing the chunk of processed data and
-a new context. The high-level filter then replaces its
+a new context. The high-level filter then updates its
 internal context, and returns the processed chunk of data to
 the user. Notice that we take advantage of Lua's lexical
 scoping to store the context in a closure between function
 calls.
 
-\subsection{The C part of the filter}
-
-As for the low-level filter, we must first accept
+Concerning the low-level filter code, we must first accept
 that there is no perfect solution to the end-of-line marker
 normalization problem. The difficulty comes from an
 inherent ambiguity in the definition of empty lines within
@@ -215,39 +208,39 @@ mixed input. It also does a reasonable job with empty lines
 and serves as a good example of how to implement a low-level
 filter.
 
-The idea is to consider both \CR\ and~\LF\ as end-of-line
+The idea is to consider both CR and~LF as end-of-line
 \emph{candidates}. We issue a single break if any candidate
-is seen alone, or if it is followed by a different
-candidate. In other words, \CR~\CR~and \LF~\LF\ each issue
-two end-of-line markers, whereas \CR~\LF~and \LF~\CR\ issue
-only one marker each. It is easy to see that this method
-correctly handles the most common end-of-line conventions.
+is seen alone, or followed by a different candidate. In
+other words, CR~CR~and LF~LF each issue two end-of-line
+markers, whereas CR~LF~and LF~CR issue only one marker each.
+This method correctly handles the Unix, DOS/MIME, VMS, and Mac
+OS conventions.
 
-With this in mind, we divide the low-level filter into two
-simple functions. The inner function~\texttt{pushchar} performs the
-normalization itself. It takes each input character in turn,
-deciding what to output and how to modify the context. The
-context tells if the last processed character was an
-end-of-line candidate, and if so, which candidate it was.
-For efficiency, we use Lua's auxiliary library's buffer
-interface:
+\subsection{The C part of the filter}
+
+Our low-level filter is divided into two simple functions.
+The inner function performs the normalization itself. It takes
+each input character in turn, deciding what to output and
+how to modify the context. The context tells if the last
+processed character was an end-of-line candidate, and if so,
+which candidate it was. For efficiency, it uses
+Lua's auxiliary library's buffer interface:
 \begin{quote}
 \begin{C}
 @stick#
 @#define candidate(c) (c == CR || c == LF)
-static int pushchar(int c, int last, const char *marker,
+static int process(int c, int last, const char *marker,
     luaL_Buffer *buffer) {
   if (candidate(c)) {
     if (candidate(last)) {
-      if (c == last)
-        luaL_addstring(buffer, marker);
+      if (c == last) luaL_addstring(buffer, marker);
       return 0;
     } else {
       luaL_addstring(buffer, marker);
       return c;
     }
   } else {
-    luaL_pushchar(buffer, c);
+    luaL_putchar(buffer, c);
     return 0;
   }
 }
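The candidate state machine in `process` can be transliterated into Lua for experimentation. The following is a model of the same logic, not LuaSocket code; the context is represented here as the pending candidate character (or `nil`), rather than the C version's integer:

```lua
local CR, LF = "\r", "\n"

local function candidate(c) return c == CR or c == LF end

-- One character at a time: returns the text to emit and the new
-- context (the pending end-of-line candidate, or nil).
local function process(c, last, marker)
  if candidate(c) then
    if last then
      if c == last then
        -- CR CR or LF LF: two breaks; the second marker goes out here
        return marker, nil
      else
        -- CR LF or LF CR: one break, already emitted for the first
        return "", nil
      end
    else
      return marker, c
    end
  else
    return c, nil
  end
end

-- Drive the state machine over a whole string.
local function normalize(s, marker)
  local out, last = {}, nil
  for i = 1, #s do
    local piece
    piece, last = process(s:sub(i, i), last, marker)
    out[#out + 1] = piece
  end
  return table.concat(out)
end

print(normalize("one\rtwo\r\nthree\n", "\r\n"))
```

Feeding the string one character at a time while threading `last` through the calls gives the same result, which is the commuting-with-concatenation property the article requires of filters.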
@@ -255,20 +248,15 @@ static int pushchar(int c, int last, const char *marker,
 \end{C}
 \end{quote}
 
-The outer function~\texttt{eol} simply interfaces with Lua.
-It receives the context and input chunk (as well as an
-optional custom end-of-line marker), and returns the
-transformed output chunk and the new context.
-Notice that if the input chunk is \nil, the operation
-is considered to be finished. In that case, the loop will
-not execute a single time and the context is reset to the
-initial state. This allows the filter to be reused many
-times:
+The outer function simply interfaces with Lua. It receives the
+context and input chunk (as well as an optional
+custom end-of-line marker), and returns the transformed
+output chunk and the new context:
 \begin{quote}
 \begin{C}
 @stick#
 static int eol(lua_State *L) {
-  int context = luaL_checkint(L, 1);
+  int ctx = luaL_checkint(L, 1);
   size_t isize = 0;
   const char *input = luaL_optlstring(L, 2, NULL, &isize);
   const char *last = input + isize;
@@ -281,18 +269,24 @@ static int eol(lua_State *L) {
     return 2;
   }
   while (input < last)
-    context = pushchar(*input++, context, marker, &buffer);
+    ctx = process(*input++, ctx, marker, &buffer);
   luaL_pushresult(&buffer);
-  lua_pushnumber(L, context);
+  lua_pushnumber(L, ctx);
   return 2;
 }
 %
 \end{C}
 \end{quote}
 
+Notice that if the input chunk is \texttt{nil}, the operation
+is considered to be finished. In that case, the loop will
+not execute a single time and the context is reset to the
+initial state. This allows the filter to be reused many
+times.
+
 When designing your own filters, the challenging part is to
 decide what will be in the context. For line breaking, for
-instance, it could be the number of bytes that still fit in the
+instance, it could be the number of bytes left in the
 current line. For Base64 encoding, it could be a string
 with the bytes that remain after the division of the input
 into 3-byte atoms. The MIME module in the \texttt{LuaSocket}
@@ -300,22 +294,19 @@ distribution has many other examples.
 
 \section{Filter chains}
 
-Chains greatly increase the power of filters. For example,
+Chains add a lot to the power of filters. For example,
 according to the standard for Quoted-Printable encoding,
-text should be normalized to a canonic end-of-line marker
-prior to encoding. After encoding, the resulting text must
-be broken into lines of no more than 76 characters, with the
-use of soft line breaks (a line terminated by the \texttt{=}
-sign). To help specifying complex transformations like
-this, we define a chain factory that creates a composite
-filter from one or more filters. A chained filter passes
-data through all its components, and can be used wherever a
-primitive filter is accepted.
+text must be normalized to a canonic end-of-line marker
+prior to encoding. To help specifying complex
+transformations like this, we define a chain factory that
+creates a composite filter from one or more filters. A
+chained filter passes data through all its components, and
+can be used wherever a primitive filter is accepted.
 
 The chaining factory is very simple. The auxiliary
 function~\texttt{chainpair} chains two filters together,
 taking special care if the chunk is the last. This is
-because the final \nil\ chunk notification has to be
+because the final \texttt{nil} chunk notification has to be
 pushed through both filters in turn:
 \begin{quote}
 \begin{lua}
@@ -331,9 +322,9 @@ end
 
 @stick#
 function filter.chain(...)
-  local f = select(1, ...)
-  for i = 2, select('@#', ...) do
-    f = chainpair(f, select(i, ...))
+  local f = arg[1]
+  for i = 2, @#arg do
+    f = chainpair(f, arg[i])
   end
   return f
 end
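The `chainpair` helper that `filter.chain` relies on is not shown in this hunk. A simplified stand-in with the semantics the surrounding text describes (the final `nil` notification is pushed through both filters in turn) might look like the sketch below; the toy `upper` and `bracket` filters are invented for illustration, and the real implementation may differ:

```lua
-- Simplified stand-in for chainpair: composes two filters, taking
-- care to push the final nil notification through both in turn.
local function chainpair(f1, f2)
  return function(chunk)
    local ret = f2(f1(chunk))
    if chunk then return ret end
    return ret .. f2(nil)  -- flush f2 after f1 has flushed
  end
end

-- Two toy filters obeying the interface (nil in, final chunk out).
local function upper(chunk) return (chunk or ""):upper() end
local function bracket(chunk)
  chunk = chunk or ""
  if chunk == "" then return "" end
  return "[" .. chunk .. "]"
end

local both = chainpair(upper, bracket)
print(both("hi"))
print(both(nil))
```

Because the composite has the same interface as an atomic filter, it can itself be passed back into `chainpair`, which is how arbitrarily long chains are built.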
@@ -346,11 +337,11 @@ define the Quoted-Printable conversion as such:
 \begin{quote}
 \begin{lua}
 @stick#
-local qp = filter.chain(normalize(CRLF), encode("quoted-printable"),
-                        wrap("quoted-printable"))
+local qp = filter.chain(normalize("\r\n"),
+                        encode("quoted-printable"))
 local input = source.chain(source.file(io.stdin), qp)
 local output = sink.file(io.stdout)
 pump.all(input, output)
 %
 \end{lua}
 \end{quote}
@@ -369,14 +360,14 @@ gives a final destination to the data.
 \subsection{Sources}
 
 A source returns the next chunk of data each time it is
-invoked. When there is no more data, it simply returns~\nil.
-In the event of an error, the source can inform the
-caller by returning \nil\ followed by the error message.
+invoked. When there is no more data, it simply returns
+\texttt{nil}. In the event of an error, the source can inform the
+caller by returning \texttt{nil} followed by an error message.
 
 Below are two simple source factories. The \texttt{empty} source
 returns no data, possibly returning an associated error
-message. The \texttt{file} source yields the contents of a file
-in a chunk by chunk fashion:
+message. The \texttt{file} source works harder, and
+yields the contents of a file in a chunk by chunk fashion:
 \begin{quote}
 \begin{lua}
 @stick#
@@ -407,7 +398,7 @@ A filtered source passes its data through the
 associated filter before returning it to the caller.
 Filtered sources are useful when working with
 functions that get their input data from a source (such as
-the pumps in our examples). By chaining a source with one or
+the pump in our first example). By chaining a source with one or
 more filters, the function can be transparently provided
 with filtered data, with no need to change its interface.
 Here is a factory that does the job:
@@ -415,18 +406,14 @@ Here is a factory that does the job:
 \begin{lua}
 @stick#
 function source.chain(src, f)
-  return function()
-    if not src then
-      return nil
-    end
+  return source.simplify(function()
+    if not src then return nil end
     local chunk, err = src()
     if not chunk then
       src = nil
       return f(nil)
-    else
-      return f(chunk)
-    end
-  end
+    else return f(chunk) end
+  end)
 end
 %
 \end{lua}
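The left-hand (older) version of `source.chain`, which does not depend on `source.simplify` (not shown in this excerpt), can be run standalone with a toy source. The `string_source` factory below is invented for illustration:

```lua
-- The plain variant of source.chain from the left side of the hunk.
local source = {}

function source.chain(src, f)
  return function()
    if not src then return nil end
    local chunk, err = src()
    if not chunk then
      src = nil        -- remember that the underlying source is done
      return f(nil)    -- give the filter a chance to flush
    else
      return f(chunk)
    end
  end
end

-- Toy source: yields n-byte chunks of a string, then nil.
local function string_source(s, n)
  local i = 1
  return function()
    if i > #s then return nil end
    local chunk = s:sub(i, i + n - 1)
    i = i + n
    return chunk
  end
end

local function upper(chunk) return (chunk or ""):upper() end

local src = source.chain(string_source("hello", 2), upper)
print(src(), src(), src())
```

Note the asymmetry the text alludes to: the chained source returns the filter's flush output (possibly the empty string) one extra time after the underlying source is exhausted, and only then starts returning `nil`.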
@@ -434,20 +421,20 @@ end | |||
434 | 421 | ||
435 | \subsection{Sinks} | 422 | \subsection{Sinks} |
436 | 423 | ||
437 | Just as we defined an interface for source of data, | 424 | Just as we defined an interface a data source, |
438 | we can also define an interface for a data destination. | 425 | we can also define an interface for a data destination. |
439 | We call any function respecting this | 426 | We call any function respecting this |
440 | interface a \emph{sink}. In our first example, we used a | 427 | interface a \emph{sink}. In our first example, we used a |
441 | file sink connected to the standard output. | 428 | file sink connected to the standard output. |
442 | 429 | ||
443 | Sinks receive consecutive chunks of data, until the end of | 430 | Sinks receive consecutive chunks of data, until the end of |
444 | data is signaled by a \nil\ input chunk. A sink can be | 431 | data is signaled by a \texttt{nil} chunk. A sink can be |
445 | notified of an error with an optional extra argument that | 432 | notified of an error with an optional extra argument that |
446 | contains the error message, following a \nil\ chunk. | 433 | contains the error message, following a \texttt{nil} chunk. |
447 | If a sink detects an error itself, and | 434 | If a sink detects an error itself, and |
448 | wishes not to be called again, it can return \nil, | 435 | wishes not to be called again, it can return \texttt{nil}, |
449 | followed by an error message. A return value that | 436 | followed by an error message. A return value that |
450 | is not \nil\ means the sink will accept more data. | 437 | is not \texttt{nil} means the sink will accept more data. |
451 | 438 | ||
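As an illustration of this protocol, here is a minimal hand-written sink (a hypothetical example, not one of the LuaSocket factories) that counts the bytes it is given:

```lua
-- Hypothetical sink obeying the protocol above: it accepts
-- chunks until a nil chunk arrives, and propagates an error
-- message passed after the nil chunk.
local count = 0
local function counter(chunk, err)
    if chunk == nil then
        if err then return nil, err end  -- upstream error
        return 1                         -- normal end of data
    end
    count = count + #chunk
    return 1  -- any non-nil return: send me more data
end

counter("hello")
counter(" world")
counter(nil)          -- signal end of data
print(count) --> 11
```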
452 | Below are two useful sink factories. | 439 | Below are two useful sink factories. |
453 | The table factory creates a sink that stores | 440 | The table factory creates a sink that stores |
@@ -482,7 +469,7 @@ end | |||
482 | 469 | ||
483 | Naturally, filtered sinks are just as useful as filtered | 470 | Naturally, filtered sinks are just as useful as filtered |
484 | sources. A filtered sink passes each chunk it receives | 471 | sources. A filtered sink passes each chunk it receives |
485 | through the associated filter before handing it down to the | 472 | through the associated filter before handing it to the |
486 | original sink. In the following example, we use a source | 473 | original sink. In the following example, we use a source |
487 | that reads from the standard input. The input chunks are | 474 | that reads from the standard input. The input chunks are |
488 | sent to a table sink, which has been coupled with a | 475 | sent to a table sink, which has been coupled with a |
@@ -492,10 +479,10 @@ standard out: | |||
492 | \begin{quote} | 479 | \begin{quote} |
493 | \begin{lua} | 480 | \begin{lua} |
494 | @stick# | 481 | @stick# |
495 | local input = source.file(io.stdin) | 482 | local inp = source.file(io.stdin) |
496 | local output, t = sink.table() | 483 | local out, t = sink.table() |
497 | output = sink.chain(normalize(CRLF), output) | 484 | out = sink.chain(normalize("\r\n"), out) |
498 | pump.all(input, output) | 485 | pump.all(inp, out) |
499 | io.write(table.concat(t)) | 486 | io.write(table.concat(t)) |
500 | % | 487 | % |
501 | \end{lua} | 488 | \end{lua} |
@@ -503,11 +490,11 @@ io.write(table.concat(t)) | |||
503 | 490 | ||
504 | \subsection{Pumps} | 491 | \subsection{Pumps} |
505 | 492 | ||
506 | Although not on purpose, our interface for sources is | 493 | Adrian Sietsma noticed that, although not on purpose, our |
507 | compatible with Lua iterators. That is, a source can be | 494 | interface for sources is compatible with Lua iterators. |
508 | neatly used in conjunction with \texttt{for} loops. Using | 495 | That is, a source can be neatly used in conjunction |
509 | our file source as an iterator, we can write the following | 496 | with \texttt{for} loops. Using our file |
510 | code: | 497 | source as an iterator, we can write the following code: |
511 | \begin{quote} | 498 | \begin{quote} |
512 | \begin{lua} | 499 | \begin{lua} |
513 | @stick# | 500 | @stick# |
@@ -552,22 +539,20 @@ end | |||
552 | The \texttt{pump.step} function moves one chunk of data from | 539 | The \texttt{pump.step} function moves one chunk of data from |
553 | the source to the sink. The \texttt{pump.all} function takes | 540 | the source to the sink. The \texttt{pump.all} function takes |
554 | an optional \texttt{step} function and uses it to pump all the | 541 | an optional \texttt{step} function and uses it to pump all the |
555 | data from the source to the sink. | 542 | data from the source to the sink. We can now use everything |
556 | Here is an example that uses the Base64 and the | 543 | we have to write a program that reads a binary file from |
557 | line wrapping filters from the \texttt{LuaSocket} | ||
558 | distribution. The program reads a binary file from | ||
559 | disk and stores it in another file, after encoding it to the | 544 | disk and stores it in another file, after encoding it to the |
560 | Base64 transfer content encoding: | 545 | Base64 transfer content encoding: |
561 | \begin{quote} | 546 | \begin{quote} |
562 | \begin{lua} | 547 | \begin{lua} |
563 | @stick# | 548 | @stick# |
564 | local input = source.chain( | 549 | local inp = source.chain( |
565 | source.file(io.open("input.bin", "rb")), | 550 | source.file(io.open("input.bin", "rb")), |
566 | encode("base64")) | 551 | encode("base64")) |
567 | local output = sink.chain( | 552 | local out = sink.chain( |
568 | wrap(76), | 553 | wrap(76), |
569 | sink.file(io.open("output.b64", "w"))) | 554 | sink.file(io.open("output.b64", "w"))) |
570 | pump.all(input, output) | 555 | pump.all(inp, out) |
571 | % | 556 | % |
572 | \end{lua} | 557 | \end{lua} |
573 | \end{quote} | 558 | \end{quote} |
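The two pump functions admit a very small implementation. The sketch below follows the description in the text; it is a plausible reconstruction, not necessarily the code shipped in the LTN12 module:

```lua
local pump = {}

-- Move a single chunk from source to sink. Returns non-nil
-- while there may be more data, or nil plus an error message.
function pump.step(src, snk)
    local chunk, src_err = src()
    local ret, snk_err = snk(chunk, src_err)
    if chunk and ret then return 1 end
    return nil, src_err or snk_err
end

-- Pump all data, using an optional custom step function.
function pump.all(src, snk, step)
    step = step or pump.step
    while true do
        local ret, err = step(src, snk)
        if not ret then
            if err then return nil, err end
            return 1
        end
    end
end

-- Exercise the pump with a toy source and a toy table sink.
local data, i = {"a", "b", "c"}, 0
local function src() i = i + 1; return data[i] end
local t = {}
local function snk(chunk)
    if chunk then t[#t + 1] = chunk end
    return 1
end

pump.all(src, snk)
print(table.concat(t)) --> abc
```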
@@ -576,17 +561,19 @@ The way we split the filters here is not intuitive, on | |||
576 | purpose. Alternatively, we could have chained the Base64 | 561 | purpose. Alternatively, we could have chained the Base64 |
577 | encode filter and the line-wrap filter together, and then | 562 | encode filter and the line-wrap filter together, and then |
578 | chain the resulting filter with either the file source or | 563 | chain the resulting filter with either the file source or |
579 | the file sink. It doesn't really matter. | 564 | the file sink. It doesn't really matter. The Base64 and the |
565 | line wrapping filters are part of the \texttt{LuaSocket} | ||
566 | distribution. | ||
580 | 567 | ||
581 | \section{Exploding filters} | 568 | \section{Exploding filters} |
582 | 569 | ||
583 | Our current filter interface has one serious shortcoming. | 570 | Our current filter interface has one flagrant shortcoming. |
584 | Consider for example a \texttt{gzip} decompression filter. | 571 | When David Burgess was writing his \texttt{gzip} filter, he |
585 | During decompression, a small input chunk can be exploded | 572 | noticed that a decompression filter can explode a small |
586 | into a huge amount of data. To address this problem, we | 573 | input chunk into a huge amount of data. To address this |
587 | decided to change the filter interface and allow exploding | 574 | problem, we decided to change the filter interface and allow |
588 | filters to return large quantities of output data in a chunk | 575 | exploding filters to return large quantities of output data |
589 | by chunk manner. | 576 | in a chunk by chunk manner. |
590 | 577 | ||
591 | More specifically, after passing each chunk of input to | 578 | More specifically, after passing each chunk of input to |
592 | a filter, and collecting the first chunk of output, the | 579 | a filter, and collecting the first chunk of output, the |
@@ -595,11 +582,11 @@ filtered data is left. Within these secondary calls, the | |||
595 | caller passes an empty string to the filter. The filter | 582 | caller passes an empty string to the filter. The filter |
596 | responds with an empty string when it is ready for the next | 583 | responds with an empty string when it is ready for the next |
597 | input chunk. In the end, after the user passes a | 584 | input chunk. In the end, after the user passes a |
598 | \nil\ chunk notifying the filter that there is no | 585 | \texttt{nil} chunk notifying the filter that there is no |
599 | more input data, the filter might still have to produce too | 586 | more input data, the filter might still have to produce too |
600 | much output data to return in a single chunk. The user has | 587 | much output data to return in a single chunk. The user has |
601 | to loop again, now passing \nil\ to the filter each time, | 588 | to loop again, now passing \texttt{nil} to the filter each time, |
602 | until the filter itself returns \nil\ to notify the | 589 | until the filter itself returns \texttt{nil} to notify the |
603 | user it is finally done. | 590 | user it is finally done. |
604 | 591 | ||
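The calling discipline just described can be written down as a short driver loop. Everything below is a hypothetical illustration of the protocol, not LuaSocket code:

```lua
-- Driver loop for the exploding-filter protocol: feed one
-- chunk, then drain with "" until the filter answers "";
-- after the final nil chunk, drain with nil until the
-- filter answers nil.
local function drive(get_chunk, filter, put_chunk)
    while true do
        local chunk = get_chunk()         -- nil signals end of input
        local out = filter(chunk)
        while out and out ~= "" do        -- drain secondary output
            put_chunk(out)
            out = filter(chunk and "" or nil)
        end
        if chunk == nil then break end    -- filter is finally done
    end
end

-- Toy exploding filter: replays its input one byte at a time.
local pending, done = "", false
local function explode(chunk)
    if chunk == nil then done = true
    elseif chunk ~= "" then pending = pending .. chunk end
    if #pending > 0 then
        local c = pending:sub(1, 1)
        pending = pending:sub(2)
        return c
    end
    if done then return nil end
    return ""  -- ready for the next input chunk
end

local input, i = {"ab", "cd"}, 0
local t = {}
drive(function() i = i + 1; return input[i] end,
      explode,
      function(s) t[#t + 1] = s end)
print(table.concat(t)) --> abcd
```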
605 | Fortunately, it is very easy to modify a filter to respect | 592 | Fortunately, it is very easy to modify a filter to respect |
@@ -612,13 +599,13 @@ Interestingly, the modifications do not have a measurable | |||
612 | negative impact in the performance of filters that do | 599 | negative impact in the performance of filters that do |
613 | not need the added flexibility. On the other hand, for a | 600 | not need the added flexibility. On the other hand, for a |
614 | small price in complexity, the changes make exploding | 601 | small price in complexity, the changes make exploding |
615 | filters practical. | 602 | filters practical. |
616 | 603 | ||
617 | \section{A complex example} | 604 | \section{A complex example} |
618 | 605 | ||
619 | The LTN12 module in the \texttt{LuaSocket} distribution | 606 | The LTN12 module in the \texttt{LuaSocket} distribution |
620 | implements all the ideas we have described. The MIME | 607 | implements the ideas we have described. The MIME |
621 | and SMTP modules are tightly integrated with LTN12, | 608 | and SMTP modules are closely integrated with LTN12, |
622 | and can be used to showcase the expressive power of filters, | 609 | and can be used to showcase the expressive power of filters, |
623 | sources, sinks, and pumps. Below is an example | 610 | sources, sinks, and pumps. Below is an example |
624 | of how a user would proceed to define and send a | 611 | of how a user would proceed to define and send a |
@@ -635,9 +622,9 @@ local message = smtp.message{ | |||
635 | to = "Fulano <fulano@example.com>", | 622 | to = "Fulano <fulano@example.com>", |
636 | subject = "A message with an attachment"}, | 623 | subject = "A message with an attachment"}, |
637 | body = { | 624 | body = { |
638 | preamble = "Hope you can see the attachment" .. CRLF, | 625 | preamble = "Hope you can see the attachment\r\n", |
639 | [1] = { | 626 | [1] = { |
640 | body = "Here is our logo" .. CRLF}, | 627 | body = "Here is our logo\r\n"}, |
641 | [2] = { | 628 | [2] = { |
642 | headers = { | 629 | headers = { |
643 | ["content-type"] = 'image/png; name="luasocket.png"', | 630 | ["content-type"] = 'image/png; name="luasocket.png"', |
@@ -678,18 +665,6 @@ abstraction for final data destinations. Filters define an | |||
678 | interface for data transformations. The chaining of | 665 | interface for data transformations. The chaining of |
679 | filters, sources and sinks provides an elegant way to create | 666 | filters, sources and sinks provides an elegant way to create |
680 | arbitrarily complex data transformations from simpler | 667 | arbitrarily complex data transformations from simpler |
681 | components. Pumps simply push the data through. | 668 | components. Pumps simply move the data through. |
682 | |||
683 | \section{Acknowledgements} | ||
684 | |||
685 | The concepts described in this text are the result of long | ||
686 | discussions with David Burgess. A version of this text has | ||
687 | been released on-line as the Lua Technical Note 012, hence | ||
688 | the name of the corresponding LuaSocket module, | ||
689 | \texttt{ltn12}. Wim Couwenberg contributed to the | ||
690 | implementation of the module, and Adrian Sietsma was the | ||
691 | first to notice the correspondence between sources and Lua | ||
692 | iterators. | ||
693 | |||
694 | 669 | ||
695 | \end{document} | 670 | \end{document} |
diff --git a/gem/makefile b/gem/makefile index d2f0c93..a4287c2 100644 --- a/gem/makefile +++ b/gem/makefile | |||
@@ -12,12 +12,3 @@ clean: | |||
12 | 12 | ||
13 | pdf: ltn012.pdf | 13 | pdf: ltn012.pdf |
14 | open ltn012.pdf | 14 | open ltn012.pdf |
15 | |||
16 | test: gem.so | ||
17 | |||
18 | |||
19 | gem.o: gem.c | ||
20 | gcc -c -o gem.o -Wall -ansi -W -O2 gem.c | ||
21 | |||
22 | gem.so: gem.o | ||
23 | export MACOSX_DEPLOYMENT_TARGET="10.3"; gcc -bundle -undefined dynamic_lookup -o gem.so gem.o | ||