-rw-r--r--  gem/ltn012.tex  355
-rw-r--r--  gem/makefile    9
2 files changed, 165 insertions, 199 deletions
diff --git a/gem/ltn012.tex b/gem/ltn012.tex
index 8eccd46..0f81b86 100644
--- a/gem/ltn012.tex
+++ b/gem/ltn012.tex
@@ -6,10 +6,7 @@
6\DefineVerbatimEnvironment{mime}{Verbatim}{fontsize=\small,commandchars=\$\#\%} 6\DefineVerbatimEnvironment{mime}{Verbatim}{fontsize=\small,commandchars=\$\#\%}
7\newcommand{\stick}[1]{\vbox{\setlength{\parskip}{0pt}#1}} 7\newcommand{\stick}[1]{\vbox{\setlength{\parskip}{0pt}#1}}
8\newcommand{\bl}{\ensuremath{\mathtt{\backslash}}} 8\newcommand{\bl}{\ensuremath{\mathtt{\backslash}}}
9\newcommand{\CR}{\texttt{CR}} 9
10\newcommand{\LF}{\texttt{LF}}
11\newcommand{\CRLF}{\texttt{CR~LF}}
12\newcommand{\nil}{\texttt{nil}}
13 10
14\title{Filters, sources, sinks, and pumps\\ 11\title{Filters, sources, sinks, and pumps\\
15 {\large or Functional programming for the rest of us}} 12 {\large or Functional programming for the rest of us}}
@@ -20,31 +17,30 @@
20\maketitle 17\maketitle
21 18
22\begin{abstract} 19\begin{abstract}
23Certain data processing operations can be implemented in the 20Certain data processing operations can be implemented in the
24form of filters. A filter is a function that can process 21form of filters. A filter is a function that can process data
25data received in consecutive invocations, returning partial 22received in consecutive function calls, returning partial
26results each time it is called. Examples of operations that 23results after each invocation. Examples of operations that can be
27can be implemented as filters include the end-of-line 24implemented as filters include the end-of-line normalization
28normalization for text, Base64 and Quoted-Printable transfer 25for text, Base64 and Quoted-Printable transfer content
29content encodings, the breaking of text into lines, SMTP 26encodings, the breaking of text into lines, SMTP dot-stuffing,
30dot-stuffing, and there are many others. Filters become 27and there are many others. Filters become even
31even more powerful when we allow them to be chained together 28more powerful when we allow them to be chained together to
32to create composite filters. In this context, filters can be 29create composite filters. In this context, filters can be seen
33seen as the internal links in a chain of data transformations. 30as the middle links in a chain of data transformations. Sources and sinks
34Sources and sinks are the corresponding end points in these 31are the corresponding end points of these chains. A source
35chains. A source is a function that produces data, chunk by 32is a function that produces data, chunk by chunk, and a sink
36chunk, and a sink is a function that takes data, chunk by 33is a function that takes data, chunk by chunk. In this
37chunk. Finally, pumps are procedures that actively drive 34article, we describe the design of an elegant interface for filters,
38data from a source to a sink, and indirectly through all 35sources, sinks, and chaining, and illustrate each step
39intervening filters. In this article, we describe the design of an 36with concrete examples.
40elegant interface for filters, sources, sinks, chains, and
41pumps, and we illustrate each step with concrete examples.
42\end{abstract} 37\end{abstract}
43 38
39
44\section{Introduction} 40\section{Introduction}
45 41
46Within the realm of networking applications, we are often 42Within the realm of networking applications, we are often
47required to apply transformations to streams of data. Examples 43required to apply transformations to streams of data. Examples
48include the end-of-line normalization for text, Base64 and 44include the end-of-line normalization for text, Base64 and
49Quoted-Printable transfer content encodings, breaking text 45Quoted-Printable transfer content encodings, breaking text
50into lines with a maximum number of columns, SMTP 46into lines with a maximum number of columns, SMTP
@@ -54,10 +50,11 @@ transfer coding, and the list goes on.
54Many complex tasks require a combination of two or more such 50Many complex tasks require a combination of two or more such
55transformations, and therefore a general mechanism for 51transformations, and therefore a general mechanism for
56promoting reuse is desirable. In the process of designing 52promoting reuse is desirable. In the process of designing
57\texttt{LuaSocket~2.0}, we repeatedly faced this problem. 53\texttt{LuaSocket~2.0}, David Burgess and I were forced to deal with
58The solution we reached proved to be very general and 54this problem. The solution we reached proved to be very
59convenient. It is based on the concepts of filters, sources, 55general and convenient. It is based on the concepts of
60sinks, and pumps, which we introduce below. 56filters, sources, sinks, and pumps, which we introduce
57below.
61 58
62\emph{Filters} are functions that can be repeatedly invoked 59\emph{Filters} are functions that can be repeatedly invoked
63with chunks of input, successively returning processed 60with chunks of input, successively returning processed
@@ -65,33 +62,34 @@ chunks of output. More importantly, the result of
65concatenating all the output chunks must be the same as the 62concatenating all the output chunks must be the same as the
66result of applying the filter to the concatenation of all 63result of applying the filter to the concatenation of all
67input chunks. In fancier language, filters \emph{commute} 64input chunks. In fancier language, filters \emph{commute}
68with the concatenation operator. More importantly, filters 65with the concatenation operator. As a result, chunk
69must handle input data correctly no matter how the stream 66boundaries are irrelevant: filters correctly handle input
70has been split into chunks. 67data no matter how it is split.
71 68
72A \emph{chain} is a function that transparently combines the 69A \emph{chain} transparently combines the effect of one or
73effect of one or more filters. The interface of a chain is 70more filters. The interface of a chain is
74indistinguishable from the interface of its component 71indistinguishable from the interface of its components.
75filters. This allows a chained filter to be used wherever 72This allows a chained filter to be used wherever an atomic
76an atomic filter is accepted. In particular, chains can be 73filter is expected. In particular, chains can be
77themselves chained to create arbitrarily complex operations. 74themselves chained to create arbitrarily complex operations.
78 75
79Filters can be seen as internal nodes in a network through 76Filters can be seen as internal nodes in a network through
80which data will flow, potentially being transformed many 77which data will flow, potentially being transformed many
81times along the way. Chains connect these nodes together. 78times along its way. Chains connect these nodes together.
82The initial and final nodes of the network are 79To complete the picture, we need \emph{sources} and
83\emph{sources} and \emph{sinks}, respectively. Less 80\emph{sinks}. These are the initial and final nodes of the
84abstractly, a source is a function that produces new data 81network, respectively. Less abstractly, a source is a
85every time it is invoked. Conversely, sinks are functions 82function that produces new data every time it is called.
86that give a final destination to the data they receive. 83Conversely, sinks are functions that give a final
87Naturally, sources and sinks can also be chained with 84destination to the data they receive. Naturally, sources
88filters to produce filtered sources and sinks. 85and sinks can also be chained with filters to produce
86filtered sources and sinks.
89 87
90Finally, filters, chains, sources, and sinks are all passive 88Finally, filters, chains, sources, and sinks are all passive
91entities: they must be repeatedly invoked in order for 89entities: they must be repeatedly invoked in order for
92anything to happen. \emph{Pumps} provide the driving force 90anything to happen. \emph{Pumps} provide the driving force
93that pushes data through the network, from a source to a 91that pushes data through the network, from a source to a
94sink, and indirectly through all intervening filters. 92sink.
95 93
96In the following sections, we start with a simplified 94In the following sections, we start with a simplified
97interface, which we later refine. The evolution we present 95interface, which we later refine. The evolution we present
@@ -101,28 +99,27 @@ concepts within our application domain.
101 99
102\subsection{A simple example} 100\subsection{A simple example}
103 101
104The end-of-line normalization of text is a good 102Let us use the end-of-line normalization of text as an
105example to motivate our initial filter interface. 103example to motivate our initial filter interface.
106Assume we are given text in an unknown end-of-line 104Assume we are given text in an unknown end-of-line
107convention (including possibly mixed conventions) out of the 105convention (including possibly mixed conventions) out of the
108commonly found Unix (\LF), Mac OS (\CR), and 106commonly found Unix (LF), Mac OS (CR), and DOS (CRLF)
109DOS (\CRLF) conventions. We would like to be able to 107conventions. We would like to be able to write code like the
110use the folowing code to normalize the end-of-line markers: 108following:
111\begin{quote} 109\begin{quote}
112\begin{lua} 110\begin{lua}
113@stick# 111@stick#
114local CRLF = "\013\010" 112local inp = source.chain(source.file(io.stdin), normalize("\r\n"))
115local input = source.chain(source.file(io.stdin), normalize(CRLF)) 113local out = sink.file(io.stdout)
116local output = sink.file(io.stdout) 114pump.all(inp, out)
117pump.all(input, output)
118% 115%
119\end{lua} 116\end{lua}
120\end{quote} 117\end{quote}
121 118
122This program should read data from the standard input stream 119This program should read data from the standard input stream
123and normalize the end-of-line markers to the canonic 120and normalize the end-of-line markers to the canonic CRLF
124\CRLF\ marker, as defined by the MIME standard. 121marker, as defined by the MIME standard. Finally, the
125Finally, the normalized text should be sent to the standard output 122normalized text should be sent to the standard output
126stream. We use a \emph{file source} that produces data from 123stream. We use a \emph{file source} that produces data from
127standard input, and chain it with a filter that normalizes 124standard input, and chain it with a filter that normalizes
128the data. The pump then repeatedly obtains data from the 125the data. The pump then repeatedly obtains data from the
@@ -130,28 +127,27 @@ source, and passes it to the \emph{file sink}, which sends
130it to the standard output. 127it to the standard output.
131 128
132In the code above, the \texttt{normalize} \emph{factory} is a 129In the code above, the \texttt{normalize} \emph{factory} is a
133function that creates our normalization filter, which 130function that creates our normalization filter. This filter
134replaces any end-of-line marker with the canonic marker. 131will replace any end-of-line marker with the canonic
135The initial filter interface is 132`\verb|\r\n|' marker. The initial filter interface is
136trivial: a filter function receives a chunk of input data, 133trivial: a filter function receives a chunk of input data,
137and returns a chunk of processed data. When there are no 134and returns a chunk of processed data. When there are no
138more input data left, the caller notifies the filter by invoking 135more input data left, the caller notifies the filter by invoking
139it with a \nil\ chunk. The filter responds by returning 136it with a \texttt{nil} chunk. The filter responds by returning
140the final chunk of processed data (which could of course be 137the final chunk of processed data.
141the empty string).
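To make this contract concrete, here is a hypothetical helper (not part of any library) that drives a filter over a list of chunks. By the property stated in the introduction, the result cannot depend on how the text was split:
\begin{quote}
\begin{lua}
-- illustration only: feed a filter chunk by chunk, then flush it with nil
local function apply(filter, chunks)
  local parts = {}
  for _, chunk in ipairs(chunks) do
    table.insert(parts, filter(chunk))
  end
  table.insert(parts, filter(nil))
  return table.concat(parts)
end

-- same text, different chunking, same normalized result
assert(apply(normalize("\r\n"), {"a\r", "\nb"}) ==
       apply(normalize("\r\n"), {"a\r\nb"}))
\end{lua}
\end{quote}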
142 138
143Although the interface is extremely simple, the 139Although the interface is extremely simple, the
144implementation is not so obvious. A normalization filter 140implementation is not so obvious. A normalization filter
145respecting this interface needs to keep some kind of context 141respecting this interface needs to keep some kind of context
146between calls. This is because a chunk boundary may lie between 142between calls. This is because a chunk boundary may lie between
147the \CR\ and \LF\ characters marking the end of a single line. This 143the CR and LF characters marking the end of a line. This
148need for contextual storage motivates the use of 144need for contextual storage motivates the use of
149factories: each time the factory is invoked, it returns a 145factories: each time the factory is invoked, it returns a
150filter with its own context so that we can have several 146filter with its own context so that we can have several
151independent filters being used at the same time. For 147independent filters being used at the same time. For
152efficiency reasons, we must avoid the obvious solution of 148efficiency reasons, we must avoid the obvious solution of
153concatenating all the input into the context before 149concatenating all the input into the context before
154producing any output chunks. 150producing any output.
155 151
156To that end, we break the implementation into two parts: 152To that end, we break the implementation into two parts:
157a low-level filter, and a factory of high-level filters. The 153a low-level filter, and a factory of high-level filters. The
@@ -171,10 +167,10 @@ end-of-line normalization filters:
171\begin{quote} 167\begin{quote}
172\begin{lua} 168\begin{lua}
173@stick# 169@stick#
174function filter.cycle(lowlevel, context, extra) 170function filter.cycle(low, ctx, extra)
175 return function(chunk) 171 return function(chunk)
176 local ret 172 local ret
177 ret, context = lowlevel(context, chunk, extra) 173 ret, ctx = low(ctx, chunk, extra)
178 return ret 174 return ret
179 end 175 end
180end 176end
@@ -182,30 +178,27 @@ end
182 178
183@stick# 179@stick#
184function normalize(marker) 180function normalize(marker)
185 return filter.cycle(eol, 0, marker) 181 return cycle(eol, 0, marker)
186end 182end
187% 183%
188\end{lua} 184\end{lua}
189\end{quote} 185\end{quote}
190 186
191The \texttt{normalize} factory simply calls a more generic 187The \texttt{normalize} factory simply calls a more generic
192factory, the \texttt{cycle}~factory, passing the low-level 188factory, the \texttt{cycle} factory. This factory receives a
193filter~\texttt{eol}. The \texttt{cycle}~factory receives a
194low-level filter, an initial context, and an extra 189low-level filter, an initial context, and an extra
195parameter, and returns a new high-level filter. Each time 190parameter, and returns a new high-level filter. Each time
196the high-level filter is passed a new chunk, it invokes the 191the high-level filter is passed a new chunk, it invokes the
197low-level filter with the previous context, the new chunk, 192low-level filter with the previous context, the new chunk,
198and the extra argument. It is the low-level filter that 193and the extra argument. It is the low-level filter that
199does all the work, producing the chunk of processed data and 194does all the work, producing the chunk of processed data and
200a new context. The high-level filter then replaces its 195a new context. The high-level filter then updates its
201internal context, and returns the processed chunk of data to 196internal context, and returns the processed chunk of data to
202the user. Notice that we take advantage of Lua's lexical 197the user. Notice that we take advantage of Lua's lexical
203scoping to store the context in a closure between function 198scoping to store the context in a closure between function
204calls. 199calls.
205 200
206\subsection{The C part of the filter} 201Concerning the low-level filter code, we must first accept
207
208As for the low-level filter, we must first accept
209that there is no perfect solution to the end-of-line marker 202that there is no perfect solution to the end-of-line marker
210normalization problem. The difficulty comes from an 203normalization problem. The difficulty comes from an
211inherent ambiguity in the definition of empty lines within 204inherent ambiguity in the definition of empty lines within
@@ -215,39 +208,39 @@ mixed input. It also does a reasonable job with empty lines
215and serves as a good example of how to implement a low-level 208and serves as a good example of how to implement a low-level
216filter. 209filter.
217 210
218The idea is to consider both \CR\ and~\LF\ as end-of-line 211The idea is to consider both CR and~LF as end-of-line
219\emph{candidates}. We issue a single break if any candidate 212\emph{candidates}. We issue a single break if any candidate
220is seen alone, or if it is followed by a different 213is seen alone, or followed by a different candidate. In
221candidate. In other words, \CR~\CR~and \LF~\LF\ each issue 214other words, CR~CR~and LF~LF each issue two end-of-line
222two end-of-line markers, whereas \CR~\LF~and \LF~\CR\ issue 215markers, whereas CR~LF~and LF~CR issue only one marker each.
223only one marker each. It is easy to see that this method 216This method correctly handles the Unix, DOS/MIME, VMS, and Mac
224correctly handles the most common end-of-line conventions. 217OS conventions.
225 218
226With this in mind, we divide the low-level filter into two 219\subsection{The C part of the filter}
227simple functions. The inner function~\texttt{pushchar} performs the 220
228normalization itself. It takes each input character in turn, 221Our low-level filter is divided into two simple functions.
229deciding what to output and how to modify the context. The 222The inner function performs the normalization itself. It takes
230context tells if the last processed character was an 223each input character in turn, deciding what to output and
231end-of-line candidate, and if so, which candidate it was. 224how to modify the context. The context tells if the last
232For efficiency, we use Lua's auxiliary library's buffer 225processed character was an end-of-line candidate, and if so,
233interface: 226which candidate it was. For efficiency, it uses
227Lua's auxiliary library's buffer interface:
234\begin{quote} 228\begin{quote}
235\begin{C} 229\begin{C}
236@stick# 230@stick#
237@#define candidate(c) (c == CR || c == LF) 231@#define candidate(c) (c == CR || c == LF)
238static int pushchar(int c, int last, const char *marker, 232static int process(int c, int last, const char *marker,
239 luaL_Buffer *buffer) { 233 luaL_Buffer *buffer) {
240 if (candidate(c)) { 234 if (candidate(c)) {
241 if (candidate(last)) { 235 if (candidate(last)) {
242 if (c == last) 236 if (c == last) luaL_addstring(buffer, marker);
243 luaL_addstring(buffer, marker);
244 return 0; 237 return 0;
245 } else { 238 } else {
246 luaL_addstring(buffer, marker); 239 luaL_addstring(buffer, marker);
247 return c; 240 return c;
248 } 241 }
249 } else { 242 } else {
250 luaL_pushchar(buffer, c); 243 luaL_putchar(buffer, c);
251 return 0; 244 return 0;
252 } 245 }
253} 246}
@@ -255,20 +248,15 @@ static int pushchar(int c, int last, const char *marker,
255\end{C} 248\end{C}
256\end{quote} 249\end{quote}
257 250
258The outer function~\texttt{eol} simply interfaces with Lua. 251The outer function simply interfaces with Lua. It receives the
259It receives the context and input chunk (as well as an 252context and input chunk (as well as an optional
260optional custom end-of-line marker), and returns the 253custom end-of-line marker), and returns the transformed
261transformed output chunk and the new context. 254output chunk and the new context:
262Notice that if the input chunk is \nil, the operation
263is considered to be finished. In that case, the loop will
264not execute a single time and the context is reset to the
265initial state. This allows the filter to be reused many
266times:
267\begin{quote} 255\begin{quote}
268\begin{C} 256\begin{C}
269@stick# 257@stick#
270static int eol(lua_State *L) { 258static int eol(lua_State *L) {
271 int context = luaL_checkint(L, 1); 259 int ctx = luaL_checkint(L, 1);
272 size_t isize = 0; 260 size_t isize = 0;
273 const char *input = luaL_optlstring(L, 2, NULL, &isize); 261 const char *input = luaL_optlstring(L, 2, NULL, &isize);
274 const char *last = input + isize; 262 const char *last = input + isize;
@@ -281,18 +269,24 @@ static int eol(lua_State *L) {
281 return 2; 269 return 2;
282 } 270 }
283 while (input < last) 271 while (input < last)
284 context = pushchar(*input++, context, marker, &buffer); 272 ctx = process(*input++, ctx, marker, &buffer);
285 luaL_pushresult(&buffer); 273 luaL_pushresult(&buffer);
286 lua_pushnumber(L, context); 274 lua_pushnumber(L, ctx);
287 return 2; 275 return 2;
288} 276}
289% 277%
290\end{C} 278\end{C}
291\end{quote} 279\end{quote}
292 280
281Notice that if the input chunk is \texttt{nil}, the operation
282is considered to be finished. In that case, the loop will
283not execute a single time and the context is reset to the
284initial state. This allows the filter to be reused many
285times.
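For instance, assuming the C function above is exported to Lua as \texttt{eol} (as the \texttt{normalize} factory suggests), a CR~LF pair split across two chunks still produces a single break, because the context carries the pending candidate from one call to the next:
\begin{quote}
\begin{lua}
-- hypothetical call sequence, for illustration only
local out1, ctx = eol(0, "first\r", "\r\n")
-- out1 == "first\r\n"; ctx remembers the pending CR
local out2
out2, ctx = eol(ctx, "\nsecond", "\r\n")
-- out2 == "second"; the LF completes the pair, so no extra break
\end{lua}
\end{quote}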
286
293When designing your own filters, the challenging part is to 287When designing your own filters, the challenging part is to
294decide what will be in the context. For line breaking, for 288decide what will be in the context. For line breaking, for
295instance, it could be the number of bytes that still fit in the 289instance, it could be the number of bytes left in the
296current line. For Base64 encoding, it could be a string 290current line. For Base64 encoding, it could be a string
297with the bytes that remain after the division of the input 291with the bytes that remain after the division of the input
298into 3-byte atoms. The MIME module in the \texttt{LuaSocket} 292into 3-byte atoms. The MIME module in the \texttt{LuaSocket}
@@ -300,22 +294,19 @@ distribution has many other examples.
300 294
301\section{Filter chains} 295\section{Filter chains}
302 296
303Chains greatly increase the power of filters. For example, 297Chains add a lot to the power of filters. For example,
304according to the standard for Quoted-Printable encoding, 298according to the standard for Quoted-Printable encoding,
305text should be normalized to a canonic end-of-line marker 299text must be normalized to a canonic end-of-line marker
306prior to encoding. After encoding, the resulting text must 300prior to encoding. To help specifying complex
307be broken into lines of no more than 76 characters, with the 301transformations like this, we define a chain factory that
308use of soft line breaks (a line terminated by the \texttt{=} 302creates a composite filter from one or more filters. A
309sign). To help specifying complex transformations like 303chained filter passes data through all its components, and
310this, we define a chain factory that creates a composite 304can be used wherever a primitive filter is accepted.
311filter from one or more filters. A chained filter passes
312data through all its components, and can be used wherever a
313primitive filter is accepted.
314 305
315The chaining factory is very simple. The auxiliary 306The chaining factory is very simple. The auxiliary
316function~\texttt{chainpair} chains two filters together, 307function~\texttt{chainpair} chains two filters together,
317taking special care if the chunk is the last. This is 308taking special care if the chunk is the last. This is
318because the final \nil\ chunk notification has to be 309because the final \texttt{nil} chunk notification has to be
319pushed through both filters in turn: 310pushed through both filters in turn:
320\begin{quote} 311\begin{quote}
321\begin{lua} 312\begin{lua}
@@ -331,9 +322,9 @@ end
331 322
332@stick# 323@stick#
333function filter.chain(...) 324function filter.chain(...)
334 local f = select(1, ...) 325 local f = arg[1]
335 for i = 2, select('@#', ...) do 326 for i = 2, @#arg do
336 f = chainpair(f, select(i, ...)) 327 f = chainpair(f, arg[i])
337 end 328 end
338 return f 329 return f
339end 330end
@@ -346,11 +337,11 @@ define the Quoted-Printable conversion as such:
346\begin{quote} 337\begin{quote}
347\begin{lua} 338\begin{lua}
348@stick# 339@stick#
349local qp = filter.chain(normalize(CRLF), encode("quoted-printable"), 340local qp = filter.chain(normalize("\r\n"),
350 wrap("quoted-printable")) 341 encode("quoted-printable"))
351local input = source.chain(source.file(io.stdin), qp) 342local inp = source.chain(source.file(io.stdin), qp)
352local output = sink.file(io.stdout) 343local out = sink.file(io.stdout)
353pump.all(input, output) 344pump.all(inp, out)
354% 345%
355\end{lua} 346\end{lua}
356\end{quote} 347\end{quote}
@@ -369,14 +360,14 @@ gives a final destination to the data.
369\subsection{Sources} 360\subsection{Sources}
370 361
371A source returns the next chunk of data each time it is 362A source returns the next chunk of data each time it is
372invoked. When there is no more data, it simply returns~\nil. 363invoked. When there is no more data, it simply returns
373In the event of an error, the source can inform the 364\texttt{nil}. In the event of an error, the source can inform the
374caller by returning \nil\ followed by the error message. 365caller by returning \texttt{nil} followed by an error message.
375 366
376Below are two simple source factories. The \texttt{empty} source 367Below are two simple source factories. The \texttt{empty} source
377returns no data, possibly returning an associated error 368returns no data, possibly returning an associated error
378message. The \texttt{file} source yields the contents of a file 369message. The \texttt{file} source works harder, and
379in a chunk by chunk fashion: 370yields the contents of a file in a chunk by chunk fashion:
380\begin{quote} 371\begin{quote}
381\begin{lua} 372\begin{lua}
382@stick# 373@stick#
@@ -407,7 +398,7 @@ A filtered source passes its data through the
407associated filter before returning it to the caller. 398associated filter before returning it to the caller.
408Filtered sources are useful when working with 399Filtered sources are useful when working with
409functions that get their input data from a source (such as 400functions that get their input data from a source (such as
410the pumps in our examples). By chaining a source with one or 401the pump in our first example). By chaining a source with one or
411more filters, the function can be transparently provided 402more filters, the function can be transparently provided
412with filtered data, with no need to change its interface. 403with filtered data, with no need to change its interface.
413Here is a factory that does the job: 404Here is a factory that does the job:
@@ -415,18 +406,14 @@ Here is a factory that does the job:
415\begin{lua} 406\begin{lua}
416@stick# 407@stick#
417function source.chain(src, f) 408function source.chain(src, f)
418 return function() 409 return source.simplify(function()
419 if not src then 410 if not src then return nil end
420 return nil
421 end
422 local chunk, err = src() 411 local chunk, err = src()
423 if not chunk then 412 if not chunk then
424 src = nil 413 src = nil
425 return f(nil) 414 return f(nil)
426 else 415 else return f(chunk) end
427 return f(chunk) 416 end)
428 end
429 end
430end 417end
431% 418%
432\end{lua} 419\end{lua}
@@ -434,20 +421,20 @@ end
434 421
435\subsection{Sinks} 422\subsection{Sinks}
436 423
437Just as we defined an interface for source of data, 424Just as we defined an interface for a data source,
438we can also define an interface for a data destination. 425we can also define an interface for a data destination.
439We call any function respecting this 426We call any function respecting this
440interface a \emph{sink}. In our first example, we used a 427interface a \emph{sink}. In our first example, we used a
441file sink connected to the standard output. 428file sink connected to the standard output.
442 429
443Sinks receive consecutive chunks of data, until the end of 430Sinks receive consecutive chunks of data, until the end of
444data is signaled by a \nil\ input chunk. A sink can be 431data is signaled by a \texttt{nil} chunk. A sink can be
445notified of an error with an optional extra argument that 432notified of an error with an optional extra argument that
446contains the error message, following a \nil\ chunk. 433contains the error message, following a \texttt{nil} chunk.
447If a sink detects an error itself, and 434If a sink detects an error itself, and
448wishes not to be called again, it can return \nil, 435wishes not to be called again, it can return \texttt{nil},
449followed by an error message. A return value that 436followed by an error message. A return value that
450is not \nil\ means the sink will accept more data. 437is not \texttt{nil} means the sink will accept more data.
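As a toy example of this protocol (purely hypothetical, and not one of the factories below), a sink that merely counts the bytes it is given could be written as:
\begin{quote}
\begin{lua}
-- count the bytes received; any non-nil return value asks for more data
local function sink_count(t)
  t = t or {bytes = 0}
  return function(chunk, err)
    if err then return nil, err end      -- the caller signaled an error: give up
    if chunk then t.bytes = t.bytes + string.len(chunk) end
    return 1
  end, t
end
\end{lua}
\end{quote}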
451 438
452Below are two useful sink factories. 439Below are two useful sink factories.
453The table factory creates a sink that stores 440The table factory creates a sink that stores
@@ -482,7 +469,7 @@ end
482 469
483Naturally, filtered sinks are just as useful as filtered 470Naturally, filtered sinks are just as useful as filtered
484sources. A filtered sink passes each chunk it receives 471sources. A filtered sink passes each chunk it receives
485through the associated filter before handing it down to the 472through the associated filter before handing it to the
486original sink. In the following example, we use a source 473original sink. In the following example, we use a source
487that reads from the standard input. The input chunks are 474that reads from the standard input. The input chunks are
488sent to a table sink, which has been coupled with a 475sent to a table sink, which has been coupled with a
@@ -492,10 +479,10 @@ standard out:
492\begin{quote} 479\begin{quote}
493\begin{lua} 480\begin{lua}
494@stick# 481@stick#
495local input = source.file(io.stdin) 482local inp = source.file(io.stdin)
496local output, t = sink.table() 483local out, t = sink.table()
497output = sink.chain(normalize(CRLF), output) 484out = sink.chain(normalize("\r\n"), out)
498pump.all(input, output) 485pump.all(inp, out)
499io.write(table.concat(t)) 486io.write(table.concat(t))
500% 487%
501\end{lua} 488\end{lua}
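The \texttt{sink.chain} factory used above is not shown in this excerpt. A version consistent with the simplified filter interface could look like the following sketch (not necessarily the implementation found in the LTN12 module):
\begin{quote}
\begin{lua}
-- couple a filter with a sink: each chunk is filtered before being delivered
function sink.chain(f, snk)
  return function(chunk, err)
    if chunk then
      return snk(f(chunk))
    else
      -- end of data: flush the filter, then forward the nil notification
      local ret, e = snk(f(nil))
      if not ret then return nil, e end
      return snk(nil, err)
    end
  end
end
\end{lua}
\end{quote}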
@@ -503,11 +490,11 @@ io.write(table.concat(t))
503 490
504\subsection{Pumps} 491\subsection{Pumps}
505 492
506Although not on purpose, our interface for sources is 493Adrian Sietsma noticed that, although not on purpose, our
507compatible with Lua iterators. That is, a source can be 494interface for sources is compatible with Lua iterators.
508neatly used in conjunction with \texttt{for} loops. Using 495That is, a source can be neatly used in conjunction
509our file source as an iterator, we can write the following 496with \texttt{for} loops. Using our file
510code: 497source as an iterator, we can write the following code:
511\begin{quote} 498\begin{quote}
512\begin{lua} 499\begin{lua}
513@stick# 500@stick#
@@ -552,22 +539,20 @@ end
552The \texttt{pump.step} function moves one chunk of data from 539The \texttt{pump.step} function moves one chunk of data from
553the source to the sink. The \texttt{pump.all} function takes 540the source to the sink. The \texttt{pump.all} function takes
554an optional \texttt{step} function and uses it to pump all the 541an optional \texttt{step} function and uses it to pump all the
555data from the source to the sink. 542data from the source to the sink. We can now use everything
556Here is an example that uses the Base64 and the 543we have to write a program that reads a binary file from
557line wrapping filters from the \texttt{LuaSocket}
558distribution. The program reads a binary file from
559disk and stores it in another file, after encoding it to the 544disk and stores it in another file, after encoding it to the
560Base64 transfer content encoding: 545Base64 transfer content encoding:
561\begin{quote} 546\begin{quote}
562\begin{lua} 547\begin{lua}
563@stick# 548@stick#
564local input = source.chain( 549local inp = source.chain(
565 source.file(io.open("input.bin", "rb")), 550 source.file(io.open("input.bin", "rb")),
566 encode("base64")) 551 encode("base64"))
567local output = sink.chain( 552local out = sink.chain(
568 wrap(76), 553 wrap(76),
569 sink.file(io.open("output.b64", "w"))) 554 sink.file(io.open("output.b64", "w")))
570pump.all(input, output) 555pump.all(inp, out)
571% 556%
572\end{lua} 557\end{lua}
573\end{quote} 558\end{quote}
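For completeness, the pump functions themselves are quite simple. A minimal implementation consistent with the description of \texttt{pump.step} and \texttt{pump.all} above could be the following (a sketch; the version in the LTN12 module may differ in details):
\begin{quote}
\begin{lua}
-- move a single chunk from the source to the sink
function pump.step(src, snk)
  local chunk, src_err = src()
  local ret, snk_err = snk(chunk, src_err)
  if chunk and ret then return 1
  else return nil, src_err or snk_err end
end

-- keep stepping until the source is drained or something fails
function pump.all(src, snk, step)
  step = step or pump.step
  while true do
    local ret, err = step(src, snk)
    if not ret then
      if err then return nil, err
      else return 1 end
    end
  end
end
\end{lua}
\end{quote}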
@@ -576,17 +561,19 @@ The way we split the filters here is not intuitive, on
576purpose. Alternatively, we could have chained the Base64 561purpose. Alternatively, we could have chained the Base64
577encode filter and the line-wrap filter together, and then 562encode filter and the line-wrap filter together, and then
578chain the resulting filter with either the file source or 563chain the resulting filter with either the file source or
579the file sink. It doesn't really matter. 564the file sink. It doesn't really matter. The Base64 and the
565line wrapping filters are part of the \texttt{LuaSocket}
566distribution.
580 567
581\section{Exploding filters} 568\section{Exploding filters}
582 569
583Our current filter interface has one serious shortcoming. 570Our current filter interface has one flagrant shortcoming.
584Consider for example a \texttt{gzip} decompression filter. 571When David Burgess was writing his \texttt{gzip} filter, he
585During decompression, a small input chunk can be exploded 572noticed that a decompression filter can explode a small
586into a huge amount of data. To address this problem, we 573input chunk into a huge amount of data. To address this
587decided to change the filter interface and allow exploding 574problem, we decided to change the filter interface and allow
588filters to return large quantities of output data in a chunk 575exploding filters to return large quantities of output data
589by chunk manner. 576in a chunk by chunk manner.
590 577
591More specifically, after passing each chunk of input to 578More specifically, after passing each chunk of input to
592a filter, and collecting the first chunk of output, the 579a filter, and collecting the first chunk of output, the
@@ -595,11 +582,11 @@ filtered data is left. Within these secondary calls, the
595caller passes an empty string to the filter. The filter 582caller passes an empty string to the filter. The filter
596responds with an empty string when it is ready for the next 583responds with an empty string when it is ready for the next
597input chunk. In the end, after the user passes a 584input chunk. In the end, after the user passes a
598\nil\ chunk notifying the filter that there is no 585\texttt{nil} chunk notifying the filter that there is no
599more input data, the filter might still have to produce too 586more input data, the filter might still have to produce too
600much output data to return in a single chunk. The user has 587much output data to return in a single chunk. The user has
601to loop again, now passing \nil\ to the filter each time, 588to loop again, now passing \texttt{nil} to the filter each time,
602until the filter itself returns \nil\ to notify the 589until the filter itself returns \texttt{nil} to notify the
603user it is finally done. 590user it is finally done.
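In code, a caller honoring this extended protocol might drive an exploding filter as follows (a hypothetical helper, shown only to make the call sequence explicit):
\begin{quote}
\begin{lua}
-- exhaust an exploding filter over an array of chunks (illustration only)
local function exhaust(filter, chunks)
  local parts = {}
  for _, chunk in ipairs(chunks) do
    local piece = filter(chunk)
    while piece ~= "" do            -- secondary calls pass the empty string
      table.insert(parts, piece)
      piece = filter("")
    end
  end
  local piece = filter(nil)         -- no more input data
  while piece do                    -- keep passing nil until the filter returns nil
    table.insert(parts, piece)
    piece = filter(nil)
  end
  return table.concat(parts)
end
\end{lua}
\end{quote}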
604 591
605Fortunately, it is very easy to modify a filter to respect 592Fortunately, it is very easy to modify a filter to respect
@@ -612,13 +599,13 @@ Interestingly, the modifications do not have a measurable
612negative impact in the performance of filters that do 599negative impact in the performance of filters that do
613not need the added flexibility. On the other hand, for a 600not need the added flexibility. On the other hand, for a
614small price in complexity, the changes make exploding 601small price in complexity, the changes make exploding
615filters practical. 602filters practical.
616 603
617\section{A complex example} 604\section{A complex example}
618 605
619The LTN12 module in the \texttt{LuaSocket} distribution 606The LTN12 module in the \texttt{LuaSocket} distribution
620implements all the ideas we have described. The MIME 607implements the ideas we have described. The MIME
621and SMTP modules are tightly integrated with LTN12, 608and SMTP modules are especially integrated with LTN12,
622and can be used to showcase the expressive power of filters, 609and can be used to showcase the expressive power of filters,
623sources, sinks, and pumps. Below is an example 610sources, sinks, and pumps. Below is an example
624of how a user would proceed to define and send a 611of how a user would proceed to define and send a
@@ -635,9 +622,9 @@ local message = smtp.message{
635 to = "Fulano <fulano@example.com>", 622 to = "Fulano <fulano@example.com>",
636 subject = "A message with an attachment"}, 623 subject = "A message with an attachment"},
637 body = { 624 body = {
638 preamble = "Hope you can see the attachment" .. CRLF, 625 preamble = "Hope you can see the attachment\r\n",
639 [1] = { 626 [1] = {
640 body = "Here is our logo" .. CRLF}, 627 body = "Here is our logo\r\n"},
641 [2] = { 628 [2] = {
642 headers = { 629 headers = {
643 ["content-type"] = 'image/png; name="luasocket.png"', 630 ["content-type"] = 'image/png; name="luasocket.png"',
@@ -678,18 +665,6 @@ abstraction for final data destinations. Filters define an
678interface for data transformations. The chaining of 665interface for data transformations. The chaining of
679filters, sources and sinks provides an elegant way to create 666filters, sources and sinks provides an elegant way to create
680arbitrarily complex data transformations from simpler 667arbitrarily complex data transformations from simpler
681components. Pumps simply push the data through. 668components. Pumps simply move the data through.
682
683\section{Acknowledgements}
684
685The concepts described in this text are the result of long
686discussions with David Burgess. A version of this text has
687been released on-line as the Lua Technical Note 012, hence
688the name of the corresponding LuaSocket module,
689\texttt{ltn12}. Wim Couwenberg contributed to the
690implementation of the module, and Adrian Sietsma was the
691first to notice the correspondence between sources and Lua
692iterators.
693
694 669
695\end{document} 670\end{document}
diff --git a/gem/makefile b/gem/makefile
index d2f0c93..a4287c2 100644
--- a/gem/makefile
+++ b/gem/makefile
@@ -12,12 +12,3 @@ clean:
12 12
13pdf: ltn012.pdf 13pdf: ltn012.pdf
14 open ltn012.pdf 14 open ltn012.pdf
15
16test: gem.so
17
18
19gem.o: gem.c
20 gcc -c -o gem.o -Wall -ansi -W -O2 gem.c
21
22gem.so: gem.o
23 export MACOSX_DEPLOYMENT_TARGET="10.3"; gcc -bundle -undefined dynamic_lookup -o gem.so gem.o