| author | Diego Nehab <diego@tecgraf.puc-rio.br> | 2007-05-31 22:27:40 +0000 |
|---|---|---|
| committer | Diego Nehab <diego@tecgraf.puc-rio.br> | 2007-05-31 22:27:40 +0000 |
| commit | 3074a8f56b5153f4477e662453102583d7b6f539 (patch) | |
| tree | 095eecdd7017e17115ab8387898d2f5e5f2f2323 | |
| parent | 7b195164b0c8755b15e8055f1d524282847f6e13 (diff) | |
Before sending to Roberto.
| -rw-r--r-- | gem/ltn012.tex | 218 |
1 files changed, 105 insertions, 113 deletions
diff --git a/gem/ltn012.tex b/gem/ltn012.tex index 7dbc5ef..0f81b86 100644 --- a/gem/ltn012.tex +++ b/gem/ltn012.tex | |||
| @@ -23,19 +23,17 @@ received in consecutive function calls, returning partial | |||
| 23 | results after each invocation. Examples of operations that can be | 23 | results after each invocation. Examples of operations that can be |
| 24 | implemented as filters include the end-of-line normalization | 24 | implemented as filters include the end-of-line normalization |
| 25 | for text, Base64 and Quoted-Printable transfer content | 25 | for text, Base64 and Quoted-Printable transfer content |
| 26 | encodings, the breaking of text into lines, SMTP byte | 26 | encodings, the breaking of text into lines, SMTP dot-stuffing, |
| 27 | stuffing, and there are many others. Filters become even | 27 | and there are many others. Filters become even |
| 28 | more powerful when we allow them to be chained together to | 28 | more powerful when we allow them to be chained together to |
| 29 | create composite filters. In this context, filters can be seen | 29 | create composite filters. In this context, filters can be seen |
| 30 | as the middle links in a chain of data transformations. Sources an sinks | 30 | as the middle links in a chain of data transformations. Sources an sinks |
| 31 | are the corresponding end points of these chains. A source | 31 | are the corresponding end points of these chains. A source |
| 32 | is a function that produces data, chunk by chunk, and a sink | 32 | is a function that produces data, chunk by chunk, and a sink |
| 33 | is a function that takes data, chunk by chunk. In this | 33 | is a function that takes data, chunk by chunk. In this |
| 34 | chapter, we describe the design of an elegant interface for filters, | 34 | article, we describe the design of an elegant interface for filters, |
| 35 | sources, sinks and chaining, refine it | 35 | sources, sinks, and chaining, and illustrate each step |
| 36 | until it reaches a high degree of generality. We discuss | 36 | with concrete examples. |
| 37 | implementation challenges, provide practical solutions, | ||
| 38 | and illustrate each step with concrete examples. | ||
| 39 | \end{abstract} | 37 | \end{abstract} |
| 40 | 38 | ||
| 41 | 39 | ||
| @@ -52,7 +50,7 @@ transfer coding, and the list goes on. | |||
| 52 | Many complex tasks require a combination of two or more such | 50 | Many complex tasks require a combination of two or more such |
| 53 | transformations, and therefore a general mechanism for | 51 | transformations, and therefore a general mechanism for |
| 54 | promoting reuse is desirable. In the process of designing | 52 | promoting reuse is desirable. In the process of designing |
| 55 | LuaSocket 2.0, David Burgess and I were forced to deal with | 53 | \texttt{LuaSocket~2.0}, David Burgess and I were forced to deal with |
| 56 | this problem. The solution we reached proved to be very | 54 | this problem. The solution we reached proved to be very |
| 57 | general and convenient. It is based on the concepts of | 55 | general and convenient. It is based on the concepts of |
| 58 | filters, sources, sinks, and pumps, which we introduce | 56 | filters, sources, sinks, and pumps, which we introduce |
| @@ -62,18 +60,18 @@ below. | |||
| 62 | with chunks of input, successively returning processed | 60 | with chunks of input, successively returning processed |
| 63 | chunks of output. More importantly, the result of | 61 | chunks of output. More importantly, the result of |
| 64 | concatenating all the output chunks must be the same as the | 62 | concatenating all the output chunks must be the same as the |
| 65 | result of applying the filter over the concatenation of all | 63 | result of applying the filter to the concatenation of all |
| 66 | input chunks. In fancier language, filters \emph{commute} | 64 | input chunks. In fancier language, filters \emph{commute} |
| 67 | with the concatenation operator. As a result, chunk | 65 | with the concatenation operator. As a result, chunk |
| 68 | boundaries are irrelevant: filters correctly handle input | 66 | boundaries are irrelevant: filters correctly handle input |
| 69 | data no matter how it was originally split. | 67 | data no matter how it is split. |
| 70 | 68 | ||
| 71 | A \emph{chain} transparently combines the effect of one or | 69 | A \emph{chain} transparently combines the effect of one or |
| 72 | more filters. The interface of a chain must be | 70 | more filters. The interface of a chain is |
| 73 | indistinguishable from the interface of its components. | 71 | indistinguishable from the interface of its components. |
| 74 | This allows a chained filter to be used wherever an atomic | 72 | This allows a chained filter to be used wherever an atomic |
| 75 | filter is expected. In particular, chains can be chained | 73 | filter is expected. In particular, chains can be |
| 76 | themselves to create arbitrarily complex operations. | 74 | themselves chained to create arbitrarily complex operations. |
| 77 | 75 | ||
| 78 | Filters can be seen as internal nodes in a network through | 76 | Filters can be seen as internal nodes in a network through |
| 79 | which data will flow, potentially being transformed many | 77 | which data will flow, potentially being transformed many |
| @@ -93,15 +91,13 @@ anything to happen. \emph{Pumps} provide the driving force | |||
| 93 | that pushes data through the network, from a source to a | 91 | that pushes data through the network, from a source to a |
| 94 | sink. | 92 | sink. |
| 95 | 93 | ||
| 96 | These concepts will become less abstract with examples. In | 94 | In the following sections, we start with a simplified |
| 97 | the following sections, we start with a simplified | 95 | interface, which we later refine. The evolution we present |
| 98 | interface, which we refine several times until no obvious | 96 | is not contrived: it recreates the steps we followed |
| 99 | shortcomings remain. The evolution we present is not | 97 | ourselves as we consolidated our understanding of these |
| 100 | contrived: it recreates the steps we followed ourselves as | 98 | concepts within our application domain. |
| 101 | we consolidated our understanding of these concepts and the | ||
| 102 | applications that benefit from them. | ||
| 103 | 99 | ||
| 104 | \subsection{A concrete example} | 100 | \subsection{A simple example} |
| 105 | 101 | ||
| 106 | Let us use the end-of-line normalization of text as an | 102 | Let us use the end-of-line normalization of text as an |
| 107 | example to motivate our initial filter interface. | 103 | example to motivate our initial filter interface. |
| @@ -141,23 +137,23 @@ it with a \texttt{nil} chunk. The filter responds by returning | |||
| 141 | the final chunk of processed data. | 137 | the final chunk of processed data. |
| 142 | 138 | ||
| 143 | Although the interface is extremely simple, the | 139 | Although the interface is extremely simple, the |
| 144 | implementation is not so obvious. Any filter | 140 | implementation is not so obvious. A normalization filter |
| 145 | respecting this interface needs to keep some kind of context | 141 | respecting this interface needs to keep some kind of context |
| 146 | between calls. This is because chunks can for example be broken | 142 | between calls. This is because a chunk boundary may lie between |
| 147 | between the CR and LF characters marking the end of a line. This | 143 | the CR and LF characters marking the end of a line. This |
| 148 | need for contextual storage is what motivates the use of | 144 | need for contextual storage motivates the use of |
| 149 | factories: each time the factory is called, it returns a | 145 | factories: each time the factory is invoked, it returns a |
| 150 | filter with its own context so that we can have several | 146 | filter with its own context so that we can have several |
| 151 | independent filters being used at the same time. For | 147 | independent filters being used at the same time. For |
| 152 | efficiency reasons, we must avoid the obvious solution of | 148 | efficiency reasons, we must avoid the obvious solution of |
| 153 | concatenating all the input into the context before | 149 | concatenating all the input into the context before |
| 154 | producing any output. | 150 | producing any output. |
| 155 | 151 | ||
| 156 | To that end, we will break the implementation in two parts: | 152 | To that end, we break the implementation into two parts: |
| 157 | a low-level filter, and a factory of high-level filters. The | 153 | a low-level filter, and a factory of high-level filters. The |
| 158 | low-level filter will be implemented in C and will not carry | 154 | low-level filter is implemented in C and does not maintain |
| 159 | any context between function calls. The high-level filter | 155 | any context between function calls. The high-level filter |
| 160 | factory, implemented in Lua, will create and return a | 156 | factory, implemented in Lua, creates and returns a |
| 161 | high-level filter that maintains whatever context the low-level | 157 | high-level filter that maintains whatever context the low-level |
| 162 | filter needs, but isolates the user from its internal | 158 | filter needs, but isolates the user from its internal |
| 163 | details. That way, we take advantage of C's efficiency to | 159 | details. That way, we take advantage of C's efficiency to |
| @@ -191,22 +187,21 @@ end | |||
| 191 | The \texttt{normalize} factory simply calls a more generic | 187 | The \texttt{normalize} factory simply calls a more generic |
| 192 | factory, the \texttt{cycle} factory. This factory receives a | 188 | factory, the \texttt{cycle} factory. This factory receives a |
| 193 | low-level filter, an initial context, and an extra | 189 | low-level filter, an initial context, and an extra |
| 194 | parameter, and returns the corresponding high-level filter. | 190 | parameter, and returns a new high-level filter. Each time |
| 195 | Each time the high-level filer is passed a new chunk, it | 191 | the high-level filer is passed a new chunk, it invokes the |
| 196 | invokes the low-level filter passing it the previous | 192 | low-level filter with the previous context, the new chunk, |
| 197 | context, the new chunk, and the extra argument. The | 193 | and the extra argument. It is the low-level filter that |
| 198 | low-level filter in turn produces the chunk of processed | 194 | does all the work, producing the chunk of processed data and |
| 199 | data and a new context. The high-level filter then updates | 195 | a new context. The high-level filter then updates its |
| 200 | its internal context, and returns the processed chunk of | 196 | internal context, and returns the processed chunk of data to |
| 201 | data to the user. It is the low-level filter that does all | 197 | the user. Notice that we take advantage of Lua's lexical |
| 202 | the work. Notice that we take advantage of Lua's lexical | ||
| 203 | scoping to store the context in a closure between function | 198 | scoping to store the context in a closure between function |
| 204 | calls. | 199 | calls. |
| 205 | 200 | ||
| 206 | Concerning the low-level filter code, we must first accept | 201 | Concerning the low-level filter code, we must first accept |
| 207 | that there is no perfect solution to the end-of-line marker | 202 | that there is no perfect solution to the end-of-line marker |
| 208 | normalization problem itself. The difficulty comes from an | 203 | normalization problem. The difficulty comes from an |
| 209 | inherent ambiguity on the definition of empty lines within | 204 | inherent ambiguity in the definition of empty lines within |
| 210 | mixed input. However, the following solution works well for | 205 | mixed input. However, the following solution works well for |
| 211 | any consistent input, as well as for non-empty lines in | 206 | any consistent input, as well as for non-empty lines in |
| 212 | mixed input. It also does a reasonable job with empty lines | 207 | mixed input. It also does a reasonable job with empty lines |
| @@ -218,17 +213,18 @@ The idea is to consider both CR and~LF as end-of-line | |||
| 218 | is seen alone, or followed by a different candidate. In | 213 | is seen alone, or followed by a different candidate. In |
| 219 | other words, CR~CR~and LF~LF each issue two end-of-line | 214 | other words, CR~CR~and LF~LF each issue two end-of-line |
| 220 | markers, whereas CR~LF~and LF~CR issue only one marker each. | 215 | markers, whereas CR~LF~and LF~CR issue only one marker each. |
| 221 | This idea correctly handles the Unix, DOS/MIME, VMS, and Mac | 216 | This method correctly handles the Unix, DOS/MIME, VMS, and Mac |
| 222 | OS, as well as other more obscure conventions. | 217 | OS conventions. |
| 223 | 218 | ||
| 224 | \subsection{The C part of the filter} | 219 | \subsection{The C part of the filter} |
| 225 | 220 | ||
| 226 | Our low-level filter is divided into two simple functions. | 221 | Our low-level filter is divided into two simple functions. |
| 227 | The inner function actually does the conversion. It takes | 222 | The inner function performs the normalization itself. It takes |
| 228 | each input character in turn, deciding what to output and | 223 | each input character in turn, deciding what to output and |
| 229 | how to modify the context. The context tells if the last | 224 | how to modify the context. The context tells if the last |
| 230 | character processed was an end-of-line candidate, and if so, | 225 | processed character was an end-of-line candidate, and if so, |
| 231 | which candidate it was. | 226 | which candidate it was. For efficiency, it uses |
| 227 | Lua's auxiliary library's buffer interface: | ||
| 232 | \begin{quote} | 228 | \begin{quote} |
| 233 | \begin{C} | 229 | \begin{C} |
| 234 | @stick# | 230 | @stick# |
| @@ -252,12 +248,10 @@ static int process(int c, int last, const char *marker, | |||
| 252 | \end{C} | 248 | \end{C} |
| 253 | \end{quote} | 249 | \end{quote} |
| 254 | 250 | ||
| 255 | The inner function makes use of Lua's auxiliary library's | 251 | The outer function simply interfaces with Lua. It receives the |
| 256 | buffer interface for efficiency. The | 252 | context and input chunk (as well as an optional |
| 257 | outer function simply interfaces with Lua. It receives the | ||
| 258 | context and the input chunk (as well as an optional | ||
| 259 | custom end-of-line marker), and returns the transformed | 253 | custom end-of-line marker), and returns the transformed |
| 260 | output chunk and the new context. | 254 | output chunk and the new context: |
| 261 | \begin{quote} | 255 | \begin{quote} |
| 262 | \begin{C} | 256 | \begin{C} |
| 263 | @stick# | 257 | @stick# |
| @@ -291,33 +285,29 @@ initial state. This allows the filter to be reused many | |||
| 291 | times. | 285 | times. |
| 292 | 286 | ||
| 293 | When designing your own filters, the challenging part is to | 287 | When designing your own filters, the challenging part is to |
| 294 | decide what will be the context. For line breaking, for | 288 | decide what will be in the context. For line breaking, for |
| 295 | instance, it could be the number of bytes left in the | 289 | instance, it could be the number of bytes left in the |
| 296 | current line. For Base64 encoding, it could be a string | 290 | current line. For Base64 encoding, it could be a string |
| 297 | with the bytes that remain after the division of the input | 291 | with the bytes that remain after the division of the input |
| 298 | into 3-byte atoms. The MIME module in the LuaSocket | 292 | into 3-byte atoms. The MIME module in the \texttt{LuaSocket} |
| 299 | distribution has many other examples. | 293 | distribution has many other examples. |
| 300 | 294 | ||
| 301 | \section{Filter chains} | 295 | \section{Filter chains} |
| 302 | 296 | ||
| 303 | Chains add a lot to the power of filters. For example, | 297 | Chains add a lot to the power of filters. For example, |
| 304 | according to the standard for Quoted-Printable encoding, the | 298 | according to the standard for Quoted-Printable encoding, |
| 305 | text must be normalized into its canonic form prior to | 299 | text must be normalized to a canonic end-of-line marker |
| 306 | encoding, as far as end-of-line markers are concerned. To | 300 | prior to encoding. To help specifying complex |
| 307 | help specifying complex transformations like these, we define a | 301 | transformations like this, we define a chain factory that |
| 308 | chain factory that creates a composite filter from one or | 302 | creates a composite filter from one or more filters. A |
| 309 | more filters. A chained filter passes data through all | 303 | chained filter passes data through all its components, and |
| 310 | its components, and can be used wherever a primitive filter | 304 | can be used wherever a primitive filter is accepted. |
| 311 | is accepted. | 305 | |
| 312 | 306 | The chaining factory is very simple. The auxiliary | |
| 313 | The chaining factory is very simple. All it does is return a | 307 | function~\texttt{chainpair} chains two filters together, |
| 314 | function that passes data through all filters and returns | 308 | taking special care if the chunk is the last. This is |
| 315 | the result to the user. The auxiliary | 309 | because the final \texttt{nil} chunk notification has to be |
| 316 | function~\texttt{chainpair} can only chain two filters | 310 | pushed through both filters in turn: |
| 317 | together. In the auxiliary function, special care must be | ||
| 318 | taken if the chunk is the last. This is because the final | ||
| 319 | \texttt{nil} chunk notification has to be pushed through both | ||
| 320 | filters in turn: | ||
| 321 | \begin{quote} | 311 | \begin{quote} |
| 322 | \begin{lua} | 312 | \begin{lua} |
| 323 | @stick# | 313 | @stick# |
| @@ -333,7 +323,7 @@ end | |||
| 333 | @stick# | 323 | @stick# |
| 334 | function filter.chain(...) | 324 | function filter.chain(...) |
| 335 | local f = arg[1] | 325 | local f = arg[1] |
| 336 | for i = 2, table.getn(arg) do | 326 | for i = 2, @#arg do |
| 337 | f = chainpair(f, arg[i]) | 327 | f = chainpair(f, arg[i]) |
| 338 | end | 328 | end |
| 339 | return f | 329 | return f |
| @@ -343,7 +333,7 @@ end | |||
| 343 | \end{quote} | 333 | \end{quote} |
| 344 | 334 | ||
| 345 | Thanks to the chain factory, we can | 335 | Thanks to the chain factory, we can |
| 346 | trivially define the Quoted-Printable conversion: | 336 | define the Quoted-Printable conversion as such: |
| 347 | \begin{quote} | 337 | \begin{quote} |
| 348 | \begin{lua} | 338 | \begin{lua} |
| 349 | @stick# | 339 | @stick# |
| @@ -361,7 +351,7 @@ pump.all(in, out) | |||
| 361 | The filters we introduced so far act as the internal nodes | 351 | The filters we introduced so far act as the internal nodes |
| 362 | in a network of transformations. Information flows from node | 352 | in a network of transformations. Information flows from node |
| 363 | to node (or rather from one filter to the next) and is | 353 | to node (or rather from one filter to the next) and is |
| 364 | transformed on its way out. Chaining filters together is our | 354 | transformed along the way. Chaining filters together is our |
| 365 | way to connect nodes in this network. As the starting point | 355 | way to connect nodes in this network. As the starting point |
| 366 | for the network, we need a source node that produces the | 356 | for the network, we need a source node that produces the |
| 367 | data. In the end of the network, we need a sink node that | 357 | data. In the end of the network, we need a sink node that |
| @@ -376,8 +366,8 @@ caller by returning \texttt{nil} followed by an error message. | |||
| 376 | 366 | ||
| 377 | Below are two simple source factories. The \texttt{empty} source | 367 | Below are two simple source factories. The \texttt{empty} source |
| 378 | returns no data, possibly returning an associated error | 368 | returns no data, possibly returning an associated error |
| 379 | message. The \texttt{file} source is more usefule, and | 369 | message. The \texttt{file} source works harder, and |
| 380 | yields the contents of a file in a chunk by chunk fashion. | 370 | yields the contents of a file in a chunk by chunk fashion: |
| 381 | \begin{quote} | 371 | \begin{quote} |
| 382 | \begin{lua} | 372 | \begin{lua} |
| 383 | @stick# | 373 | @stick# |
| @@ -404,9 +394,13 @@ end | |||
| 404 | 394 | ||
| 405 | \subsection{Filtered sources} | 395 | \subsection{Filtered sources} |
| 406 | 396 | ||
| 407 | It is often useful to chain a source with a filter. A | 397 | A filtered source passes its data through the |
| 408 | filtered source passes its data through the | ||
| 409 | associated filter before returning it to the caller. | 398 | associated filter before returning it to the caller. |
| 399 | Filtered sources are useful when working with | ||
| 400 | functions that get their input data from a source (such as | ||
| 401 | the pump in our first example). By chaining a source with one or | ||
| 402 | more filters, the function can be transparently provided | ||
| 403 | with filtered data, with no need to change its interface. | ||
| 410 | Here is a factory that does the job: | 404 | Here is a factory that does the job: |
| 411 | \begin{quote} | 405 | \begin{quote} |
| 412 | \begin{lua} | 406 | \begin{lua} |
| @@ -425,23 +419,16 @@ end | |||
| 425 | \end{lua} | 419 | \end{lua} |
| 426 | \end{quote} | 420 | \end{quote} |
| 427 | 421 | ||
| 428 | Our motivating example in the introduction chains a source | ||
| 429 | with a filter. Filtered sources are useful when working with | ||
| 430 | functions that get their input data from a source (such as | ||
| 431 | the pump in the example). By chaining a source with one or | ||
| 432 | more filters, the function can be transparently provided | ||
| 433 | with filtered data, with no need to change its interface. | ||
| 434 | |||
| 435 | \subsection{Sinks} | 422 | \subsection{Sinks} |
| 436 | 423 | ||
| 437 | Just as we defined an interface for sources of | 424 | Just as we defined an interface a data source, |
| 438 | data, we can also define an interface for a | 425 | we can also define an interface for a data destination. |
| 439 | destination for data. We call any function respecting this | 426 | We call any function respecting this |
| 440 | interface a \emph{sink}. In our first example, we used a | 427 | interface a \emph{sink}. In our first example, we used a |
| 441 | file sink connected to the standard output. | 428 | file sink connected to the standard output. |
| 442 | 429 | ||
| 443 | Sinks receive consecutive chunks of data, until the end of | 430 | Sinks receive consecutive chunks of data, until the end of |
| 444 | data is notified with a \texttt{nil} chunk. A sink can be | 431 | data is signaled by a \texttt{nil} chunk. A sink can be |
| 445 | notified of an error with an optional extra argument that | 432 | notified of an error with an optional extra argument that |
| 446 | contains the error message, following a \texttt{nil} chunk. | 433 | contains the error message, following a \texttt{nil} chunk. |
| 447 | If a sink detects an error itself, and | 434 | If a sink detects an error itself, and |
| @@ -529,18 +516,21 @@ common that it deserves its own function: | |||
| 529 | function pump.step(src, snk) | 516 | function pump.step(src, snk) |
| 530 | local chunk, src_err = src() | 517 | local chunk, src_err = src() |
| 531 | local ret, snk_err = snk(chunk, src_err) | 518 | local ret, snk_err = snk(chunk, src_err) |
| 532 | return chunk and ret and not src_err and not snk_err, | 519 | if chunk and ret then return 1 |
| 533 | src_err or snk_err | 520 | else return nil, src_err or snk_err end |
| 534 | end | 521 | end |
| 535 | % | 522 | % |
| 536 | 523 | ||
| 537 | @stick# | 524 | @stick# |
| 538 | function pump.all(src, snk, step) | 525 | function pump.all(src, snk, step) |
| 539 | step = step or pump.step | 526 | step = step or pump.step |
| 540 | while true do | 527 | while true do |
| 541 | local ret, err = step(src, snk) | 528 | local ret, err = step(src, snk) |
| 542 | if not ret then return not err, err end | 529 | if not ret then |
| 543 | end | 530 | if err then return nil, err |
| 531 | else return 1 end | ||
| 532 | end | ||
| 533 | end | ||
| 544 | end | 534 | end |
| 545 | % | 535 | % |
| 546 | \end{lua} | 536 | \end{lua} |
| @@ -571,21 +561,23 @@ The way we split the filters here is not intuitive, on | |||
| 571 | purpose. Alternatively, we could have chained the Base64 | 561 | purpose. Alternatively, we could have chained the Base64 |
| 572 | encode filter and the line-wrap filter together, and then | 562 | encode filter and the line-wrap filter together, and then |
| 573 | chain the resulting filter with either the file source or | 563 | chain the resulting filter with either the file source or |
| 574 | the file sink. It doesn't really matter. | 564 | the file sink. It doesn't really matter. The Base64 and the |
| 565 | line wrapping filters are part of the \texttt{LuaSocket} | ||
| 566 | distribution. | ||
| 575 | 567 | ||
| 576 | \section{Exploding filters} | 568 | \section{Exploding filters} |
| 577 | 569 | ||
| 578 | Our current filter interface has one flagrant shortcoming. | 570 | Our current filter interface has one flagrant shortcoming. |
| 579 | When David Burgess was writing his \texttt{gzip} filter, he | 571 | When David Burgess was writing his \texttt{gzip} filter, he |
| 580 | noticed that a decompression filter can explode a small | 572 | noticed that a decompression filter can explode a small |
| 581 | input chunk into a huge amount of data. To address this, we | 573 | input chunk into a huge amount of data. To address this |
| 582 | decided to change our filter interface to allow exploding | 574 | problem, we decided to change the filter interface and allow |
| 583 | filters to return large quantities of output data in a chunk | 575 | exploding filters to return large quantities of output data |
| 584 | by chunk manner. | 576 | in a chunk by chunk manner. |
| 585 | 577 | ||
| 586 | More specifically, after passing each chunk of input data to | 578 | More specifically, after passing each chunk of input to |
| 587 | a filter and collecting the first chunk of output data, the | 579 | a filter, and collecting the first chunk of output, the |
| 588 | user must now loop to receive data from the filter until no | 580 | user must now loop to receive other chunks from the filter until no |
| 589 | filtered data is left. Within these secondary calls, the | 581 | filtered data is left. Within these secondary calls, the |
| 590 | caller passes an empty string to the filter. The filter | 582 | caller passes an empty string to the filter. The filter |
| 591 | responds with an empty string when it is ready for the next | 583 | responds with an empty string when it is ready for the next |
| @@ -593,7 +585,7 @@ input chunk. In the end, after the user passes a | |||
| 593 | \texttt{nil} chunk notifying the filter that there is no | 585 | \texttt{nil} chunk notifying the filter that there is no |
| 594 | more input data, the filter might still have to produce too | 586 | more input data, the filter might still have to produce too |
| 595 | much output data to return in a single chunk. The user has | 587 | much output data to return in a single chunk. The user has |
| 596 | to loop again, this time passing \texttt{nil} each time, | 588 | to loop again, now passing \texttt{nil} to the filter each time, |
| 597 | until the filter itself returns \texttt{nil} to notify the | 589 | until the filter itself returns \texttt{nil} to notify the |
| 598 | user it is finally done. | 590 | user it is finally done. |
| 599 | 591 | ||
| @@ -602,9 +594,9 @@ the new interface. In fact, the end-of-line translation | |||
| 602 | filter we presented earlier already conforms to it. The | 594 | filter we presented earlier already conforms to it. The |
| 603 | complexity is encapsulated within the chaining functions, | 595 | complexity is encapsulated within the chaining functions, |
| 604 | which must now include a loop. Since these functions only | 596 | which must now include a loop. Since these functions only |
| 605 | have to be written once, the user is not affected. | 597 | have to be written once, the user is rarely affected. |
| 606 | Interestingly, the modifications do not have a measurable | 598 | Interestingly, the modifications do not have a measurable |
| 607 | negative impact in the the performance of filters that do | 599 | negative impact in the performance of filters that do |
| 608 | not need the added flexibility. On the other hand, for a | 600 | not need the added flexibility. On the other hand, for a |
| 609 | small price in complexity, the changes make exploding | 601 | small price in complexity, the changes make exploding |
| 610 | filters practical. | 602 | filters practical. |
| @@ -617,7 +609,7 @@ and SMTP modules are especially integrated with LTN12, | |||
| 617 | and can be used to showcase the expressive power of filters, | 609 | and can be used to showcase the expressive power of filters, |
| 618 | sources, sinks, and pumps. Below is an example | 610 | sources, sinks, and pumps. Below is an example |
| 619 | of how a user would proceed to define and send a | 611 | of how a user would proceed to define and send a |
| 620 | multipart message with attachments, using \texttt{LuaSocket}: | 612 | multipart message, with attachments, using \texttt{LuaSocket}: |
| 621 | \begin{quote} | 613 | \begin{quote} |
| 622 | \begin{mime} | 614 | \begin{mime} |
| 623 | local smtp = require"socket.smtp" | 615 | local smtp = require"socket.smtp" |
| @@ -656,8 +648,8 @@ assert(smtp.send{ | |||
| 656 | The \texttt{smtp.message} function receives a table | 648 | The \texttt{smtp.message} function receives a table |
| 657 | describing the message, and returns a source. The | 649 | describing the message, and returns a source. The |
| 658 | \texttt{smtp.send} function takes this source, chains it with the | 650 | \texttt{smtp.send} function takes this source, chains it with the |
| 659 | SMTP dot-stuffing filter, creates a connects a socket sink | 651 | SMTP dot-stuffing filter, connects a socket sink |
| 660 | to the server, and simply pumps the data. The message is never | 652 | with the server, and simply pumps the data. The message is never |
| 661 | assembled in memory. Everything is produced on demand, | 653 | assembled in memory. Everything is produced on demand, |
| 662 | transformed in small pieces, and sent to the server in chunks, | 654 | transformed in small pieces, and sent to the server in chunks, |
| 663 | including the file attachment that is loaded from disk and | 655 | including the file attachment that is loaded from disk and |
| @@ -665,14 +657,14 @@ encoded on the fly. It just works. | |||
| 665 | 657 | ||
| 666 | \section{Conclusions} | 658 | \section{Conclusions} |
| 667 | 659 | ||
| 668 | In this article we introduce the concepts of filters, | 660 | In this article, we introduced the concepts of filters, |
| 669 | sources, sinks, and pumps to the Lua language. These are | 661 | sources, sinks, and pumps to the Lua language. These are |
| 670 | useful tools for data processing in general. Sources provide | 662 | useful tools for stream processing in general. Sources provide |
| 671 | a simple abstraction for data acquisition. Sinks provide an | 663 | a simple abstraction for data acquisition. Sinks provide an |
| 672 | abstraction for final data destinations. Filters define an | 664 | abstraction for final data destinations. Filters define an |
| 673 | interface for data transformations. The chaining of | 665 | interface for data transformations. The chaining of |
| 674 | filters, sources and sinks provides an elegant way to create | 666 | filters, sources and sinks provides an elegant way to create |
| 675 | arbitrarily complex data transformation from simpler | 667 | arbitrarily complex data transformations from simpler |
| 676 | transformations. Pumps simply move the data through. | 668 | components. Pumps simply move the data through. |
| 677 | 669 | ||
| 678 | \end{document} | 670 | \end{document} |
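
The one behavioral change to code in this commit is in `pump.step` and `pump.all`, which now return `1` on success and `nil` plus an error message on failure, rather than the value of a boolean expression. The sketch below copies the new-side bodies of both functions from the diff and exercises them. The helpers `string_source`, `table_sink`, and `failing_source` are hypothetical, written only for this illustration, and are not part of the article or of LuaSocket.

```lua
-- Stand-in for the article's pump table.
local pump = {}

-- One step: pull a chunk from the source and push it into the sink.
-- Returns 1 while data keeps flowing, otherwise nil plus an optional error.
function pump.step(src, snk)
    local chunk, src_err = src()
    local ret, snk_err = snk(chunk, src_err)
    if chunk and ret then return 1
    else return nil, src_err or snk_err end
end

-- Repeat steps until the source is exhausted or an error is reported.
function pump.all(src, snk, step)
    step = step or pump.step
    while true do
        local ret, err = step(src, snk)
        if not ret then
            if err then return nil, err
            else return 1 end
        end
    end
end

-- Hypothetical source: yields string s in pieces of at most `size` bytes,
-- then nil to signal the end of data.
local function string_source(s, size)
    local i = 1
    return function()
        if i > #s then return nil end
        local chunk = s:sub(i, i + size - 1)
        i = i + size
        return chunk
    end
end

-- Hypothetical sink: appends every non-nil chunk to table t.
local function table_sink(t)
    return function(chunk, err)
        if chunk then t[#t + 1] = chunk end
        return 1
    end
end

-- Hypothetical source that fails immediately with the given message.
local function failing_source(msg)
    return function() return nil, msg end
end

local parts = {}
local ok, err = pump.all(string_source("hello world", 4), table_sink(parts))
local result = table.concat(parts)

local ok2, err2 = pump.all(failing_source("disk error"), table_sink({}))
```

With a well-behaved source, `pump.all` returns `1` and no error once the source yields `nil`. With the failing source, the error message travels from the source through `pump.step` and is returned as the second value of `pump.all`.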
