author    Thijs Schreijer <thijs@thijsschreijer.nl>  2022-03-29 14:09:10 +0200
committer Thijs Schreijer <thijs@thijsschreijer.nl>  2022-03-29 14:09:10 +0200
commit    db2f1c9598c63a721fad4b8ae0e0121eccc86248 (patch)
tree      25b171f75348909ac95afa73655a8a33ae9b20d7
parent    3adf252b45401b4b97e63668c6ee530e7b3936ad (diff)
chore(ltn) update file contents from wiki to markdown
-rw-r--r--  ltn012.wiki  205
-rw-r--r--  ltn013.wiki  127
2 files changed, 163 insertions, 169 deletions
diff --git a/ltn012.wiki b/ltn012.wiki
index 96b13ae..fa26b4a 100644
--- a/ltn012.wiki
+++ b/ltn012.wiki
@@ -1,51 +1,48 @@
# Filters, sources and sinks: design, motivation and examples
### or Functional programming for the rest of us
by DiegoNehab

## Abstract

Certain operations can be implemented in the form of filters. A filter is a function that processes data received in consecutive function calls, returning partial results chunk by chunk. Examples of operations that can be implemented as filters include the end-of-line normalization for text, Base64 and Quoted-Printable transfer content encodings, the breaking of text into lines, SMTP byte stuffing, and there are many others. Filters become even more powerful when we allow them to be chained together to create composite filters. Filters can be seen as middle nodes in a chain of data transformations. Sources and sinks are the corresponding end points of these chains. A source is a function that produces data, chunk by chunk, and a sink is a function that takes data, chunk by chunk. In this technical note, we define an elegant interface for filters, sources, sinks and chaining. We evolve our interface progressively, until we reach a high degree of generality. We discuss difficulties that arise during the implementation of this interface and we provide solutions and examples.

## Introduction

Applications sometimes have too much information to process to fit in memory and are thus forced to process data in smaller parts. Even when there is enough memory, processing all the data atomically may take long enough to frustrate a user that wants to interact with the application. Furthermore, complex transformations can often be defined as a series of simpler operations. Several different complex transformations might share the same simpler operations, so that a uniform interface to combine them is desirable. The following concepts constitute our solution to these problems.

*Filters* are functions that accept successive chunks of input, and produce successive chunks of output. Furthermore, the result of concatenating all the output data is the same as the result of applying the filter over the concatenation of the input data. As a consequence, boundaries are irrelevant: filters have to handle input data split arbitrarily by the user.

A *chain* is a function that combines the effect of two (or more) other functions, but whose interface is indistinguishable from the interface of one of its components. Thus, a chained filter can be used wherever an atomic filter can be used. However, its effect on data is the combined effect of its component filters. Note that, as a consequence, chains can be chained themselves to create arbitrarily complex operations that can be used just like atomic operations.

Filters can be seen as internal nodes in a network through which data flows, potentially being transformed along the way. Chains connect these nodes together. To complete the picture, we need *sources* and *sinks* as initial and final nodes of the network, respectively. Less abstractly, a source is a function that produces new data every time it is called. On the other hand, sinks are functions that give a final destination to the data they receive. Naturally, sources and sinks can be chained with filters.

Finally, filters, chains, sources, and sinks are all passive entities: they need to be repeatedly called in order for something to happen. *Pumps* provide the driving force that pushes data through the network, from a source to a sink.

Hopefully, these concepts will become clear with examples. In the following sections, we start with simplified interfaces, which we improve several times until we can find no obvious shortcomings. The evolution we present is not contrived: it follows the steps we followed ourselves as we consolidated our understanding of these concepts.

### A concrete example

Some data transformations are easier to implement as filters than others. Examples of operations that can be implemented as filters include the end-of-line normalization for text, the Base64 and Quoted-Printable transfer content encodings, the breaking of text into lines, SMTP byte stuffing, and many others. Let's use the end-of-line normalization as an example to define our initial filter interface. We later discuss why the implementation might not be trivial.

Assume we are given text in an unknown end-of-line convention (including possibly mixed conventions) out of the commonly found Unix (LF), Mac OS (CR), and DOS (CRLF) conventions. We would like to be able to write code like the following:
```lua
input = source.chain(source.file(io.stdin), normalize("\r\n"))
output = sink.file(io.stdout)
pump(input, output)
```

This program should read data from the standard input stream and normalize the end-of-line markers to the canonical CRLF marker defined by the MIME standard, finally sending the results to the standard output stream. For that, we use a *file source* to produce data from standard input, and chain it with a filter that normalizes the data. The pump then repeatedly gets data from the source, and moves it to the *file sink* that sends it to standard output.

To make the discussion even more concrete, we start by discussing the implementation of the normalization filter. The `normalize` *factory* is a function that creates such a filter. Our initial filter interface is as follows: the filter receives a chunk of input data, and returns a chunk of processed data. When there is no more input data, the user notifies the filter by invoking it with a `nil` chunk. The filter then returns the final chunk of processed data.

Although the interface is extremely simple, the implementation doesn't seem so obvious. Any filter respecting this interface needs to keep some kind of context between calls. This is because chunks can be broken between the CR and LF characters marking the end of a line. This need for context storage is what motivates the use of factories: each time the factory is called, it returns a filter with its own context so that we can have several independent filters being used at the same time. For the normalization filter, we know that the obvious solution (i.e. concatenating all the input into the context before producing any output) is not good enough, so we will have to find another way.

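As a quick illustration (mine, not from the original note), two filters produced by the same factory keep independent contexts, so separate streams can be normalized at the same time; the expected results assume the CR/LF rule described later:

```lua
-- Hypothetical usage of the normalize factory: each call to the factory
-- creates a filter that carries its own context.
local f1 = normalize("\r\n")
local f2 = normalize("\r\n")
f1("one\r")    --> "one\r\n"; f1 now remembers the trailing CR
f2("two\n")    --> "two\r\n"; f2 keeps its own, untouched context
f1("\nthree")  --> "three"; the LF completes f1's pending CR LF pair
```
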
We will break the implementation into two parts: a low-level filter, and a factory of high-level filters. The low-level filter will be implemented in C and will not carry any context between function calls. The high-level filter factory, implemented in Lua, will create and return a high-level filter that keeps whatever context the low-level filter needs, but isolates the user from its internal details. That way, we take advantage of C's efficiency to perform the dirty work, and take advantage of Lua's simplicity for the bookkeeping.

### The Lua part of the implementation

Below is the implementation of the factory of high-level end-of-line normalization filters:
```lua
function filter.cycle(low, ctx, extra)
    return function(chunk)
        local ret
```
@@ -57,18 +54,18 @@ end
```lua
function normalize(marker)
    return cycle(eol, 0, marker)
end
```

The `normalize` factory simply calls a more generic factory, the `cycle` factory. This factory receives a low-level filter, an initial context and some extra value, and returns the corresponding high-level filter. Each time the high-level filter is called with a new chunk, it calls the low-level filter passing the previous context, the new chunk and the extra argument. The low-level filter produces the chunk of processed data and a new context. Finally, the high-level filter updates its internal context and returns the processed chunk of data to the user. It is the low-level filter that does all the work. Notice that this implementation takes advantage of the Lua 5.0 lexical scoping rules to store the context locally, between function calls.

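The diff above elides the body of `filter.cycle`; a minimal sketch consistent with the description in the previous paragraph would be:

```lua
-- Sketch of the generic cycle factory: 'low' is the context-free low-level
-- filter, 'ctx' the initial context, 'extra' an extra argument (here, the
-- end-of-line marker). The closure keeps the context between calls.
function filter.cycle(low, ctx, extra)
    return function(chunk)
        local ret
        ret, ctx = low(ctx, chunk, extra)
        return ret
    end
end
```
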
Moving to the low-level filter, we notice there is no perfect solution to the end-of-line marker normalization problem itself. The difficulty comes from an inherent ambiguity in the definition of empty lines within mixed input. However, the following solution works well for any consistent input, as well as for non-empty lines in mixed input. It also does a reasonable job with empty lines and serves as a good example of how to implement a low-level filter.

Here is what we do: CR and LF are considered candidates for a line break. We issue *one* end-of-line marker if one of the candidates is seen alone, or followed by a *different* candidate. That is, CR CR and LF LF issue two end-of-line markers each, but CR LF and LF CR issue only one marker. This idea takes care of Mac OS, Mac OS X, VMS and Unix, DOS and MIME, as well as probably other more obscure conventions.

### The C part of the implementation

The low-level filter is divided into two simple functions. The inner function actually does the conversion. It takes each input character in turn, deciding what to output and how to modify the context. The context tells if the last character seen was a candidate and, if so, which candidate it was.
```c
#define candidate(c) (c == CR || c == LF)
static int process(int c, int last, const char *marker, luaL_Buffer *buffer) {
    if (candidate(c)) {
```
@@ -84,10 +81,10 @@ static int process(int c, int last, const char *marker, luaL_Buffer *buffer) {
```c
        return 0;
    }
}
```

The inner function makes use of the buffer interface from Lua's auxiliary library, for its efficiency and ease of use. The outer function simply interfaces with Lua. It receives the context and the input chunk (as well as an optional end-of-line marker), and returns the transformed output and the new context.
```c
static int eol(lua_State *L) {
    int ctx = luaL_checkint(L, 1);
    size_t isize = 0;
```
@@ -107,16 +104,16 @@ static int eol(lua_State *L) {
```c
    lua_pushnumber(L, ctx);
    return 2;
}
```

Notice that if the input chunk is `nil`, the operation is considered to be finished. In that case, the loop will not execute a single time and the context is reset to the initial state. This allows the filter to be reused indefinitely. It is a good idea to write filters like this, when possible.

Besides the end-of-line normalization filter shown above, many other filters can be implemented with the same ideas. Examples include Base64 and Quoted-Printable transfer content encodings, the breaking of text into lines, SMTP byte stuffing, etc. The challenging part is to decide what the context will be. For line breaking, for instance, it could be the number of bytes left in the current line. For Base64 encoding, it could be the bytes that remain in the division of the input into 3-byte atoms.

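To illustrate that last point, here is a hedged sketch (not code from the module) of a naive line-breaking low-level filter whose context is the number of bytes left in the current line; it could be wrapped with the same `cycle` factory:

```lua
-- Naive line-wrapping low-level filter: 'left' is the context (bytes still
-- allowed on the current line), 'length' the desired line length. Line
-- breaks already present in the input are ignored by this sketch.
local function wrap_low(left, chunk, length)
    length = length or 76
    if not chunk then return "", length end   -- end of data: reset the context
    local out = {}
    for i = 1, #chunk do
        if left == 0 then
            out[#out + 1] = "\r\n"
            left = length
        end
        out[#out + 1] = string.sub(chunk, i, i)
        left = left - 1
    end
    return table.concat(out), left
end

-- wrap76 = filter.cycle(wrap_low, 76, 76)
```
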
## Chaining

Filters become more powerful when the concept of chaining is introduced. Suppose you have a filter for Quoted-Printable encoding and you want to encode some text. According to the standard, the text has to be normalized into its canonical form prior to encoding. A nice interface that simplifies this task is a factory that creates a composite filter that passes data through multiple filters, but that can be used wherever a primitive filter is used.
```lua
local function chain2(f1, f2)
    return function(chunk)
        local ret = f2(f1(chunk))
```
@@ -140,18 +137,18 @@ while 1 do
```lua
    io.write(chain(chunk))
    if not chunk then break end
end
```

The chaining factory is very simple. All it does is return a function that passes data through all filters and returns the result to the user. It uses the simpler auxiliary function that knows how to chain two filters together. In the auxiliary function, special care must be taken if the chunk is final. This is because the final chunk notification has to be pushed through both filters in turn. Thanks to the chain factory, it is easy to perform the Quoted-Printable conversion, as the above example shows.

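The n-ary `chain` factory itself falls outside the hunk shown above; assuming the `chain2` helper, a plausible sketch simply folds it over all arguments:

```lua
-- Sketch: build an n-ary chained filter by folding the two-filter helper.
function filter.chain(...)
    local filters = {...}
    local f = filters[1]
    for i = 2, #filters do
        f = chain2(f, filters[i])
    end
    return f
end

-- e.g. (factory names assumed): chain = filter.chain(normalize("\r\n"), encode("quoted-printable"))
```
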
## Sources, sinks, and pumps

As we noted in the introduction, the filters we introduced so far act as the internal nodes in a network of transformations. Information flows from node to node (or rather from one filter to the next) and is transformed on its way out. Chaining filters together is the way we found to connect nodes in the network. But what about the end nodes? In the beginning of the network, we need a node that provides the data, a source. In the end of the network, we need a node that takes in the data, a sink.

### Sources

We start with two simple sources. The first is the `empty` source: It simply returns no data, possibly returning an error message. The second is the `file` source, which produces the contents of a file in a chunk by chunk fashion, closing the file handle when done.
```lua
function source.empty(err)
    return function()
        return nil, err
```
@@ -159,7 +156,7 @@ function source.empty(err)
```lua
end

function source.file(handle, io_err)
    if handle then
        return function()
            local chunk = handle:read(2048)
            if not chunk then handle:close() end
```
@@ -167,44 +164,44 @@ function source.file(handle, io_err)
```lua
        end
    else return source.empty(io_err or "unable to open file") end
end
```

A source returns the next chunk of data each time it is called. When there is no more data, it just returns `nil`. If there is an error, the source can inform the caller by returning `nil` followed by an error message. Adrian Sietsma noticed that, although not on purpose, the interface for sources is compatible with the idea of iterators in Lua 5.0. That is, a data source can be nicely used in conjunction with `for` loops. Using our file source as an iterator, we can rewrite our first example:
```lua
local process = normalize("\r\n")
for chunk in source.file(io.stdin) do
    io.write(process(chunk))
end
io.write(process(nil))
```

Notice that the last call to the filter obtains the last chunk of processed data. The loop terminates when the source returns `nil` and therefore we need that final call outside of the loop.

### Maintaining state between calls

It is often the case that a source needs to change its behavior after some event. One simple example would be a file source that wants to make sure it returns `nil` regardless of how many times it is called after the end of file, avoiding attempts to read past the end of the file. Another example would be a source that returns the contents of several files, as if they were concatenated, moving from one file to the next until the end of the last file is reached.

One way to implement this kind of source is to have the factory declare extra state variables that the source can use via lexical scoping. Our file source could set the file handle itself to `nil` when it detects the end-of-file. Then, every time the source is called, it could check if the handle is still valid and act accordingly:
```lua
function source.file(handle, io_err)
    if handle then
        return function()
            if not handle then return nil end
            local chunk = handle:read(2048)
            if not chunk then
                handle:close()
                handle = nil
            end
            return chunk
        end
    else return source.empty(io_err or "unable to open file") end
end
```

Another way to implement this behavior involves a change in the source interface that makes it more flexible. Let's allow a source to return a second value, besides the next chunk of data. If the returned chunk is `nil`, the extra return value tells us what happened. A second `nil` means that there is just no more data and the source is empty. Any other value is considered to be an error message. On the other hand, if the chunk was *not* `nil`, the second return value tells us whether the source wants to be replaced. If it is `nil`, we should proceed using the same source. Otherwise it has to be another source, which we have to use from then on, to get the remaining data.

This extra freedom is good for someone writing a source function, but it is a pain for those that have to use it. Fortunately, given one of these *fancy* sources, we can transform it into a simple source that never needs to be replaced, using the following factory.
```lua
function source.simplify(src)
    return function()
        local chunk, err_or_new = src()
```
@@ -213,28 +210,28 @@ function source.simplify(src)
```lua
        else return chunk end
    end
end
```

The simplification factory allows us to write fancy sources and use them as if they were simple. Therefore, our next functions will only produce simple sources, and functions that take sources will assume they are simple.

Going back to our file source, the extended interface allows for a more elegant implementation. The new source just asks to be replaced by an empty source as soon as there is no more data. There is no repeated checking of the handle. To make things simpler for the user, the factory itself simplifies the fancy file source before returning it to the user:
```lua
function source.file(handle, io_err)
    if handle then
        return source.simplify(function()
            local chunk = handle:read(2048)
            if not chunk then
                handle:close()
                return "", source.empty()
            end
            return chunk
        end)
    else return source.empty(io_err or "unable to open file") end
end
```

We can make these ideas even more powerful if we use a new feature of Lua 5.0: coroutines. Coroutines suffer from a great lack of advertisement, and I am going to play my part here. Just like lexical scoping, coroutines taste odd at first, but once you get used to the concept, it can save your day. I have to admit that using coroutines to implement our file source would be overkill, so let's implement a concatenated source factory instead.
```lua
function source.cat(...)
    local arg = {...}
    local co = coroutine.create(function()
```
@@ -242,22 +239,22 @@ function source.cat(...)
```lua
        while i <= #arg do
            local chunk, err = arg[i]()
            if chunk then coroutine.yield(chunk)
            elseif err then return nil, err
            else i = i + 1 end
        end
    end)
    return function()
        return shift(coroutine.resume(co))
    end
end
```

The factory creates two functions. The first is an auxiliary that does all the work, in the form of a coroutine. It reads a chunk from one of the sources. If the chunk is `nil`, it moves to the next source, otherwise it just yields returning the chunk. When it is resumed, it continues from where it stopped and tries to read the next chunk. The second function is the source itself, and just resumes the execution of the auxiliary coroutine, returning to the user whatever chunks it produces (skipping the first result that tells us if the coroutine terminated). Imagine writing the same function without coroutines and you will notice the simplicity of this implementation. We will use coroutines again when we make the filter interface more powerful.

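The `shift` helper used by `source.cat` is not shown in this diff; a minimal version (my assumption) just drops the status flag returned by `coroutine.resume`. A usage sketch with made-up file names follows:

```lua
-- Assumed helper: discard coroutine.resume's leading status value and pass
-- the remaining results (chunk and error) through unchanged.
local function shift(status, ...)
    return ...
end

-- Concatenate two files into a single source (file names are placeholders).
local src = source.cat(
    source.file(io.open("part1.txt", "r")),
    source.file(io.open("part2.txt", "r"))
)
```
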
### Chaining Sources

What does it mean to chain a source with a filter? The most useful interpretation is that the combined source-filter is a new source that produces data and passes it through the filter before returning it. Here is a factory that does it:
```lua
function source.chain(src, f)
    return source.simplify(function()
        local chunk, err = src()
```
@@ -265,14 +262,14 @@ function source.chain(src, f)
```lua
        else return f(chunk) end
    end)
end
```

Our motivating example in the introduction chains a source with a filter. The idea of chaining a source with a filter is useful when one thinks about functions that might get their input data from a source. By chaining a simple source with one or more filters, the same function can be provided with filtered data even though it is unaware of the filtering that is happening behind its back.

### Sinks

Just as we defined an interface for an initial source of data, we can also define an interface for a final destination of data. We call any function respecting that interface a *sink*. Below are two simple factories that return sinks. The table factory creates a sink that stores all obtained data into a table. The data can later be efficiently concatenated into a single string with the `table.concat` library function. As another example, we introduce the `null` sink: A sink that simply discards the data it receives.
```lua
function sink.table(t)
    t = t or {}
    local f = function(chunk, err)
```
@@ -289,12 +286,12 @@ end
```lua
function sink.null()
    return null
end
```

Sinks receive consecutive chunks of data, until the end of data is notified with a `nil` chunk. An error is notified by an extra argument giving an error message after the `nil` chunk. If a sink detects an error itself and wishes not to be called again, it should return `nil`, optionally followed by an error message. A return value that is not `nil` means the sink will accept more data. Finally, just as sources can choose to be replaced, so can sinks, following the same interface. Once again, it is easy to implement a `sink.simplify` factory that transforms a fancy sink into a simple sink.

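A sketch of such a `sink.simplify` factory, mirroring `source.simplify` (my reconstruction, not code from the diff):

```lua
-- Sketch: wrap a fancy sink so that callers never see replacements.
function sink.simplify(snk)
    return function(chunk, err)
        local ret, err_or_new = snk(chunk, err)
        if not ret then return nil, err_or_new end  -- the sink gave up
        if err_or_new then snk = err_or_new end     -- the sink asked to be replaced
        return 1
    end
end
```
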
As an example, let's create a source that reads from the standard input, then chain it with a filter that normalizes the end-of-line convention, and use a sink to place all data into a table, printing the result at the end.
```lua
local load = source.chain(source.file(io.stdin), normalize("\r\n"))
local store, t = sink.table()
while 1 do
```
@@ -303,10 +300,10 @@ while 1 do
```lua
    if not chunk then break end
end
print(table.concat(t))
```

Again, just as we created a factory that produces a chained source-filter from a source and a filter, it is easy to create a factory that produces a new sink given a sink and a filter. The new sink passes all data it receives through the filter before handing it in to the original sink. Here is the implementation:
```lua
function sink.chain(f, snk)
    return function(chunk, err)
        local r, e = snk(f(chunk))
```
@@ -315,12 +312,12 @@ function sink.chain(f, snk)
```lua
        return 1
    end
end
```

### Pumps

There is a while loop that has been around for too long in our examples. It's always there because everything that we designed so far is passive. Sources, sinks, filters: none of them will do anything on their own. The operation of pumping all data a source can provide into a sink is so common that we will provide a couple of helper functions to do that for us.
```lua
function pump.step(src, snk)
    local chunk, src_err = src()
    local ret, snk_err = snk(chunk, src_err)
```
@@ -334,31 +331,31 @@ function pump.all(src, snk, step)
```lua
        if not ret then return not err, err end
    end
end
```

The `pump.step` function moves one chunk of data from the source to the sink. The `pump.all` function takes an optional `step` function and uses it to pump all the data from the source to the sink. We can now use everything we have to write a program that reads a binary file from disk and stores it in another file, after encoding it to the Base64 transfer content encoding:
```lua
local load = source.chain(
    source.file(io.open("input.bin", "rb")),
    encode("base64")
)
local store = sink.chain(
    wrap(76),
    sink.file(io.open("output.b64", "w"))
)
pump.all(load, store)
```

The way we split the filters here is not intuitive, on purpose. Alternatively, we could have chained the Base64 encode filter and the line-wrap filter together, and then chain the resulting filter with either the file source or the file sink. It doesn't really matter.

## One last important change

It turns out we still have a problem. When David Burgess was writing his gzip filter, he noticed that the decompression filter can explode a small input chunk into a huge amount of data. Although we wished we could ignore this problem, we soon agreed we couldn't. The only solution is to allow filters to return partial results, and that is what we chose to do. After invoking the filter to pass input data, the user now has to loop invoking the filter to find out if it has more output data to return. Note that these extra calls can't pass more data to the filter.

More specifically, after passing a chunk of input data to a filter and collecting the first chunk of output data, the user invokes the filter repeatedly, passing the empty string, to get extra output chunks. When the filter itself returns an empty string, the user knows there is no more output data, and can proceed to pass the next input chunk. In the end, after the user passes a `nil` notifying the filter that there is no more input data, the filter might still have produced too much output data to return in a single chunk. The user has to loop again, this time passing `nil` each time, until the filter itself returns `nil` to notify the user it is finally done.

Most filters won't need this extra freedom. Fortunately, the new filter interface is easy to implement. In fact, the end-of-line translation filter we created in the introduction already conforms to it. On the other hand, the chaining function becomes much more complicated. If it wasn't for coroutines, I wouldn't be happy to implement it. Let me know if you can find a simpler implementation that does not use coroutines!
```lua
local function chain2(f1, f2)
    local co = coroutine.create(function(chunk)
        while true do
```
@@ -380,14 +377,14 @@ local function chain2(f1, f2)
```lua
        return res
    end
end
```

Chaining sources also becomes more complicated, but a similar solution is possible with coroutines. Chaining sinks is just as simple as it has always been. Interestingly, these modifications do not have a measurable negative impact on the performance of filters that didn't need the added flexibility. They do greatly improve the efficiency of filters like the gzip filter, though, and that is why we are keeping them.

## Final considerations

These ideas were created during the development of [LuaSocket](https://github.com/lunarmodules/luasocket) 2.0, and are available as the LTN12 module. As a result, the [LuaSocket](https://github.com/lunarmodules/luasocket) implementation was greatly simplified and became much more powerful. The MIME module is tightly integrated with LTN12 and provides many other filters. We felt these concepts deserved to be made public even to those that don't care about [LuaSocket](https://github.com/lunarmodules/luasocket), hence the LTN.

One extra application that deserves mentioning makes use of an identity filter. Suppose you want to provide some feedback to the user while a file is being downloaded into a sink. By chaining the sink with an identity filter (a filter that simply returns the received data unaltered), you can update a progress counter on the fly. The original sink doesn't have to be modified. Another interesting idea is that of a T sink: a sink that sends data to two other sinks. In summary, there appears to be enough room for many other interesting ideas.

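Hedged sketches of those two ideas, with invented names (they are not part of the code shown in this diff):

```lua
-- Identity filter that counts the bytes flowing through it and reports the
-- running total via a callback; the data itself is returned unchanged.
local function progress(report)
    local seen = 0
    return function(chunk)
        if chunk and chunk ~= "" then
            seen = seen + #chunk
            report(seen)
        end
        return chunk
    end
end

-- T sink: hand every chunk to two sinks; fail if either of them fails.
local function tee(snk1, snk2)
    return function(chunk, err)
        local r1, e1 = snk1(chunk, err)
        local r2, e2 = snk2(chunk, err)
        if not r1 then return nil, e1 end
        if not r2 then return nil, e2 end
        return 1
    end
end
```
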
In this technical note we introduced filters, sources, sinks, and pumps. These are useful tools for data processing in general. Sources provide a simple abstraction for data acquisition. Sinks provide an abstraction for final data destinations. Filters define an interface for data transformations. The chaining of filters, sources and sinks provides an elegant way to create arbitrarily complex data transformations from simpler transformations. Pumps just put the machinery to work.
diff --git a/ltn013.wiki b/ltn013.wiki
index a622424..9c56805 100644
--- a/ltn013.wiki
+++ b/ltn013.wiki
@@ -1,50 +1,47 @@
# Using finalized exceptions
### or How to get rid of all those if statements
by DiegoNehab

## Abstract

This little LTN describes a simple exception scheme that greatly simplifies error checking in Lua programs. All the needed functionality ships standard with Lua, but is hidden behind the `assert` and `pcall` functions. To make it more evident, we stick to a convenient standard (which you probably already use anyway) for Lua function return values, and define two very simple helper functions (either in C or in Lua itself).

## Introduction

Most Lua functions return `nil` in case of error, followed by a message describing the error. If you don't use this convention, you probably have good reasons. Hopefully, after reading on, you will realize your reasons are not good enough.

14Most Lua functions return {{nil}} in case of error, followed by a message describing the error. If you don't use this convention, you probably have good reasons. Hopefully, after reading on, you will realize your reasons are not good enough. 13If you are like me, you hate error checking. Most nice little code snippets that look beautiful when you first write them lose some of their charm when you add all that error checking code. Yet, error checking is as important as the rest of the code. How sad.
15 14
16If you are like me, you hate error checking. Most nice little code snippets that look beautiful when you first write them lose some of their charm when you add all that error checking code. Yet, error checking is as important as the rest of the code. How sad. 15Even if you stick to a return convention, any complex task involving several function calls makes error checking both boring and error-prone (do you see the *error* below?)
17 16```lua
18Even if you stick to a return convention, any complex task involving several function calls makes error checking both boring and error-prone (do you see the ''error'' below?)
19 {{{
20function task(arg1, arg2, ...) 17function task(arg1, arg2, ...)
21 local ret1, err = task1(arg1) 18 local ret1, err = task1(arg1)
22 if not ret1 then 19 if not ret1 then
23 cleanup1() 20 cleanup1()
24 return nil, error 21 return nil, error
25 end 22 end
26 local ret2, err = task2(arg2) 23 local ret2, err = task2(arg2)
27 if not ret then 24 if not ret then
28 cleanup2() 25 cleanup2()
29 return nil, error 26 return nil, error
30 end 27 end
31 ... 28 ...
32end 29end
33}}} 30```
34 31
35The standard {{assert}} function provides an interesting alternative. To use it, simply nest every function call to be error checked with a call to {{assert}}. The {{assert}} function checks the value of its first argument. If it is {{nil}}, {{assert}} throws the second argument as an error message. Otherwise, {{assert}} lets all arguments through as if it had not been there. The idea greatly simplifies error checking: 32The standard `assert` function provides an interesting alternative. To use it, simply nest every function call to be error checked with a call to `assert`. The `assert` function checks the value of its first argument. If it is `nil`, `assert` throws the second argument as an error message. Otherwise, `assert` lets all arguments through as if it had not been there. The idea greatly simplifies error checking:
36 {{{ 33```lua
37function task(arg1, arg2, ...) 34function task(arg1, arg2, ...)
38 local ret1 = assert(task1(arg1)) 35 local ret1 = assert(task1(arg1))
39 local ret2 = assert(task2(arg2)) 36 local ret2 = assert(task2(arg2))
40 ... 37 ...
41end 38end
42}}} 39```
43 40
44If any task fails, the execution is aborted by {{assert}} and the error message is displayed to the user as the cause of the problem. If no error happens, the task completes as before. There isn't a single {{if}} statement and this is great. However, there are some problems with the idea. 41If any task fails, the execution is aborted by `assert` and the error message is displayed to the user as the cause of the problem. If no error happens, the task completes as before. There isn't a single `if` statement and this is great. However, there are some problems with the idea.
45 42
46First, the topmost {{task}} function doesn't respect the protocol followed by the lower-level tasks: it raises an error instead of returning {{nil}} followed by the error message. Here is where the standard {{pcall}} comes in handy. 43First, the topmost `task` function doesn't respect the protocol followed by the lower-level tasks: it raises an error instead of returning `nil` followed by the error message. Here is where the standard `pcall` comes in handy.
47 {{{ 44```lua
48function xtask(arg1, arg2, ...) 45function xtask(arg1, arg2, ...)
49 local ret1 = assert(task1(arg1)) 46 local ret1 = assert(task1(arg1))
50 local ret2 = assert(task2(arg2)) 47 local ret2 = assert(task2(arg2))
@@ -56,22 +53,22 @@ function task(arg1, arg2, ...)
56 if ok then return ret_or_err 53 if ok then return ret_or_err
57 else return nil, ret_or_err end 54 else return nil, ret_or_err end
58end 55end
59}}} 56```
60 57
61Our new {{task}} function is well behaved: {{pcall}} catches any error raised by the calls to {{assert}} and returns it after the status code. That way, errors are not propagated to the user of the high-level {{task}} function. 58Our new `task` function is well behaved: `pcall` catches any error raised by the calls to `assert` and returns it after the status code. That way, errors are not propagated to the user of the high-level `task` function.
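For reference, here is the `pcall` return convention the wrapper relies on, shown with a standalone snippet (not taken from the original note); level `0` keeps `error` from adding position information:

```lua
print(pcall(error, "boom", 0))           --> false   boom
print(pcall(function() return 42 end))   --> true    42
```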
62 59
63These are the main ideas for our exception scheme, but there are still a few glitches to fix: 60These are the main ideas for our exception scheme, but there are still a few glitches to fix:
64 61
65 * Directly using {{pcall}} ruined the simplicity of the code; 62* Directly using `pcall` ruined the simplicity of the code;
66 * What happened to the cleanup function calls? What if we have to, say, close a file? 63* What happened to the cleanup function calls? What if we have to, say, close a file?
67 * The {{assert}} function messes with the error message before raising the error (it adds line number information). 64* The `assert` function messes with the error message before raising the error (it adds line number information).
68 65
69Fortunately, all these problems are very easy to solve and that's what we do in the following sections. 66Fortunately, all these problems are very easy to solve and that's what we do in the following sections.
70 67
71== Introducing the {{protect}} factory == 68## Introducing the `protect` factory
72 69
73We used the {{pcall}} function to shield the user from errors that could be raised by the underlying implementation. Instead of directly using {{pcall}} (and thus duplicating code) every time, we prefer a factory that does the same job: 70We used the `pcall` function to shield the user from errors that could be raised by the underlying implementation. Instead of directly using `pcall` (and thus duplicating code) every time, we prefer a factory that does the same job:
74 {{{ 71```lua
75local function pack(ok, ...) 72local function pack(ok, ...)
76 return ok, {...} 73 return ok, {...}
77end 74end
@@ -83,19 +80,19 @@ function protect(f)
83 else return nil, ret[1] end 80 else return nil, ret[1] end
84 end 81 end
85end 82end
86}}} 83```
87 84
88The {{protect}} factory receives a function that might raise exceptions and returns a function that respects our return value convention. Now we can rewrite the top-level {{task}} function in a much cleaner way: 85The `protect` factory receives a function that might raise exceptions and returns a function that respects our return value convention. Now we can rewrite the top-level `task` function in a much cleaner way:
89 {{{ 86```lua
90task = protect(function(arg1, arg2, ...) 87task = protect(function(arg1, arg2, ...)
91 local ret1 = assert(task1(arg1)) 88 local ret1 = assert(task1(arg1))
92 local ret2 = assert(task2(arg2)) 89 local ret2 = assert(task2(arg2))
93 ... 90 ...
94end) 91end)
95}}} 92```
96 93
97The Lua implementation of the {{protect}} factory suffers from the creation of tables to hold multiple arguments and return values. It is possible (and easy) to implement the same function in C, without any table creation. 94The Lua implementation of the `protect` factory suffers from the creation of tables to hold multiple arguments and return values. It is possible (and easy) to implement the same function in C, without any table creation.
98 {{{ 95```c
99static int safecall(lua_State *L) { 96static int safecall(lua_State *L) {
100 lua_pushvalue(L, lua_upvalueindex(1)); 97 lua_pushvalue(L, lua_upvalueindex(1));
101 lua_insert(L, 1); 98 lua_insert(L, 1);
@@ -110,17 +107,17 @@ static int protect(lua_State *L) {
110 lua_pushcclosure(L, safecall, 1); 107 lua_pushcclosure(L, safecall, 1);
111 return 1; 108 return 1;
112} 109}
113}}} 110```
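If you prefer to stay in Lua, a table-free variant is also possible by routing the results of `pcall` through an auxiliary function. This is only a sketch (it is not part of the original note) and it assumes Lua 5.1 or later, where `pcall` forwards extra arguments to the called function:

```lua
local function handler(ok, ...)
    if ok then return ... end   -- success: pass every return value along
    return nil, (...)           -- failure: the first extra value is the error message
end

function protect(f)
    return function(...)
        return handler(pcall(f, ...))
    end
end
```

It is used exactly like the versions above.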
114 111
115===The {{newtry}} factory=== 112## The `newtry` factory
116 113
117Let's solve the two remaining issues in a single shot, using a concrete example to illustrate the proposed solution. Suppose you want to write a function to download an HTTP document. You have to connect, send the request, and read the reply. Each of these tasks can fail, but if something goes wrong after you have connected, you have to close the connection before returning the error message. 114Let's solve the two remaining issues in a single shot, using a concrete example to illustrate the proposed solution. Suppose you want to write a function to download an HTTP document. You have to connect, send the request, and read the reply. Each of these tasks can fail, but if something goes wrong after you have connected, you have to close the connection before returning the error message.
118 {{{ 115```lua
119get = protect(function(host, path) 116get = protect(function(host, path)
120 local c 117 local c
121 -- create a try function with a finalizer to close the socket 118 -- create a try function with a finalizer to close the socket
122 local try = newtry(function() 119 local try = newtry(function()
123 if c then c:close() end 120 if c then c:close() end
124 end) 121 end)
125 -- connect and send request 122 -- connect and send request
126 c = try(connect(host, 80)) 123 c = try(connect(host, 80))
@@ -137,34 +134,34 @@ get = protect(function(host, path)
137 c:close() 134 c:close()
138 return b, h 135 return b, h
139end) 136end)
140}}} 137```
141 138
142The {{newtry}} factory returns a function that works just like {{assert}}. The differences are that the {{try}} function doesn't mess with the error message and it calls an optional ''finalizer'' before raising the error. In our example, the finalizer simply closes the socket. 139The `newtry` factory returns a function that works just like `assert`. The differences are that the `try` function doesn't mess with the error message and it calls an optional *finalizer* before raising the error. In our example, the finalizer simply closes the socket.
143 140
144Even with a simple example like this, we see that finalized exceptions simplify our life. Let's see what we gain in general, not just in this example: 141Even with a simple example like this, we see that finalized exceptions simplify our life. Let's see what we gain in general, not just in this example:
145 142
146 * We don't need to declare dummy variables to hold error messages in case any ever shows up; 143* We don't need to declare dummy variables to hold error messages in case any ever shows up;
147 * We avoid using a variable to hold something that could either be a return value or an error message; 144* We avoid using a variable to hold something that could either be a return value or an error message;
148 * We didn't have to use several ''if'' statements to check for errors; 145* We didn't have to use several `if` statements to check for errors;
149 * If an error happens, we know our finalizer is going to be invoked automatically; 146* If an error happens, we know our finalizer is going to be invoked automatically;
150 * Exceptions get propagated, so we don't have to repeat these ''if'' statements at every level until the error reaches the user. 147* Exceptions get propagated, so we don't have to repeat these `if` statements at every level until the error reaches the user.
151 148
152Try writing the same function without the tricks we used above and you will see that the code gets ugly. Longer sequences of operations with error checking would get even uglier. So let's implement the {{newtry}} function in Lua: 149Try writing the same function without the tricks we used above and you will see that the code gets ugly. Longer sequences of operations with error checking would get even uglier. So let's implement the `newtry` function in Lua:
153 {{{ 150```lua
154function newtry(f) 151function newtry(f)
155 return function(...) 152 return function(...)
        local arg = {...}  -- pack the varargs (needed on Lua 5.2+, where the implicit 'arg' table was removed)
156        if not arg[1] then 153        if not arg[1] then
157 if f then f() end 154 if f then f() end
158 error(arg[2], 0) 155 error(arg[2], 0)
159 else 156 else
160 return ... 157 return ...
161 end 158 end
162 end 159 end
163end 160end
164}}} 161```
165 162
166Again, the implementation suffers from the creation of tables at each function call, so we prefer the C version: 163Again, the implementation suffers from the creation of tables at each function call, so we prefer the C version:
167 {{{ 164```c
168static int finalize(lua_State *L) { 165static int finalize(lua_State *L) {
169 if (!lua_toboolean(L, 1)) { 166 if (!lua_toboolean(L, 1)) {
170 lua_pushvalue(L, lua_upvalueindex(1)); 167 lua_pushvalue(L, lua_upvalueindex(1));
@@ -182,13 +179,13 @@ static int do_nothing(lua_State *L) {
182 179
183static int newtry(lua_State *L) { 180static int newtry(lua_State *L) {
184 lua_settop(L, 1); 181 lua_settop(L, 1);
185 if (lua_isnil(L, 1)) 182 if (lua_isnil(L, 1))
186 lua_pushcfunction(L, do_nothing); 183 lua_pushcfunction(L, do_nothing);
187 lua_pushcclosure(L, finalize, 1); 184 lua_pushcclosure(L, finalize, 1);
188 return 1; 185 return 1;
189} 186}
190}}} 187```
191 188
192===Final considerations=== 189## Final considerations
193 190
194The {{protect}} and {{newtry}} functions saved a ''lot'' of work in the implementation of {{LuaSocket}}[http://www.tecgraf.puc-rio.br/luasocket]. The size of some modules was cut in half by these ideas. It's true that the scheme is not as generic as the exception mechanism of programming languages like C++ or Java, but the power/simplicity ratio is favorable and I hope it serves you as well as it served {{LuaSocket}}. 191The `protect` and `newtry` functions saved a *lot* of work in the implementation of [LuaSocket](https://github.com/lunarmodules/luasocket). The size of some modules was cut in half by these ideas. It's true that the scheme is not as generic as the exception mechanism of programming languages like C++ or Java, but the power/simplicity ratio is favorable and I hope it serves you as well as it served [LuaSocket](https://github.com/lunarmodules/luasocket).