author Roberto Ierusalimschy <roberto@inf.puc-rio.br> 2023-06-22 10:51:31 -0300
committer Roberto Ierusalimschy <roberto@inf.puc-rio.br> 2023-06-22 10:51:31 -0300
commit 7d43b367e7a89369c1302124677a305aa0d070c7
tree cc0a05cc02a417b9107fa68a9506f7afef667dcb
parent 4eb4419163dd6c97665b9481e9581ff32496b392
Improved documentation for accumulator captures
-rw-r--r--  lpcap.c     2
-rw-r--r--  lpeg.html  94
-rw-r--r--  re.html    38
3 files changed, 75 insertions, 59 deletions
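
For orientation, the advice this commit adds to lpeg.html (enclose the whole accumulation, including the capture that produces the initial value, in an anonymous group capture) can be sketched as follows. This is an illustrative sketch and not part of the commit. It assumes LPeg 1.1 or later, where the `%` operator creates accumulator captures, is installed as the `lpeg` module; the names `add`, `sum`, and `as_table` are hypothetical.

```lua
local lpeg = require "lpeg"

-- A numeral converted to a Lua number.
local number = lpeg.R("09")^1 / tonumber

-- Hypothetical helper: receives the accumulator and the new number.
local function add (acc, n) return acc + n end

-- The group capture encloses both the initial value (the first `number`)
-- and every accumulator capture, so the "previous capture" is always clear.
local sum = lpeg.Cg(number * ("," * number % add)^0)

print(sum:match("10,30,43"))        --> 83

-- The grouped pattern can then be nested in another capture;
-- here a table capture collects its single value.
local as_table = lpeg.Ct(sum)
print(as_table:match("1,2,3")[1])   --> 6
```

Because the group produces exactly one value, the enclosing capture never has to decide which earlier capture the accumulator refers to.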
diff --git a/lpcap.c b/lpcap.c
index fca8cbb..f13ecf4 100644
--- a/lpcap.c
+++ b/lpcap.c
@@ -477,7 +477,7 @@ static int addonestring (luaL_Buffer *b, CapState *cs, const char *what) {
       substcap(b, cs); /* add capture directly to buffer */
       return 1;
     case Cacc: /* accumulator capture? */
-      return luaL_error(cs->L, "accumulator capture inside substitution capture");
+      return luaL_error(cs->L, "invalid context for an accumulator capture");
     default: {
       lua_State *L = cs->L;
       int n = pushcapture(cs);
diff --git a/lpeg.html b/lpeg.html
index c9bd9f9..5271a52 100644
--- a/lpeg.html
+++ b/lpeg.html
@@ -901,8 +901,8 @@ Creates an <em>accumulator capture</em>.
 This pattern behaves similarly to a
 <a href="#cap-func">function capture</a>,
 with the following differences:
-The last captured value is added as a first argument to
-the call;
+The last captured value before <code>patt</code>
+is added as a first argument to the call;
 the return of the function is adjusted to one single value;
 that value replaces the last captured value.
 Note that the capture itself produces no values;
@@ -911,31 +911,6 @@ it only changes the value of its previous capture.
 
 <p>
 As an example,
-consider the following code fragment:
-</p>
-<pre class="example">
-local name = lpeg.C(lpeg.R("az")^1)
-local p = name * (lpeg.P("^") % string.upper)^-1
-print(p:match("count")) --&gt; count
-print(p:match("count^")) --&gt; COUNT
-</pre>
-<p>
-In the first match,
-the accumulator capture does not match,
-and so the match results in its first capture, a name.
-In the second match,
-the accumulator capture matches,
-so the function <code>string.upper</code>
-is called with the previous capture (created by <code>name</code>)
-plus the string <code>"^"</code>;
-the function ignores its second argument and returns the first argument
-changed to upper case;
-that value then becomes the first and only
-capture value created by the match.
-</p>
-
-<p>
-As another example,
 let us consider the problem of adding a list of numbers.
 </p>
 <pre class="example">
@@ -956,22 +931,56 @@ First, the initial <code>number</code> captures a number;
 that first capture will play the role of an accumulator.
 Then, each time the sequence <code>comma-number</code>
 matches inside the loop there is an accumulator capture:
-It calls <code>add</code> with the current value of the accumulator
-and the value of the new number,
-and the result of the call (their sum) replaces the value of the accumulator.
+It calls <code>add</code> with the current value of the
+accumulator&mdash;which is the last captured value, created by the
+first <code>number</code>&mdash; and the value of the new number,
+and the result of the call (the sum of the two numbers)
+replaces the value of the accumulator.
 At the end of the match,
 the accumulator with all sums is the final value.
 </p>
 
 <p>
+As another example,
+consider the following code fragment:
+</p>
+<pre class="example">
+local name = lpeg.C(lpeg.R("az")^1)
+local p = name * (lpeg.P("^") % string.upper)^-1
+print(p:match("count")) --&gt; count
+print(p:match("count^")) --&gt; COUNT
+</pre>
+<p>
+In the match against <code>"count"</code>,
+as there is no <code>"^"</code>,
+the optional accumulator capture does not match;
+so, the match results in its sole capture, a name.
+In the match against <code>"count^"</code>,
+the accumulator capture matches,
+so the function <code>string.upper</code>
+is called with the previous captured value (created by <code>name</code>)
+plus the string <code>"^"</code>;
+the function ignores its second argument and returns the first argument
+changed to upper case;
+that value then becomes the first and only
+capture value created by the match.
+</p>
+
+<p>
 Due to the nature of this capture,
 you should avoid using it in places where it is not clear
-what is its "previous" capture
-(e.g., directly nested in a <a href="#cap-string">string capture</a>
-or a <a href="#cap-num">numbered capture</a>).
-Due to implementation details,
+what is the "previous" capture,
+such as directly nested in a <a href="#cap-string">string capture</a>
+or a <a href="#cap-num">numbered capture</a>.
+(Note that these captures may not need to evaluate
+all their subcaptures to compute their results.)
+Moreover, due to implementation details,
 you should not use this capture directly nested in a
 <a href="#cap-s">substitution capture</a>.
+A simple and effective way to avoid these issues is
+to enclose the whole accumulation composition
+(including the capture that generates the initial value)
+into an anonymous <a href="#cap-g">group capture</a>.
 </p>
 
 
@@ -1056,7 +1065,8 @@ local name = lpeg.C(lpeg.alpha^1) * space
 local sep = lpeg.S(",;") * space
 local pair = name * "=" * space * name * sep^-1
 local list = lpeg.Ct("") * (pair % rawset)^0
-t = list:match("a=b, c = hi; next = pi") --&gt; { a = "b", c = "hi", next = "pi" }
+t = list:match("a=b, c = hi; next = pi")
+  --&gt; { a = "b", c = "hi", next = "pi" }
 </pre>
 <p>
 Each pair has the format <code>name = name</code> followed by
@@ -1098,7 +1108,7 @@ by <code>sep</code>.
 If the split results in too many values,
 it may overflow the maximum number of values
 that can be returned by a Lua function.
-In this case,
+To avoid this problem,
 we can collect these values in a table:
 </p>
 <pre class="example">
@@ -1134,7 +1144,7 @@ end
 </pre>
 <p>
 This grammar has a straight reading:
-it matches <code>p</code> or skips one character and tries again.
+its sole rule matches <code>p</code> or skips one character and tries again.
 </p>
 
 <p>
@@ -1143,9 +1153,9 @@ If we want to know where the pattern is in the string
 we can add position captures to the pattern:
 </p>
 <pre class="example">
-local I = lpeg.Cp()
+local Cp = lpeg.Cp()
 function anywhere (p)
-  return lpeg.P{ I * p * I + 1 * lpeg.V(1) }
+  return lpeg.P{ Cp * p * Cp + 1 * lpeg.V(1) }
 end
 
 print(anywhere("world"):match("hello world!")) --&gt; 7 12
@@ -1155,15 +1165,15 @@ print(anywhere("world"):match("hello world!")) --&gt; 7 12
 Another option for the search is like this:
 </p>
 <pre class="example">
-local I = lpeg.Cp()
+local Cp = lpeg.Cp()
 function anywhere (p)
-  return (1 - lpeg.P(p))^0 * I * p * I
+  return (1 - lpeg.P(p))^0 * Cp * p * Cp
 end
 </pre>
 <p>
 Again the pattern has a straight reading:
 it skips as many characters as possible while not matching <code>p</code>,
-and then matches <code>p</code> (plus appropriate captures).
+and then matches <code>p</code> plus appropriate captures.
 </p>
 
 <p>
diff --git a/re.html b/re.html
index ed4ccb1..114d968 100644
--- a/re.html
+++ b/re.html
@@ -61,6 +61,20 @@ Constructions are listed in order of decreasing precedence.
 <table border="1">
 <tbody><tr><td><b>Syntax</b></td><td><b>Description</b></td></tr>
 <tr><td><code>( p )</code></td> <td>grouping</td></tr>
+<tr><td><code>&amp; p</code></td> <td>and predicate</td></tr>
+<tr><td><code>! p</code></td> <td>not predicate</td></tr>
+<tr><td><code>p1 p2</code></td> <td>concatenation</td></tr>
+<tr><td><code>p1 / p2</code></td> <td>ordered choice</td></tr>
+<tr><td><code>p ?</code></td> <td>optional match</td></tr>
+<tr><td><code>p *</code></td> <td>zero or more repetitions</td></tr>
+<tr><td><code>p +</code></td> <td>one or more repetitions</td></tr>
+<tr><td><code>p^num</code></td>
+    <td>exactly <code>num</code> repetitions</td></tr>
+<tr><td><code>p^+num</code></td>
+    <td>at least <code>num</code> repetitions</td></tr>
+<tr><td><code>p^-num</code></td>
+    <td>at most <code>num</code> repetitions</td></tr>
+<tr><td>(<code>name &lt;- p</code>)<sup>+</sup></td> <td>grammar</td></tr>
 <tr><td><code>'string'</code></td> <td>literal string</td></tr>
 <tr><td><code>"string"</code></td> <td>literal string</td></tr>
 <tr><td><code>[class]</code></td> <td>character class</td></tr>
@@ -69,22 +83,15 @@ Constructions are listed in order of decreasing precedence.
     <td>pattern <code>defs[name]</code> or a pre-defined pattern</td></tr>
 <tr><td><code>name</code></td><td>non terminal</td></tr>
 <tr><td><code>&lt;name&gt;</code></td><td>non terminal</td></tr>
+
 <tr><td><code>{}</code></td> <td>position capture</td></tr>
 <tr><td><code>{ p }</code></td> <td>simple capture</td></tr>
 <tr><td><code>{: p :}</code></td> <td>anonymous group capture</td></tr>
 <tr><td><code>{:name: p :}</code></td> <td>named group capture</td></tr>
 <tr><td><code>{~ p ~}</code></td> <td>substitution capture</td></tr>
 <tr><td><code>{| p |}</code></td> <td>table capture</td></tr>
-<tr><td><code>=name</code></td> <td>back reference
-</td></tr>
-<tr><td><code>p ?</code></td> <td>optional match</td></tr>
-<tr><td><code>p *</code></td> <td>zero or more repetitions</td></tr>
-<tr><td><code>p +</code></td> <td>one or more repetitions</td></tr>
-<tr><td><code>p^num</code></td> <td>exactly <code>n</code> repetitions</td></tr>
-<tr><td><code>p^+num</code></td>
-    <td>at least <code>n</code> repetitions</td></tr>
-<tr><td><code>p^-num</code></td>
-    <td>at most <code>n</code> repetitions</td></tr>
+<tr><td><code>=name</code></td> <td>back reference</td></tr>
+
 <tr><td><code>p -&gt; 'string'</code></td> <td>string capture</td></tr>
 <tr><td><code>p -&gt; "string"</code></td> <td>string capture</td></tr>
 <tr><td><code>p -&gt; num</code></td> <td>numbered capture</td></tr>
@@ -94,11 +101,8 @@ equivalent to <code>p / defs[name]</code></td></tr>
 equivalent to <code>lpeg.Cmt(p, defs[name])</code></td></tr>
 <tr><td><code>p ~&gt; name</code></td> <td>fold capture
 equivalent to <code>lpeg.Cf(p, defs[name])</code></td></tr>
-<tr><td><code>&amp; p</code></td> <td>and predicate</td></tr>
-<tr><td><code>! p</code></td> <td>not predicate</td></tr>
-<tr><td><code>p1 p2</code></td> <td>concatenation</td></tr>
-<tr><td><code>p1 / p2</code></td> <td>ordered choice</td></tr>
-<tr><td>(<code>name &lt;- p</code>)<sup>+</sup></td> <td>grammar</td></tr>
+<tr><td><code>p &gt;&gt; name</code></td> <td>accumulator capture
+equivalent to <code>(p % defs[name])</code></td></tr>
 </tbody></table>
 <p>
 Any space appearing in a syntax description can be
@@ -199,9 +203,10 @@ print(re.match("the number 423 is odd", "({%a+} / .)*"))
 --&gt; the number is odd
 
 -- returns the first numeral in a string
-print(re.match("the number 423 is odd", "s <- {%d+} / . s"))
+print(re.match("the number 423 is odd", "s &lt;- {%d+} / . s"))
 --&gt; 423
 
+-- substitutes a dot for each vowel in a string
 print(re.gsub("hello World", "[aeiou]", "."))
 --&gt; h.ll. W.rld
 </pre>
@@ -415,6 +420,7 @@ prefix &lt;- '&amp;' S prefix / '!' S prefix / suffix
 suffix &lt;- primary S (([+*?]
             / '^' [+-]? num
             / '-&gt;' S (string / '{}' / name)
+            / '&gt&gt;' S name
             / '=&gt;' S name) S)*
 
 primary &lt;- '(' exp ')' / string / class / defined
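
The `p >> name` syntax that this commit documents in re.html can be tried with a small grammar. The sketch below is not part of the commit. It assumes the `re` module shipped with LPeg 1.1 or later; the rule names `sum` and `num` and the entries of the `defs` table are illustrative choices. The accumulation is wrapped in an anonymous group capture (`{: ... :}`), following the recommendation added to lpeg.html.

```lua
local re = require "re"

-- Hypothetical definitions table: `add` receives the accumulator
-- (the last captured value) and the newly captured number.
local defs = {
  tonumber = tonumber,
  add = function (acc, n) return acc + n end,
}

local sum = re.compile([[
  sum <- {: num (',' num >> add)* :}
  num <- [0-9]+ -> tonumber
]], defs)

print(sum:match("10,30,43"))   --> 83
```

Since `>>` is a suffix operator, `',' num >> add` applies the accumulator to `num` alone; the comma produces no captures, so the result is the same as accumulating over the whole `',' num` sequence.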