From 7d43b367e7a89369c1302124677a305aa0d070c7 Mon Sep 17 00:00:00 2001
From: Roberto Ierusalimschy
Date: Thu, 22 Jun 2023 10:51:31 -0300
Subject: Improved documentation for accumulator captures

---
 lpeg.html | 94 +++++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 52 insertions(+), 42 deletions(-)
 (limited to 'lpeg.html')

diff --git a/lpeg.html b/lpeg.html
index c9bd9f9..5271a52 100644
--- a/lpeg.html
+++ b/lpeg.html
@@ -901,8 +901,8 @@ Creates an accumulator capture.
 This pattern behaves similarly to a
 function capture,
 with the following differences:
-The last captured value is added as a first argument to
-the call;
+The last captured value before patt
+is added as a first argument to the call;
 the return of the function is adjusted to one single value;
 that value replaces the last captured value.
 Note that the capture itself produces no values;
@@ -911,31 +911,6 @@ it only changes the value of its previous capture.

 As an example,
-consider the following code fragment:
-

-
-local name = lpeg.C(lpeg.R("az")^1)
-local p = name * (lpeg.P("^") % string.upper)^-1
-print(p:match("count"))    --> count
-print(p:match("count^"))   --> COUNT
-
-

-In the first match,
-the accumulator capture does not match,
-and so the match results in its first capture, a name.
-In the second match,
-the accumulator capture matches,
-so the function string.upper
-is called with the previous capture (created by name)
-plus the string "^";
-the function ignores its second argument and returns the first argument
-changed to upper case;
-that value then becomes the first and only
-capture value created by the match.
-

-
-

-As another example,
 let us consider the problem of adding a list of numbers.

@@ -956,22 +931,56 @@ First, the initial number captures a number;
 that first capture will play the role of an accumulator.
 Then, each time the sequence comma-number
 matches inside the loop there is an accumulator capture:
-It calls add with the current value of the accumulator
-and the value of the new number,
-and the result of the call (their sum) replaces the value of the accumulator.
+It calls add with the current value of the
+accumulator—which is the last captured value, created by the
+first number— and the value of the new number,
+and the result of the call (the sum of the two numbers)
+replaces the value of the accumulator.
 At the end of the match,
 the accumulator with all sums is the final value.
 

+

+As another example,
+consider the following code fragment:
+

+
+local name = lpeg.C(lpeg.R("az")^1)
+local p = name * (lpeg.P("^") % string.upper)^-1
+print(p:match("count"))    --> count
+print(p:match("count^"))   --> COUNT
+
+

+In the match against "count",
+as there is no "^",
+the optional accumulator capture does not match;
+so, the match results in its sole capture, a name.
+In the match against "count^",
+the accumulator capture matches,
+so the function string.upper
+is called with the previous captured value (created by name)
+plus the string "^";
+the function ignores its second argument and returns the first argument
+changed to upper case;
+that value then becomes the first and only
+capture value created by the match.
+

+

 Due to the nature of this capture,
 you should avoid using it in places where it is not clear
-what is its "previous" capture
-(e.g., directly nested in a string capture
-or a numbered capture).
-Due to implementation details,
+what is the "previous" capture,
+such as directly nested in a string capture
+or a numbered capture.
+(Note that these captures may not need to evaluate
+all their subcaptures to compute their results.)
+Moreover, due to implementation details,
 you should not use this capture directly nested in a
 substitution capture.
+A simple and effective way to avoid these issues is
+to enclose the whole accumulation composition
+(including the capture that generates the initial value)
+into an anonymous group capture.

@@ -1056,7 +1065,8 @@ local name = lpeg.C(lpeg.alpha^1) * space
 local sep = lpeg.S(",;") * space
 local pair = name * "=" * space * name * sep^-1
 local list = lpeg.Ct("") * (pair % rawset)^0
-t = list:match("a=b, c = hi; next = pi") --> { a = "b", c = "hi", next = "pi" }
+t = list:match("a=b, c = hi; next = pi")
+ --> { a = "b", c = "hi", next = "pi" }

 Each pair has the format name = name followed by
@@ -1098,7 +1108,7 @@ by sep.
 If the split results in too many values,
 it may overflow the maximum number of values
 that can be returned by a Lua function.
-In this case,
+To avoid this problem,
 we can collect these values in a table:

@@ -1134,7 +1144,7 @@ end
 

 This grammar has a straight reading:
-it matches p or skips one character and tries again.
+its sole rule matches p or skips one character and tries again.

@@ -1143,9 +1153,9 @@ If we want to know where the pattern is in the string
 we can add position captures to the pattern:

-local I = lpeg.Cp()
+local Cp = lpeg.Cp()
 function anywhere (p)
-  return lpeg.P{ I * p * I + 1 * lpeg.V(1) }
+  return lpeg.P{ Cp * p * Cp + 1 * lpeg.V(1) }
 end
 
 print(anywhere("world"):match("hello world!"))   --> 7   12
@@ -1155,15 +1165,15 @@ print(anywhere("world"):match("hello world!"))   --> 7   12
 Another option for the search is like this:
 

-local I = lpeg.Cp()
+local Cp = lpeg.Cp()
 function anywhere (p)
-  return (1 - lpeg.P(p))^0 * I * p * I
+  return (1 - lpeg.P(p))^0 * Cp * p * Cp
 end
 

 Again the pattern has a straight reading:
 it skips as many characters as possible while not matching p,
-and then matches p (plus appropriate captures).
+and then matches p plus appropriate captures.

-- cgit v1.2.3-55-g6feb
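
The closing advice in the patch above (enclose the whole accumulation, including the capture that generates the initial value, in an anonymous group capture) can be sketched roughly as follows. This is only an illustration, not part of the commit: it assumes LPeg 1.1 or later, where the `%` accumulator operator is available, and the names `number`, `add`, and `sum` are chosen here for the example.

```lua
local lpeg = require("lpeg")

-- A run of digits, converted to a Lua number by a function capture.
local number = lpeg.R("09")^1 / tonumber

-- Folding function (illustrative): receives the current accumulator
-- and the newly captured number, and returns their sum.
local function add (acc, newvalue)
  return acc + newvalue
end

-- lpeg.Cg with no name is an anonymous group capture. It wraps the
-- initial number and every accumulator capture, so the "previous"
-- capture seen by each '%' is always one produced inside the group.
local sum = lpeg.Cg(number * ((lpeg.P(",") * number) % add)^0)

print(sum:match("10,30,43"))   --> 83
```

Because the group contains the initial capture as well as the loop, `sum` can then be combined with other captures without leaving it unclear which value each accumulator step updates.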