| field | value | date |
|---|---|---|
| author | Roberto Ierusalimschy <roberto@inf.puc-rio.br> | 2023-06-22 10:51:31 -0300 |
| committer | Roberto Ierusalimschy <roberto@inf.puc-rio.br> | 2023-06-22 10:51:31 -0300 |
| commit | 7d43b367e7a89369c1302124677a305aa0d070c7 (patch) | |
| tree | cc0a05cc02a417b9107fa68a9506f7afef667dcb /lpeg.html | |
| parent | 4eb4419163dd6c97665b9481e9581ff32496b392 (diff) | |
Improved documentation for accumulator captures
Diffstat (limited to 'lpeg.html')
| -rw-r--r-- | lpeg.html | 94 |
1 files changed, 52 insertions, 42 deletions
| @@ -901,8 +901,8 @@ Creates an <em>accumulator capture</em>. | |||
| 901 | This pattern behaves similarly to a | 901 | This pattern behaves similarly to a |
| 902 | <a href="#cap-func">function capture</a>, | 902 | <a href="#cap-func">function capture</a>, |
| 903 | with the following differences: | 903 | with the following differences: |
| 904 | The last captured value is added as a first argument to | 904 | The last captured value before <code>patt</code> |
| 905 | the call; | 905 | is added as a first argument to the call; |
| 906 | the return of the function is adjusted to one single value; | 906 | the return of the function is adjusted to one single value; |
| 907 | that value replaces the last captured value. | 907 | that value replaces the last captured value. |
| 908 | Note that the capture itself produces no values; | 908 | Note that the capture itself produces no values; |
| @@ -911,31 +911,6 @@ it only changes the value of its previous capture. | |||
| 911 | 911 | ||
| 912 | <p> | 912 | <p> |
| 913 | As an example, | 913 | As an example, |
| 914 | consider the following code fragment: | ||
| 915 | </p> | ||
| 916 | <pre class="example"> | ||
| 917 | local name = lpeg.C(lpeg.R("az")^1) | ||
| 918 | local p = name * (lpeg.P("^") % string.upper)^-1 | ||
| 919 | print(p:match("count")) --> count | ||
| 920 | print(p:match("count^")) --> COUNT | ||
| 921 | </pre> | ||
| 922 | <p> | ||
| 923 | In the first match, | ||
| 924 | the accumulator capture does not match, | ||
| 925 | and so the match results in its first capture, a name. | ||
| 926 | In the second match, | ||
| 927 | the accumulator capture matches, | ||
| 928 | so the function <code>string.upper</code> | ||
| 929 | is called with the previous capture (created by <code>name</code>) | ||
| 930 | plus the string <code>"^"</code>; | ||
| 931 | the function ignores its second argument and returns the first argument | ||
| 932 | changed to upper case; | ||
| 933 | that value then becomes the first and only | ||
| 934 | capture value created by the match. | ||
| 935 | </p> | ||
| 936 | |||
| 937 | <p> | ||
| 938 | As another example, | ||
| 939 | let us consider the problem of adding a list of numbers. | 914 | let us consider the problem of adding a list of numbers. |
| 940 | </p> | 915 | </p> |
| 941 | <pre class="example"> | 916 | <pre class="example"> |
| @@ -956,22 +931,56 @@ First, the initial <code>number</code> captures a number; | |||
| 956 | that first capture will play the role of an accumulator. | 931 | that first capture will play the role of an accumulator. |
| 957 | Then, each time the sequence <code>comma-number</code> | 932 | Then, each time the sequence <code>comma-number</code> |
| 958 | matches inside the loop there is an accumulator capture: | 933 | matches inside the loop there is an accumulator capture: |
| 959 | It calls <code>add</code> with the current value of the accumulator | 934 | It calls <code>add</code> with the current value of the |
| 960 | and the value of the new number, | 935 | accumulator—which is the last captured value, created by the |
| 961 | and the result of the call (their sum) replaces the value of the accumulator. | 936 | first <code>number</code>— and the value of the new number, |
| 937 | and the result of the call (the sum of the two numbers) | ||
| 938 | replaces the value of the accumulator. | ||
| 962 | At the end of the match, | 939 | At the end of the match, |
| 963 | the accumulator with all sums is the final value. | 940 | the accumulator with all sums is the final value. |
| 964 | </p> | 941 | </p> |
| 965 | 942 | ||
| 966 | <p> | 943 | <p> |
| 944 | As another example, | ||
| 945 | consider the following code fragment: | ||
| 946 | </p> | ||
| 947 | <pre class="example"> | ||
| 948 | local name = lpeg.C(lpeg.R("az")^1) | ||
| 949 | local p = name * (lpeg.P("^") % string.upper)^-1 | ||
| 950 | print(p:match("count")) --> count | ||
| 951 | print(p:match("count^")) --> COUNT | ||
| 952 | </pre> | ||
| 953 | <p> | ||
| 954 | In the match against <code>"count"</code>, | ||
| 955 | as there is no <code>"^"</code>, | ||
| 956 | the optional accumulator capture does not match; | ||
| 957 | so, the match results in its sole capture, a name. | ||
| 958 | In the match against <code>"count^"</code>, | ||
| 959 | the accumulator capture matches, | ||
| 960 | so the function <code>string.upper</code> | ||
| 961 | is called with the previous captured value (created by <code>name</code>) | ||
| 962 | plus the string <code>"^"</code>; | ||
| 963 | the function ignores its second argument and returns the first argument | ||
| 964 | changed to upper case; | ||
| 965 | that value then becomes the first and only | ||
| 966 | capture value created by the match. | ||
| 967 | </p> | ||
| 968 | |||
| 969 | <p> | ||
| 967 | Due to the nature of this capture, | 970 | Due to the nature of this capture, |
| 968 | you should avoid using it in places where it is not clear | 971 | you should avoid using it in places where it is not clear |
| 969 | what is its "previous" capture | 972 | what is the "previous" capture, |
| 970 | (e.g., directly nested in a <a href="#cap-string">string capture</a> | 973 | such as directly nested in a <a href="#cap-string">string capture</a> |
| 971 | or a <a href="#cap-num">numbered capture</a>). | 974 | or a <a href="#cap-num">numbered capture</a>. |
| 972 | Due to implementation details, | 975 | (Note that these captures may not need to evaluate |
| 976 | all their subcaptures to compute their results.) | ||
| 977 | Moreover, due to implementation details, | ||
| 973 | you should not use this capture directly nested in a | 978 | you should not use this capture directly nested in a |
| 974 | <a href="#cap-s">substitution capture</a>. | 979 | <a href="#cap-s">substitution capture</a>. |
| 980 | A simple and effective way to avoid these issues is | ||
| 981 | to enclose the whole accumulation composition | ||
| 982 | (including the capture that generates the initial value) | ||
| 983 | into an anonymous <a href="#cap-g">group capture</a>. | ||
| 975 | </p> | 984 | </p> |
| 976 | 985 | ||
| 977 | 986 | ||
| @@ -1056,7 +1065,8 @@ local name = lpeg.C(lpeg.alpha^1) * space | |||
| 1056 | local sep = lpeg.S(",;") * space | 1065 | local sep = lpeg.S(",;") * space |
| 1057 | local pair = name * "=" * space * name * sep^-1 | 1066 | local pair = name * "=" * space * name * sep^-1 |
| 1058 | local list = lpeg.Ct("") * (pair % rawset)^0 | 1067 | local list = lpeg.Ct("") * (pair % rawset)^0 |
| 1059 | t = list:match("a=b, c = hi; next = pi") --> { a = "b", c = "hi", next = "pi" } | 1068 | t = list:match("a=b, c = hi; next = pi") |
| 1069 | --> { a = "b", c = "hi", next = "pi" } | ||
| 1060 | </pre> | 1070 | </pre> |
| 1061 | <p> | 1071 | <p> |
| 1062 | Each pair has the format <code>name = name</code> followed by | 1072 | Each pair has the format <code>name = name</code> followed by |
| @@ -1098,7 +1108,7 @@ by <code>sep</code>. | |||
| 1098 | If the split results in too many values, | 1108 | If the split results in too many values, |
| 1099 | it may overflow the maximum number of values | 1109 | it may overflow the maximum number of values |
| 1100 | that can be returned by a Lua function. | 1110 | that can be returned by a Lua function. |
| 1101 | In this case, | 1111 | To avoid this problem, |
| 1102 | we can collect these values in a table: | 1112 | we can collect these values in a table: |
| 1103 | </p> | 1113 | </p> |
| 1104 | <pre class="example"> | 1114 | <pre class="example"> |
| @@ -1134,7 +1144,7 @@ end | |||
| 1134 | </pre> | 1144 | </pre> |
| 1135 | <p> | 1145 | <p> |
| 1136 | This grammar has a straight reading: | 1146 | This grammar has a straight reading: |
| 1137 | it matches <code>p</code> or skips one character and tries again. | 1147 | its sole rule matches <code>p</code> or skips one character and tries again. |
| 1138 | </p> | 1148 | </p> |
| 1139 | 1149 | ||
| 1140 | <p> | 1150 | <p> |
| @@ -1143,9 +1153,9 @@ If we want to know where the pattern is in the string | |||
| 1143 | we can add position captures to the pattern: | 1153 | we can add position captures to the pattern: |
| 1144 | </p> | 1154 | </p> |
| 1145 | <pre class="example"> | 1155 | <pre class="example"> |
| 1146 | local I = lpeg.Cp() | 1156 | local Cp = lpeg.Cp() |
| 1147 | function anywhere (p) | 1157 | function anywhere (p) |
| 1148 | return lpeg.P{ I * p * I + 1 * lpeg.V(1) } | 1158 | return lpeg.P{ Cp * p * Cp + 1 * lpeg.V(1) } |
| 1149 | end | 1159 | end |
| 1150 | 1160 | ||
| 1151 | print(anywhere("world"):match("hello world!")) --> 7 12 | 1161 | print(anywhere("world"):match("hello world!")) --> 7 12 |
| @@ -1155,15 +1165,15 @@ print(anywhere("world"):match("hello world!")) --> 7 12 | |||
| 1155 | Another option for the search is like this: | 1165 | Another option for the search is like this: |
| 1156 | </p> | 1166 | </p> |
| 1157 | <pre class="example"> | 1167 | <pre class="example"> |
| 1158 | local I = lpeg.Cp() | 1168 | local Cp = lpeg.Cp() |
| 1159 | function anywhere (p) | 1169 | function anywhere (p) |
| 1160 | return (1 - lpeg.P(p))^0 * I * p * I | 1170 | return (1 - lpeg.P(p))^0 * Cp * p * Cp |
| 1161 | end | 1171 | end |
| 1162 | </pre> | 1172 | </pre> |
| 1163 | <p> | 1173 | <p> |
| 1164 | Again the pattern has a straight reading: | 1174 | Again the pattern has a straight reading: |
| 1165 | it skips as many characters as possible while not matching <code>p</code>, | 1175 | it skips as many characters as possible while not matching <code>p</code>, |
| 1166 | and then matches <code>p</code> (plus appropriate captures). | 1176 | and then matches <code>p</code> plus appropriate captures. |
| 1167 | </p> | 1177 | </p> |
| 1168 | 1178 | ||
| 1169 | <p> | 1179 | <p> |
