LPegLabel

## LPegLabel - Parsing Expression Grammars (with Labels) for Lua --- ### Introduction LPegLabel is a conservative extension of the [LPeg](http://www.inf.puc-rio.br/~roberto/lpeg) library that provides an implementation of Parsing Expression Grammars (PEGs) with labeled failures. Labels can be used to signal different kinds of erros and to specify which alternative in a labeled ordered choice should handle a given label. Labels can also be combined with the standard patterns of LPeg. This document describes the new functions available in LpegLabel and presents some examples of usage. For a more detailed discussion about PEGs with labeled failures please see [A Parsing Machine for Parsing Expression Grammars with Labeled Failures](https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxzcW1lZGVpcm9zfGd4OjMzZmE3YzM0Y2E2MGM5Y2M). In LPegLabel, the result of an unsuccessful matching is a triple **nil, lab, sfail**, where **lab** is the label associated with the failure, and **sfail** is the suffix input being matched when **lab** was thrown. Below there is a brief summary of the new functions provided by LpegLabel:
FunctionDescription
lpeglabel.T (l) Throws label l
lpeglabel.Lc (p1, p2, l1, ..., ln) Matches p1 and tries to match p2 if the matching of p1 gives one of l1, ..., ln
lpeglabel.Rec (p1, p2 [, l1, ..., ln]) Like Lc but does not reset the position of the parser when trying p2. By default, it catches regular PEG failures
%{l} Syntax of relabel module. Equivalent to lpeg.T(l)
p1 /{l1, ..., ln} p2 Syntax of relabel module. Equivalent to lpeg.Lc(p1, p2, l1, ..., ln)
relabel.calcline(subject, i) Calculates line and column information regarding position i of the subject
relabel.setlabels (tlabel) Allows to specicify a table with mnemonic labels.
### Functions #### lpeglabel.T(l) Returns a pattern that throws the label `l`. A label must be an integer between 0 and 255. The label 0 is equivalent to the regular failure of PEGs. #### lpeglabel.Lc(p1, p2, l1, ..., ln) Returns a pattern equivalent to a *labeled ordered choice*. If the matching of `p1` gives one of the labels `l1, ..., ln`, then the matching of `p2` is tried from the same position. Otherwise, the result of the matching of `p1` is the pattern's result. The labeled ordered choice `lpeg.Lc(p1, p2, 0)` is equivalent to the regular ordered choice `p1 / p2`. Although PEG's ordered choice is associative, the labeled ordered choice is not. When using this function, the user should take care to build a left-associative labeled ordered choice pattern. #### lpeglabel.Rec(p1, p2 [, l1, ..., ln]) The *recovery operator* is similar to labeled order choice except that the matching of `p2` is tried from the failure position of `p1`. If no label is provided, the regular PEG failure is caught i.e. `lpeg.Rec(p1, p2)` is equivalent to `lpeg.Rec(p1, p2, 0)`. #### %{l} Syntax of *relabel* module. Equivalent to `lpeg.T(l)`. #### p1 /{l1, ..., ln} p2 Syntax of *relabel* module. Equivalent to `lpeg.Lc(p1, p2, l1, ..., ln)`. The `/{}` operator is left-associative. A grammar can use both choice operators (`/` and `/{}`), but a single choice can not mix them. That is, the parser of `relabel` module will not recognize a pattern as `p1 / p2 /{l1} p3`. #### relabel.calcline (subject, i) Returns line and column information regarding position i of the subject. #### relabel.setlabels (tlabel) Allows to specicify a table with labels. They keys of `tlabel` must be integers between 0 and 255, and the associated values should be strings. ### Examples Below there a few examples of usage of LPegLabel. The code of these and of other examples is available in the *examples* directory. #### Matching a list of identifiers separated by commas The following example defines a grammar that matches a list of identifiers separated by commas. A label is thrown when there is an error matching an identifier or a comma: ```lua local m = require'lpeglabel' local re = require'relabel' local g = m.P{ "S", S = m.V"Id" * m.V"List", List = -m.P(1) + (m.V"Comma" + m.T(2)) * (m.V"Id" + m.T(1)) * m.V"List", Id = m.V"Sp" * m.R'az'^1, Comma = m.V"Sp" * ",", Sp = m.S" \n\t"^0, } function mymatch (g, s) local r, e, sfail = g:match(s) if not r then local line, col = re.calcline(s, #s - #sfail) local msg = "Error at line " .. line .. " (col " .. col .. ")" if e == 1 then return r, msg .. ": expecting an identifier before '" .. sfail .. "'" elseif e == 2 then return r, msg .. ": expecting ',' before '" .. sfail .. "'" else return r, msg end end return r end print(mymatch(g, "one,two")) --> 8 print(mymatch(g, "one two")) --> nil Error at line 1 (col 3): expecting ',' before ' two' print(mymatch(g, "one,\n two,\nthree,")) --> nil Error at line 3 (col 6): expecting an identifier before '' ``` In this example we could think about writing rule List as follows: ```lua List = ((m.V"Comma" + m.T(2)) * (m.V"Id" + m.T(1)))^0, ``` but when matching this expression agains the end of input we would get a failure whose associated label would be **2**, and this would cause the failure of the *whole* repetition. ##### Mnemonics instead of numbers In the previous example we could have created a table with the error messages to improve the readbility of the PEG. Below we rewrite the previous grammar following this approach: ```lua local m = require'lpeglabel' local re = require'relabel' local terror = {} local function newError(s) table.insert(terror, s) return #terror end local errUndef = newError("undefined") local errId = newError("expecting an identifier") local errComma = newError("expecting ','") local g = m.P{ "S", S = m.V"Id" * m.V"List", List = -m.P(1) + (m.V"Comma" + m.T(errComma)) * (m.V"Id" + m.T(errId)) * m.V"List", Id = m.V"Sp" * m.R'az'^1, Comma = m.V"Sp" * ",", Sp = m.S" \n\t"^0, } function mymatch (g, s) local r, e, sfail = g:match(s) if not r then local line, col = re.calcline(s, #s - #sfail) local msg = "Error at line " .. line .. " (col " .. col .. "): " return r, msg .. terror[e] .. " before '" .. sfail .. "'" end return r end print(mymatch(g, "one,two")) --> 8 print(mymatch(g, "one two")) --> nil Error at line 1 (col 3): expecting ',' before ' two' print(mymatch(g, "one,\n two,\nthree,")) --> nil Error at line 3 (col 6): expecting an identifier before '' ``` ##### *relabel* syntax Now we rewrite the previous example using the syntax supported by *relabel*: ```lua local re = require 'relabel' local g = re.compile[[ S <- Id List List <- !. / (',' / %{2}) (Id / %{1}) List Id <- Sp [a-z]+ Comma <- Sp ',' Sp <- %s* ]] function mymatch (g, s) local r, e, sfail = g:match(s) if not r then local line, col = re.calcline(s, #s - #sfail) local msg = "Error at line " .. line .. " (col " .. col .. ")" if e == 1 then return r, msg .. ": expecting an identifier before '" .. sfail .. "'" elseif e == 2 then return r, msg .. ": expecting ',' before '" .. sfail .. "'" else return r, msg end end return r end print(mymatch(g, "one,two")) --> 8 print(mymatch(g, "one two")) --> nil Error at line 1 (col 3): expecting ',' before ' two' print(mymatch(g, "one,\n two,\nthree,")) --> nil Error at line 3 (col 6): expecting an identifier before '' ``` With the help of function *setlabels* we can also rewrite the previous example to use mnemonic labels instead of plain numbers: ```lua local re = require 'relabel' local errinfo = { {"errUndef", "undefined"}, {"errId", "expecting an identifier"}, {"errComma", "expecting ','"}, } local errmsgs = {} local labels = {} for i, err in ipairs(errinfo) do errmsgs[i] = err[2] labels[err[1]] = i end re.setlabels(labels) local g = re.compile[[ S <- Id List List <- !. / (',' / %{errComma}) (Id / %{errId}) List Id <- Sp [a-z]+ Comma <- Sp ',' Sp <- %s* ]] function mymatch (g, s) local r, e, sfail = g:match(s) if not r then local line, col = re.calcline(s, #s - #sfail) local msg = "Error at line " .. line .. " (col " .. col .. "): " return r, msg .. errmsgs[e] .. " before '" .. sfail .. "'" end return r end print(mymatch(g, "one,two")) --> 8 print(mymatch(g, "one two")) --> nil Error at line 1 (col 3): expecting ',' before ' two' print(mymatch(g, "one,\n two,\nthree,")) --> nil Error at line 3 (col 6): expecting an identifier before '' ``` #### Arithmetic Expressions Here's an example of an LPegLabel grammar that make its own function called 'expect', which takes a pattern and a label as parameters and throws the label if the pattern fails to be matched. This function can be extended later on to record all errors encountered once error recovery is implemented. ```lua local lpeg = require"lpeglabel" local R, S, P, V, C, Ct, T = lpeg.R, lpeg.S, lpeg.P, lpeg.V, lpeg.C, lpeg.Ct, lpeg.T local labels = { {"NoExp", "no expression found"}, {"Extra", "extra characters found after the expression"}, {"ExpTerm", "expected a term after the operator"}, {"ExpExp", "expected an expression after the parenthesis"}, {"MisClose", "missing a closing ')' after the expression"}, } local function expect(patt, labname) for i, elem in ipairs(labels) do if elem[1] == labname then return patt + T(i) end end error("could not find label: " .. labname) end local num = R("09")^1 / tonumber local op = S("+-*/") local function compute(tokens) local result = tokens[1] for i = 2, #tokens, 2 do if tokens[i] == '+' then result = result + tokens[i+1] elseif tokens[i] == '-' then result = result - tokens[i+1] elseif tokens[i] == '*' then result = result * tokens[i+1] elseif tokens[i] == '/' then result = result / tokens[i+1] else error('unknown operation: ' .. tokens[i]) end end return result end local g = P { "Exp", Exp = Ct(V"Term" * (C(op) * expect(V"Term", "ExpTerm"))^0) / compute; Term = num + V"Group"; Group = "(" * expect(V"Exp", "ExpExp") * expect(")", "MisClose"); } g = expect(g, "NoExp") * expect(-P(1), "Extra") local function eval(input) local result, label, suffix = g:match(input) if result ~= nil then return result else local pos = input:len() - suffix:len() + 1 local msg = labels[label][2] return nil, "syntax error: " .. msg .. " (at index " .. pos .. ")" end end print(eval "98-76*(54/32)") --> 37.125 print(eval "(1+1-1*2/2") --> syntax error: missing a closing ')' after the expression (at index 11) print(eval "(1+)-1*(2/2)") --> syntax error: expected a term after the operator (at index 4) print(eval "(1+1)-1*(/2)") --> syntax error: expected an expression after the parenthesis (at index 10) print(eval "1+(1-(1*2))/2x") --> syntax error: extra chracters found after the expression (at index 14) print(eval "-1+(1-(1*2))/2") --> syntax error: no expression found (at index 1) ``` #### Catching labels When a label is thrown, the grammar itself can handle this label by using the labeled ordered choice. Below we rewrite the example of the list of identifiers to show this feature: ```lua local m = require'lpeglabel' local terror = {} local function newError(s) table.insert(terror, s) return #terror end local errUndef = newError("undefined") local errId = newError("expecting an identifier") local errComma = newError("expecting ','") local g = m.P{ "S", S = m.Lc(m.Lc(m.V"Id" * m.V"List", m.V"ErrId", errId), m.V"ErrComma", errComma), List = -m.P(1) + (m.V"Comma" + m.T(errComma)) * (m.V"Id" + m.T(errId)) * m.V"List", Id = m.V"Sp" * m.R'az'^1, Comma = m.V"Sp" * ",", Sp = m.S" \n\t"^0, ErrId = m.Cc(errId) / terror, ErrComma = m.Cc(errComma) / terror } print(m.match(g, "one,two")) --> 8 print(m.match(g, "one two")) --> expecting ',' print(m.match(g, "one,\n two,\nthree,")) --> expecting an identifier ``` #### Error Recovery By using labeled ordered choice or the recovery operator, when a label is thrown, the parser may record the error and still continue parsing to find more errors. We can even record the error right away without actually throwing a label (relying on the regular PEG failure instead). Below we rewrite the arithmetic expression example and modify the `expect` function to use the recovery operator for error recovery: ```lua local lpeg = require"lpeglabel" local R, S, P, V = lpeg.R, lpeg.S, lpeg.P, lpeg.V local C, Cc, Ct, Cmt, Carg = lpeg.C, lpeg.Cc, lpeg.Ct, lpeg.Cmt, lpeg.Carg local T, Lc, Rec = lpeg.T, lpeg.Lc, lpeg.Rec local labels = { {"NoExp", "no expression found"}, {"Extra", "extra characters found after the expression"}, {"ExpTerm", "expected a term after the operator"}, {"ExpExp", "expected an expression after the parenthesis"}, {"MisClose", "missing a closing ')' after the expression"}, } local function labelindex(labname) for i, elem in ipairs(labels) do if elem[1] == labname then return i end end error("could not find label: " .. labname) end local function expect(patt, labname, recpatt) local i = labelindex(labname) local function recorderror(input, pos, errors) table.insert(errors, {i, pos}) return true end if not recpatt then recpatt = P"" end return Rec(patt, Cmt(Carg(1), recorderror) * recpatt) end local num = R("09")^1 / tonumber local op = S("+-*/") local function compute(tokens) local result = tokens[1] for i = 2, #tokens, 2 do if tokens[i] == '+' then result = result + tokens[i+1] elseif tokens[i] == '-' then result = result - tokens[i+1] elseif tokens[i] == '*' then result = result * tokens[i+1] elseif tokens[i] == '/' then result = result / tokens[i+1] else error('unknown operation: ' .. tokens[i]) end end return result end local g = P { "Exp", Exp = Ct(V"Term" * (C(op) * V"Operand")^0) / compute; Operand = expect(V"Term", "ExpTerm", Cc(0)); Term = num + V"Group"; Group = "(" * V"InnerExp" * expect(")", "MisClose"); InnerExp = expect(V"Exp", "ExpExp", (P(1) - ")")^0 * Cc(0)); } g = expect(g, "NoExp", P(1)^0) * expect(-P(1), "Extra") local function eval(input) local errors = {} local result, label, suffix = g:match(input, 1, errors) if #errors == 0 then return result else local out = {} for i, err in ipairs(errors) do local pos = err[2] local msg = labels[err[1]][2] table.insert(out, "syntax error: " .. msg .. " (at index " .. pos .. ")") end return nil, table.concat(out, "\n") end end print(eval "98-76*(54/32)") --> 37.125 print(eval "-1+(1-(1*2))/2") --> syntax error: no expression found (at index 1) print(eval "(1+1-1*(2/2+)-():") --> syntax error: expected a term after the operator (at index 13) --> syntax error: expected an expression after the parenthesis (at index 16) --> syntax error: missing a closing ')' after the expression (at index 17) --> syntax error: extra characters found after the expression (at index 17) ```