LPegLabel - Parsing Expression Grammars (with Labels) for Lua
Introduction
LPegLabel is a conservative extension of the LPeg library that provides an implementation of Parsing Expression Grammars (PEGs) with labeled failures. Labels can be used to signal different kinds of erros and to specify which alternative in a labeled ordered choice should handle a given label. Labels can also be combined with the standard patterns of LPeg.
This document describes the new functions available in LpegLabel and presents some examples of usage. For a more detailed discussion about PEGs with labeled failures please see A Parsing Machine for Parsing Expression Grammars with Labeled Failures.
In LPegLabel, the result of an unsuccessful matching is a triple nil, lab, sfail, where lab is the label associated with the failure, and sfail is the suffix input being matched when lab was thrown. Below there is a brief summary of the new functions provided by LpegLabel:
Functions
lpeglabel.T(l)
Returns a pattern that throws the label l
.
A label must be an integer between 0 and 255.
The label 0 is equivalent to the regular failure of PEGs.
lpeglabel.Lc(p1, p2, l1, ..., ln)
Returns a pattern equivalent to a labeled ordered choice.
If the matching of p1
gives one of the labels l1, ..., ln
,
then the matching of p2
is tried from the same position. Otherwise,
the result of the matching of p1
is the pattern's result.
The labeled ordered choice lpeg.Lc(p1, p2, 0)
is equivalent to the
regular ordered choice p1 / p2
.
Although PEG's ordered choice is associative, the labeled ordered choice is not. When using this function, the user should take care to build a left-associative labeled ordered choice pattern.
lpeglabel.Rec(p1, p2 [, l1, ..., ln])
The recovery operator is similar to labeled order choice except
that the matching of p2
is tried from the failure position of p1
.
If no label is provided, the regular PEG failure is caught
i.e. lpeg.Rec(p1, p2)
is equivalent to lpeg.Rec(p1, p2, 0)
.
%{l}
Syntax of relabel module. Equivalent to lpeg.T(l)
.
p1 /{l1, ..., ln} p2
Syntax of relabel module. Equivalent to lpeg.Lc(p1, p2, l1, ..., ln)
.
The /{}
operator is left-associative.
A grammar can use both choice operators (/
and /{}
),
but a single choice can not mix them. That is, the parser of relabel
module will not recognize a pattern as p1 / p2 /{l1} p3
.
relabel.calcline (subject, i)
Returns line and column information regarding position i of the subject.
relabel.setlabels (tlabel)
Allows to specicify a table with labels. They keys of
tlabel
must be integers between 0 and 255,
and the associated values should be strings.
Examples
Below there a few examples of usage of LPegLabel. The code of these and of other examples is available in the examples directory.
Matching a list of identifiers separated by commas
The following example defines a grammar that matches a list of identifiers separated by commas. A label is thrown when there is an error matching an identifier or a comma:
local m = require'lpeglabel'
local re = require'relabel'
local g = m.P{
"S",
S = m.V"Id" * m.V"List",
List = -m.P(1) + (m.V"Comma" + m.T(2)) * (m.V"Id" + m.T(1)) * m.V"List",
Id = m.V"Sp" * m.R'az'^1,
Comma = m.V"Sp" * ",",
Sp = m.S" \n\t"^0,
}
function mymatch (g, s)
local r, e, sfail = g:match(s)
if not r then
local line, col = re.calcline(s, #s - #sfail)
local msg = "Error at line " .. line .. " (col " .. col .. ")"
if e == 1 then
return r, msg .. ": expecting an identifier before '" .. sfail .. "'"
elseif e == 2 then
return r, msg .. ": expecting ',' before '" .. sfail .. "'"
else
return r, msg
end
end
return r
end
print(mymatch(g, "one,two")) --> 8
print(mymatch(g, "one two")) --> nil Error at line 1 (col 3): expecting ',' before ' two'
print(mymatch(g, "one,\n two,\nthree,")) --> nil Error at line 3 (col 6): expecting an identifier before ''
In this example we could think about writing rule List as follows:
List = ((m.V"Comma" + m.T(2)) * (m.V"Id" + m.T(1)))^0,
but when matching this expression agains the end of input we would get a failure whose associated label would be 2, and this would cause the failure of the whole repetition.
Mnemonics instead of numbers
In the previous example we could have created a table with the error messages to improve the readbility of the PEG. Below we rewrite the previous grammar following this approach:
local m = require'lpeglabel'
local re = require'relabel'
local terror = {}
local function newError(s)
table.insert(terror, s)
return #terror
end
local errUndef = newError("undefined")
local errId = newError("expecting an identifier")
local errComma = newError("expecting ','")
local g = m.P{
"S",
S = m.V"Id" * m.V"List",
List = -m.P(1) + (m.V"Comma" + m.T(errComma)) * (m.V"Id" + m.T(errId)) * m.V"List",
Id = m.V"Sp" * m.R'az'^1,
Comma = m.V"Sp" * ",",
Sp = m.S" \n\t"^0,
}
function mymatch (g, s)
local r, e, sfail = g:match(s)
if not r then
local line, col = re.calcline(s, #s - #sfail)
local msg = "Error at line " .. line .. " (col " .. col .. "): "
return r, msg .. terror[e] .. " before '" .. sfail .. "'"
end
return r
end
print(mymatch(g, "one,two")) --> 8
print(mymatch(g, "one two")) --> nil Error at line 1 (col 3): expecting ',' before ' two'
print(mymatch(g, "one,\n two,\nthree,")) --> nil Error at line 3 (col 6): expecting an identifier before ''
relabel syntax
Now we rewrite the previous example using the syntax supported by relabel:
local re = require 'relabel'
local g = re.compile[[
S <- Id List
List <- !. / (',' / %{2}) (Id / %{1}) List
Id <- Sp [a-z]+
Comma <- Sp ','
Sp <- %s*
]]
function mymatch (g, s)
local r, e, sfail = g:match(s)
if not r then
local line, col = re.calcline(s, #s - #sfail)
local msg = "Error at line " .. line .. " (col " .. col .. ")"
if e == 1 then
return r, msg .. ": expecting an identifier before '" .. sfail .. "'"
elseif e == 2 then
return r, msg .. ": expecting ',' before '" .. sfail .. "'"
else
return r, msg
end
end
return r
end
print(mymatch(g, "one,two")) --> 8
print(mymatch(g, "one two")) --> nil Error at line 1 (col 3): expecting ',' before ' two'
print(mymatch(g, "one,\n two,\nthree,")) --> nil Error at line 3 (col 6): expecting an identifier before ''
With the help of function setlabels we can also rewrite the previous example to use mnemonic labels instead of plain numbers:
local re = require 'relabel'
local errinfo = {
{"errUndef", "undefined"},
{"errId", "expecting an identifier"},
{"errComma", "expecting ','"},
}
local errmsgs = {}
local labels = {}
for i, err in ipairs(errinfo) do
errmsgs[i] = err[2]
labels[err[1]] = i
end
re.setlabels(labels)
local g = re.compile[[
S <- Id List
List <- !. / (',' / %{errComma}) (Id / %{errId}) List
Id <- Sp [a-z]+
Comma <- Sp ','
Sp <- %s*
]]
function mymatch (g, s)
local r, e, sfail = g:match(s)
if not r then
local line, col = re.calcline(s, #s - #sfail)
local msg = "Error at line " .. line .. " (col " .. col .. "): "
return r, msg .. errmsgs[e] .. " before '" .. sfail .. "'"
end
return r
end
print(mymatch(g, "one,two")) --> 8
print(mymatch(g, "one two")) --> nil Error at line 1 (col 3): expecting ',' before ' two'
print(mymatch(g, "one,\n two,\nthree,")) --> nil Error at line 3 (col 6): expecting an identifier before ''
Arithmetic Expressions
Here's an example of an LPegLabel grammar that make its own function called 'expect', which takes a pattern and a label as parameters and throws the label if the pattern fails to be matched. This function can be extended later on to record all errors encountered once error recovery is implemented.
local lpeg = require"lpeglabel"
local R, S, P, V, C, Ct, T = lpeg.R, lpeg.S, lpeg.P, lpeg.V, lpeg.C, lpeg.Ct, lpeg.T
local labels = {
{"NoExp", "no expression found"},
{"Extra", "extra characters found after the expression"},
{"ExpTerm", "expected a term after the operator"},
{"ExpExp", "expected an expression after the parenthesis"},
{"MisClose", "missing a closing ')' after the expression"},
}
local function expect(patt, labname)
for i, elem in ipairs(labels) do
if elem[1] == labname then
return patt + T(i)
end
end
error("could not find label: " .. labname)
end
local num = R("09")^1 / tonumber
local op = S("+-*/")
local function compute(tokens)
local result = tokens[1]
for i = 2, #tokens, 2 do
if tokens[i] == '+' then
result = result + tokens[i+1]
elseif tokens[i] == '-' then
result = result - tokens[i+1]
elseif tokens[i] == '*' then
result = result * tokens[i+1]
elseif tokens[i] == '/' then
result = result / tokens[i+1]
else
error('unknown operation: ' .. tokens[i])
end
end
return result
end
local g = P {
"Exp",
Exp = Ct(V"Term" * (C(op) * expect(V"Term", "ExpTerm"))^0) / compute;
Term = num + V"Group";
Group = "(" * expect(V"Exp", "ExpExp") * expect(")", "MisClose");
}
g = expect(g, "NoExp") * expect(-P(1), "Extra")
local function eval(input)
local result, label, suffix = g:match(input)
if result ~= nil then
return result
else
local pos = input:len() - suffix:len() + 1
local msg = labels[label][2]
return nil, "syntax error: " .. msg .. " (at index " .. pos .. ")"
end
end
print(eval "98-76*(54/32)")
--> 37.125
print(eval "(1+1-1*2/2")
--> syntax error: missing a closing ')' after the expression (at index 11)
print(eval "(1+)-1*(2/2)")
--> syntax error: expected a term after the operator (at index 4)
print(eval "(1+1)-1*(/2)")
--> syntax error: expected an expression after the parenthesis (at index 10)
print(eval "1+(1-(1*2))/2x")
--> syntax error: extra chracters found after the expression (at index 14)
print(eval "-1+(1-(1*2))/2")
--> syntax error: no expression found (at index 1)
Catching labels
When a label is thrown, the grammar itself can handle this label by using the labeled ordered choice. Below we rewrite the example of the list of identifiers to show this feature:
local m = require'lpeglabel'
local terror = {}
local function newError(s)
table.insert(terror, s)
return #terror
end
local errUndef = newError("undefined")
local errId = newError("expecting an identifier")
local errComma = newError("expecting ','")
local g = m.P{
"S",
S = m.Lc(m.Lc(m.V"Id" * m.V"List", m.V"ErrId", errId),
m.V"ErrComma", errComma),
List = -m.P(1) + (m.V"Comma" + m.T(errComma)) * (m.V"Id" + m.T(errId)) * m.V"List",
Id = m.V"Sp" * m.R'az'^1,
Comma = m.V"Sp" * ",",
Sp = m.S" \n\t"^0,
ErrId = m.Cc(errId) / terror,
ErrComma = m.Cc(errComma) / terror
}
print(m.match(g, "one,two")) --> 8
print(m.match(g, "one two")) --> expecting ','
print(m.match(g, "one,\n two,\nthree,")) --> expecting an identifier
Error Recovery
By using labeled ordered choice or the recovery operator, when a label
is thrown, the parser may record the error and still continue parsing
to find more errors. We can even record the error right away without
actually throwing a label (relying on the regular PEG failure instead).
Below we rewrite the arithmetic expression example and modify
the expect
function to use the recovery operator for error recovery:
local lpeg = require"lpeglabel"
local R, S, P, V = lpeg.R, lpeg.S, lpeg.P, lpeg.V
local C, Cc, Ct, Cmt, Carg = lpeg.C, lpeg.Cc, lpeg.Ct, lpeg.Cmt, lpeg.Carg
local T, Lc, Rec = lpeg.T, lpeg.Lc, lpeg.Rec
local labels = {
{"NoExp", "no expression found"},
{"Extra", "extra characters found after the expression"},
{"ExpTerm", "expected a term after the operator"},
{"ExpExp", "expected an expression after the parenthesis"},
{"MisClose", "missing a closing ')' after the expression"},
}
local function labelindex(labname)
for i, elem in ipairs(labels) do
if elem[1] == labname then
return i
end
end
error("could not find label: " .. labname)
end
local function expect(patt, labname, recpatt)
local i = labelindex(labname)
local function recorderror(input, pos, errors)
table.insert(errors, {i, pos})
return true
end
if not recpatt then recpatt = P"" end
return Rec(patt, Cmt(Carg(1), recorderror) * recpatt)
end
local num = R("09")^1 / tonumber
local op = S("+-*/")
local function compute(tokens)
local result = tokens[1]
for i = 2, #tokens, 2 do
if tokens[i] == '+' then
result = result + tokens[i+1]
elseif tokens[i] == '-' then
result = result - tokens[i+1]
elseif tokens[i] == '*' then
result = result * tokens[i+1]
elseif tokens[i] == '/' then
result = result / tokens[i+1]
else
error('unknown operation: ' .. tokens[i])
end
end
return result
end
local g = P {
"Exp",
Exp = Ct(V"Term" * (C(op) * V"Operand")^0) / compute;
Operand = expect(V"Term", "ExpTerm", Cc(0));
Term = num + V"Group";
Group = "(" * V"InnerExp" * expect(")", "MisClose");
InnerExp = expect(V"Exp", "ExpExp", (P(1) - ")")^0 * Cc(0));
}
g = expect(g, "NoExp", P(1)^0) * expect(-P(1), "Extra")
local function eval(input)
local errors = {}
local result, label, suffix = g:match(input, 1, errors)
if #errors == 0 then
return result
else
local out = {}
for i, err in ipairs(errors) do
local pos = err[2]
local msg = labels[err[1]][2]
table.insert(out, "syntax error: " .. msg .. " (at index " .. pos .. ")")
end
return nil, table.concat(out, "\n")
end
end
print(eval "98-76*(54/32)")
--> 37.125
print(eval "-1+(1-(1*2))/2")
--> syntax error: no expression found (at index 1)
print(eval "(1+1-1*(2/2+)-():")
--> syntax error: expected a term after the operator (at index 13)
--> syntax error: expected an expression after the parenthesis (at index 16)
--> syntax error: missing a closing ')' after the expression (at index 17)
--> syntax error: extra characters found after the expression (at index 17)