LPegLabel
Parsing Expression Grammars For Lua with Labels, version 0.1

Introduction

LPegLabel is an extension of the LPeg library that provides an implementation of Parsing Expression Grammars (PEGs) with labeled failures. Labels can be used to signal different kinds of erros and to specify which alternative in a labeled ordered choice should handle a given label. Labels can also be combined with the standard patterns of LPeg.

This document describes the new functions available in LpegLabel and presents some examples of usage. For a more detailed and formal discussion about PEGs with labels please see Exception Handling for Error Reporting in Parsing Expression Grammars and Error Reporting in Parsing Expression Grammars.

Below there is a brief summary of the new functions provided by LpegLabel:

FunctionDescription
lpeglabel.T (l1, ..., ln) Throws labels l1, ..., ln
lpeglabel.Lc (p1, p2, l1, ..., ln) Matches p1 and tries to match p2 if the matching of p1 gives one of l1, ..., ln
%{l1, ..., ln} Syntax of relabel module. Equivalent to lpeg.T(l1, ..., ln)
p1 /{l1, ..., ln} p2 Syntax of relabel module. Equivalent to lpeg.Lc(p1, p2, l1, ..., ln)
relabel.setlabels (tlabel) Allows to specicify a table with mnemonic labels.

In case of an unsucessful matching, the match function now returns nil plus a list of labels. These labels may be used to build a good error message.

Functions

lpeglabel.T(l1, ..., ln)

Returns a pattern that throws the list of labels l1, ..., ln. A label must be an integer between 0 and 31. The label 0 is equivalent to the regular failure of PEGs.

lpeglabel.Lc(p1, p2, l1, ..., ln)

Returns a pattern equivalent to a labeled ordered choice. If the matching of p1 gives one of the labels l1, ..., ln, then the matching of p2 is tried from the same position. Otherwise, the result of the matching of p1 is the pattern's result.

The labeled ordered choice lpeg.Lc(p1, p2, 0) is equivalent to the regular ordered choice p1 / p2.

Although PEG's ordered choice is associative, the labeled ordered choice is not. When using this function, the user should take care to build a left-associative labeled ordered choice pattern.

%{l1, ..., ln}

Syntax of relabel module. Equivalent to lpeg.T(l1, ..., ln).

p1 /{l1, ..., ln} p2

Syntax of relabel module. Equivalent to lpeg.Lc(p1, p2, l1, ..., ln).

The /{} operator is left-associative.

A grammar can use both choice operators (/ and /{}), but a single choice can not mix them. That is, the parser of relabel module will not recognize a pattern as p1 / p2 /{l1} p3.

relabel.setlabels (tlabel)

Allows to specicify a table with labels. They keys of tlabel must be integers between 0 and 31, and the associated values should be strings.

Some Examples

Throwing a label

The following example defines a grammar that matches a list of identifiers separated by commas. A label is thrown when there is an error matching an identifier or a comma:

local m = require'lpeglabel'

local g = m.P{
  "S",
  S = m.V"Id" * m.V"List",
  List = -m.P(1) + ("," + m.T(2)) * m.V"Id" * m.V"List",
  Id = m.R'az'^1 + m.T(1),
}

function mymatch (g, s)
  local r, e = g:match(s)
  if not r then
    if e == 1 then
      return "Error: expecting an identifier"
    elseif e == 2 then
      return "Error: expecting ','"
    else
      return "Error"
    end
  end
  return r
end
	
print(mymatch(g, "a,b"))
print(mymatch(g, "a b"))
print(mymatch(g, ", b"))

In this example we could think about writing rule List as follows:

List = m.P(("," + m.T(2)) * m.V"Id")^0
but this would give us an expression that when matching the end of input would result in a failure whose associated label would be 2.

In the previous example we could have also created a table with the error messages to improve the readbility of the PEG. Below we rewrite the grammar following this approach:

local m = require'lpeglabel'

local errUndef = 0
local errId = 1
local errComma = 2

local terror = {
  [errUndef] = "Error",
  [errId] = "Error: expecting an identifier",
  [errComma] = "Error: expecting ','",
}

local g = m.P{
  "S",
  S = m.V"Id" * m.V"List",
  List = -m.P(1) + ("," + m.T(errComma)) * m.V"Id" * m.V"List",
  Id = m.R'az'^1 + m.T(errId),
}

function mymatch (g, s)
  local r, e = g:match(s)
  if not r then
    return terror[e]
  end
  return r
end
	
print(mymatch(g, "a,b"))
print(mymatch(g, "a b"))
print(mymatch(g, ", b"))

Throwing a label using the relabel module

We can also rewrite the previous example using the relabel module as follows:

local re = require 'relabel' 

local g = re.compile[[
  S    <- Id List
  List <- !.  /  (',' / %{2}) Id List
  Id   <- [a-z]  /  %{1}	
]]

function mymatch (g, s)
  local r, e = g:match(s)
  if not r then
    if e == 1 then
      return "Error: expecting an identifier"
    elseif e == 2 then
      return "Error: expecting ','"
    else
      return "Error"
    end
  end
  return r
end
	
print(mymatch(g, "a,b"))
print(mymatch(g, "a b"))
print(mymatch(g, ", b"))

Another way to describe the previous example using the relabel module is by using a table with the description of the errors (terror) and another table that associates a name to a given label (tlabels):

local re = require 'relabel' 

local errUndef, errId, errComma = 0, 1, 2

local terror = {
  [errUndef] = "Error",
  [errId] = "Error: expecting an identifier",
  [errComma] = "Error: expecting ','",
}

local tlabels = { ["errUndef"] = errUndef,
                  ["errId"]    = errId, 
                  ["errComma"] = errComma }

re.setlabels(tlabels)

local g = re.compile[[
  S    <- Id List
  List <- !.  /  (',' / %{errComma}) Id List
  Id   <- [a-z]  /  %{errId}	
]]

function mymatch (g, s)
  local r, e = g:match(s)
  if not r then
    return terror[e]
  end
  return r
end
	
print(mymatch(g, "a,b"))
print(mymatch(g, "a b"))
print(mymatch(g, ", b"))

Throwing and catching a label

When a label is thrown, the grammar itself can handle this label by using the labeled ordered choice. Below we rewrite the example of the list of identifiers to show this feature:

local m = require'lpeglabel'

local errUndef, errId, errComma = 0, 1, 2

local terror = {
  [errUndef] = "Error",
  [errId] = "Error: expecting an identifier",
  [errComma] = "Error: expecting ','",
}

g = m.P{
  "S",
  S = m.Lc(m.Lc(m.V"Id" * m.V"List", m.V"ErrId", errId),
           m.V"ErrComma", errComma),
  List = -m.P(1)  +  m.V"Comma" * m.V"Id" * m.V"List",
  Id = m.R'az'^1  +  m.T(errId),
  Comma = ","  +  m.T(errComma),
  ErrId = m.Cc(errId) / terror,
  ErrComma = m.Cc(errComma) / terror
}

print(g:match("a,b"))
print(g:match("a b"))
print(g:match(",b"))

As was pointed out before, the labeled ordered choice is not associative, so we should impose a left-associative order when using function Lc.

Below we use the re module to throw and catch labels. As was pointed out before, the /{} operator is left-associative, so we do not need to manually impose a left-associative order as we did in the previous example that used Lc:

local re = require'relabel'

local terror = {} 

local function newError(l, msg) 
  table.insert(terror, { l = l, msg = msg } )
end

newError("errId", "Error: expecting an identifier")
newError("errComma", "Error: expecting ','")

local labelCode = {}
local labelMsg = {}
for k, v in ipairs(terror) do 
  labelCode[v.l] = k
  labelMsg[v.l] = v.msg
end

re.setlabels(labelCode)

local p = re.compile([[
  S        <- Id List  /{errId}  ErrId  /{errComma}  ErrComma
  List     <- !.  /  Comma Id List
  Id       <- [a-z]+  /  %{errId}
  Comma    <- ','  /  %{errComma}
  ErrId    <- '' -> errId
  ErrComma <- '' ->  errComma
]], labelMsg)

print(p:match("a,b"))
print(p:match("a b"))
print(p:match(",b"))

Tiny Language

As a more complex example, below we have the grammar for the Tiny language, as described in this paper. The example below can also show the line where the syntactic error probably happened.

local re = require 'relabel'

local terror = {}

local function newError(l, msg)
  table.insert(terror, { l = l, msg = msg} )
end

newError("errSemi", "Error: missing ';'")  
newError("errExpIf", "Error: expected expression after 'if'") 
newError("errThen", "Error: expected 'then' keyword") 
newError("errCmdSeq1", "Error: expected at least a command after 'then'") 
newError("errCmdSeq2", "Error: expected at least a command after 'else'") 
newError("errEnd", "Error: expected 'end' keyword") 
newError("errCmdSeqRep", "Error: expected at least a command after 'repeat'") 
newError("errUntil", "Error: expected 'until' keyword") 
newError("errExpRep", "Error: expected expression after 'until'") 
newError("errAssignOp", "Error: expected ':=' in assigment") 
newError("errExpAssign", "Error: expected expression after ':='") 
newError("errReadName", "Error: expected an identifier after 'read'") 
newError("errWriteExp", "Error: expected expression after 'write'") 
newError("errSimpExp", "Error: expected '(', ID, or number after '<' or '='")
newError("errTerm", "Error: expected '(', ID, or number after '+' or '-'")
newError("errFactor", "Error: expected '(', ID, or number after '*' or '/'")
newError("errExpFac", "Error: expected expression after '('")
newError("errClosePar", "Error: expected ')' after expression")

local line

local function incLine()
  line = line + 1
  return true
end

local function countLine(s, i)
  line = 1
  local p = re.compile([[
    S <- (%nl -> incLine  / .)*
  ]], { incLine = incLine}) 
  p:match(s:sub(1, i))
  return true
end

local labelCode = {}
for k, v in ipairs(terror) do 
  labelCode[v.l] = k
end

re.setlabels(labelCode)

local g = re.compile([[
  Tiny         <- CmdSeq  
  CmdSeq       <- (Cmd (SEMICOLON / ErrSemi)) (Cmd (SEMICOLON / ErrSemi))*
  Cmd          <- IfCmd / RepeatCmd / ReadCmd / WriteCmd  / AssignCmd 
  IfCmd        <- IF (Exp / ErrExpIf)  (THEN / ErrThen)  (CmdSeq / ErrCmdSeq1)  (ELSE (CmdSeq / ErrCmdSeq2)  / '') (END / ErrEnd)
  RepeatCmd    <- REPEAT  (CmdSeq / ErrCmdSeqRep)  (UNTIL / ErrUntil)  (Exp / ErrExpRep)
  AssignCmd    <- NAME  (ASSIGNMENT / ErrAssignOp)  (Exp / ErrExpAssign)
  ReadCmd      <- READ  (NAME / ErrReadName)
  WriteCmd     <- WRITE  (Exp / ErrWriteExp)
  Exp          <- SimpleExp  ((LESS / EQUAL) (SimpleExp / ErrSimpExp) / '')
  SimpleExp    <- Term  ((ADD / SUB)  (Term / ErrTerm))*
  Term         <- Factor  ((MUL / DIV)  (Factor / ErrFactor))*
  Factor       <- OPENPAR  (Exp / ErrExpFac)  (CLOSEPAR / ErrClosePar)  / NUMBER  / NAME
  ErrSemi      <- ErrCount %{errSemi}
  ErrExpIf     <- ErrCount %{errExpIf}
  ErrThen      <- ErrCount %{errThen}
  ErrCmdSeq1   <- ErrCount %{errCmdSeq1}
  ErrCmdSeq2   <- ErrCount %{errCmdSeq2}
  ErrEnd       <- ErrCount %{errEnd}
  ErrCmdSeqRep <- ErrCount %{errCmdSeqRep}
  ErrUntil     <- ErrCount %{errUntil}
  ErrExpRep    <- ErrCount %{errExpRep}
  ErrAssignOp  <- ErrCount %{errAssignOp}
  ErrExpAssign <- ErrCount %{errExpAssign}
  ErrReadName  <- ErrCount %{errReadName}
  ErrWriteExp  <- ErrCount %{errWriteExp}
  ErrSimpExp   <- ErrCount %{errSimpExp}
  ErrTerm      <- ErrCount %{errTerm}
  ErrFactor    <- ErrCount %{errFactor}
  ErrExpFac    <- ErrCount %{errExpFac}
  ErrClosePar  <- ErrCount %{errClosePar}
  ErrCount     <- '' => countLine 
  ADD          <- Sp '+'
  ASSIGNMENT   <- Sp ':='
  CLOSEPAR     <- Sp ')'
  DIV          <- Sp '/'
  IF           <- Sp 'if'
  ELSE         <- Sp 'else'
  END          <- Sp 'end'
  EQUAL        <- Sp '='
  LESS         <- Sp '<'
  MUL          <- Sp '*'
  NAME         <- Sp !RESERVED [a-z]+
  NUMBER       <- Sp [0-9]+
  OPENPAR      <- Sp '('
  READ         <- Sp 'read'
  REPEAT       <- Sp 'repeat'
  SEMICOLON    <- Sp ';'
  SUB          <- Sp '-'
  THEN         <- Sp 'then'
  UNTIL        <- Sp 'until'
  WRITE        <- Sp 'write'
  RESERVED     <- (IF / ELSE / END / READ / REPEAT / THEN / UNTIL / WRITE) ![a-z]+
  Sp           <- %s*	
]], { countLine = countLine })

Download

LPegLabel source code.

License

The MIT License (MIT)

Copyright (c) 2014-2015 Sérgio Medeiros

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.