From 24bf757183d8bd97f6f5b43d916814f3269c8347 Mon Sep 17 00:00:00 2001
From: Roberto Ierusalimschy
Date: Wed, 17 Apr 2019 14:08:22 -0300
Subject: Implementation of UTF-8 ranges

New constructor 'lpeg.utfR(from, to)' creates a pattern that matches
UTF-8 byte sequences representing code points in the range [from, to].
---
 lpeg.html | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/lpeg.html b/lpeg.html
index 8b9f59c..1295c4f 100644
--- a/lpeg.html
+++ b/lpeg.html
@@ -107,6 +107,9 @@ for creating patterns:
 Matches any character in string (Set)
 lpeg.R("xy")
 Matches any character between x and y (Range)
+lpeg.utfR(cp1, cp2)
+ Matches a UTF-8 code point between cp1 and
+ cp2
 patt^n
 Matches at least n repetitions of patt
 patt^-n
@@ -329,6 +332,15 @@ are patterns that always fail.
+lpeg.utfR (cp1, cp2)
+
+Returns a pattern that matches a valid UTF-8 byte sequence
+representing a code point in the range [cp1, cp2].
+The range is limited by the natural Unicode limit of 0x10FFFF,
+but may include surrogates.
+
+
+
 lpeg.V (v)
 This operation creates a non-terminal (a variable)
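The matching rule that the new lpeg.utfR entry documents can be modeled outside LPeg. The Python sketch below is a hypothetical model, not LPeg's C implementation: the names decode_utf8 and match_utf_range are invented for illustration. It also assumes that "valid UTF-8 byte sequence" means well-formed in the RFC 3629 sense (no overlong encodings, nothing above 0x10FFFF), except that surrogates (U+D800 to U+DFFF) are accepted, as the documentation allows.

```python
# Hypothetical model of the rule documented for lpeg.utfR(cp1, cp2).
# This is NOT LPeg's implementation; names and error handling are assumptions.

MAX_CODE = 0x10FFFF  # the "natural Unicode limit" from the documentation

# Smallest code point that really needs 1, 2, 3 or 4 bytes; a sequence that
# encodes a smaller value (an overlong encoding) is treated as invalid.
MIN_FOR_LENGTH = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000}


def decode_utf8(data: bytes, pos: int):
    """Decode one UTF-8 sequence starting at data[pos].

    Returns (code_point, length), or None if the bytes are not well formed.
    Surrogates are accepted, as the lpeg.utfR text allows.
    """
    if pos >= len(data):
        return None
    lead = data[pos]
    if lead < 0x80:
        return lead, 1
    if 0xC0 <= lead < 0xE0:
        length, code = 2, lead & 0x1F
    elif 0xE0 <= lead < 0xF0:
        length, code = 3, lead & 0x0F
    elif 0xF0 <= lead < 0xF8:
        length, code = 4, lead & 0x07
    else:
        return None  # stray continuation byte or invalid lead byte
    if pos + length > len(data):
        return None  # truncated sequence
    for byte in data[pos + 1 : pos + length]:
        if byte & 0xC0 != 0x80:
            return None  # every following byte must look like 10xxxxxx
        code = (code << 6) | (byte & 0x3F)
    if code < MIN_FOR_LENGTH[length] or code > MAX_CODE:
        return None  # overlong encoding, or beyond U+10FFFF
    return code, length


def match_utf_range(cp1: int, cp2: int, data: bytes, pos: int = 0):
    """Return the position just after the match, or None if nothing matches."""
    if cp2 > MAX_CODE:
        # Models the documented 0x10FFFF limit; how LPeg reports it is not shown here.
        raise ValueError("cp2 must not exceed 0x10FFFF")
    decoded = decode_utf8(data, pos)
    if decoded is None:
        return None
    code, length = decoded
    return pos + length if cp1 <= code <= cp2 else None
```

For instance, match_utf_range(0x80, 0x7FF, "\u00e9".encode("utf-8")) consumes the two bytes C3 A9, and match_utf_range(0xD800, 0xDFFF, b"\xed\xa0\x80") matches the three-byte encoding of a surrogate. Decoding first and then comparing the code point keeps the range test independent of how many bytes the sequence uses.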