From 814213b65fa4ab2b1a7216d06f68a6f3df89efcd Mon Sep 17 00:00:00 2001
From: Roberto Ierusalimschy
Date: Mon, 27 May 2024 11:29:39 -0300
Subject: utf8.offset returns also final position of character

'utf8.offset' returns two values: the initial and the final position of
the given character.
---
 manual/manual.of | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/manual/manual.of b/manual/manual.of
index f830b01c..359bd166 100644
--- a/manual/manual.of
+++ b/manual/manual.of
@@ -7958,21 +7958,27 @@ returns @fail plus the position of the first invalid byte.
 
 @LibEntry{utf8.offset (s, n [, i])|
 
-Returns the position (in bytes) where the encoding of the
-@id{n}-th character of @id{s}
-(counting from position @id{i}) starts.
+Returns the position of the @id{n}-th character of @id{s}
+(counting from byte position @id{i}) as two integers:
+The index (in bytes) where its encoding starts and the
+index (in bytes) where it ends.
+
+If the specified character is right after the end of @id{s},
+the function behaves as if there was a @Char{\0} there.
+If the specified character is neither in the subject
+nor right after its end,
+the function returns @fail.
+
 A negative @id{n} gets characters before position @id{i}.
 The default for @id{i} is 1 when @id{n} is non-negative
 and @T{#s + 1} otherwise,
 so that @T{utf8.offset(s, -n)} gets the offset of the
 @id{n}-th character from the end of the string.
-If the specified character is neither in the subject
-nor right after its end,
-the function returns @fail.
 
 As a special case,
-when @id{n} is 0 the function returns the start of the encoding
-of the character that contains the @id{i}-th byte of @id{s}.
+when @id{n} is 0 the function returns the start and end
+of the encoding of the character that contains the
+@id{i}-th byte of @id{s}.
 
 This function assumes that @id{s} is a valid UTF-8 string.
 
-- 
cgit v1.2.3-55-g6feb
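
The two-value behavior documented in this patch can be illustrated with a small emulation. The following is a minimal Python sketch of the semantics as the new manual text describes them, not Lua's actual C implementation. The names `utf8_offset` and `is_cont` are hypothetical, `None` stands in for @fail, raising `ValueError` for bad positions is an assumption, and negative values of `i` are not handled.

```python
from typing import Optional, Tuple


def is_cont(s: bytes, k: int) -> bool:
    """True if the byte at 0-based index k is a UTF-8 continuation byte.

    Indexes at or past the end act like a NUL byte (never a continuation).
    """
    return k < len(s) and (s[k] & 0xC0) == 0x80


def utf8_offset(s: bytes, n: int, i: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """Return 1-based (start, end) byte positions of the n-th character.

    Assumes s is valid UTF-8, as the manual entry does.
    """
    if i is None:
        # Default: 1 for non-negative n, len(s) + 1 otherwise.
        i = 1 if n >= 0 else len(s) + 1
    if not 1 <= i <= len(s) + 1:
        raise ValueError("position out of bounds")  # assumed error behavior
    pos = i - 1  # switch to 0-based indexing
    if n == 0:
        # Special case: back up to the start of the character containing byte i.
        while pos > 0 and is_cont(s, pos):
            pos -= 1
    else:
        if is_cont(s, pos):
            raise ValueError("initial position is a continuation byte")
        if n < 0:
            while n < 0 and pos > 0:
                pos -= 1  # step back to the start of the previous character
                while pos > 0 and is_cont(s, pos):
                    pos -= 1
                n += 1
        else:
            n -= 1  # the 1st character starts at position i itself
            while n > 0 and pos < len(s):
                pos += 1  # step forward to the start of the next character
                while is_cont(s, pos):
                    pos += 1
                n -= 1
    if n != 0:
        return None  # character is neither in s nor right after its end
    end = pos
    while is_cont(s, end + 1):
        end += 1  # extend to the last continuation byte
    return (pos + 1, end + 1)


s = "aéz€".encode("utf-8")   # byte lengths: 1, 2, 1, 3 (7 bytes total)
print(utf8_offset(s, 2))     # (2, 3): 'é' occupies bytes 2..3
print(utf8_offset(s, -1))    # (5, 7): last character, '€'
print(utf8_offset(s, 5))     # (8, 8): right after the end, treated like a NUL
print(utf8_offset(s, 6))     # None: past the position after the end
print(utf8_offset(s, 0, 6))  # (5, 7): character containing byte 6
```

In Lua itself the corresponding call would be `utf8.offset(s, n, i)`; the second result exists only in versions that include this change.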