Post by Michael Kay
Agreed.
To confuse matters, though, I see that we still have the problematic
statement in A.2 "When tokenizing, the longest possible match that is
consistent with the EBNF is used."
In the CR period for XQuery 3.0, we changed that sentence from
"valid in the current context"
to
"consistent with the EBNF"
(See meeting 541.)
Post by Michael Kay
This to my mind has always suggested the idea that the tokenization is
sensitive to the grammatical context. And in some cases it is; you don't
want to go looking for QNames or IntegerLiterals when you're in
DirElementContent, just because a QName or IntegerLiteral is longer than
a Char.
Right.
Post by Michael Kay
However, it could also be read as meaning that given "12 div3",
tokenizing "div3" as one token is not consistent with the EBNF (it
doesn't lead to a valid parse),
Yes, I believe that's how that sentence is supposed to be read. There are no
possible continuations of "12 div3" that conform to the EBNF, but there
*are* continuations of "12 div" that conform to the EBNF. So, when the
tokenizer is positioned just before the 'd', "div" is the longest possible
match (LPM) that is consistent with the EBNF, so the next token is "div".
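To make that reading concrete, here is a rough Python sketch of a tokenizer that only considers terminals the grammar could accept at the current point. It is a toy model, not the spec's algorithm: the two states ("operand" and "operator"), the ALLOWED table, and the function name tokenize are all assumptions made up for illustration, and they cover only a tiny fragment of what the real EBNF allows.

```python
import re

# Toy model (an assumption, NOT the real XQuery EBNF): which terminals
# may start at the current position, depending on what the parser expects.
ALLOWED = {
    "operand": [("IntegerLiteral", r"[0-9]+"),
                ("NCName", r"[A-Za-z_][A-Za-z0-9_]*")],
    "operator": [("div", r"div"), ("mod", r"mod"), ("+", r"\+")],
}
NEXT_STATE = {"operand": "operator", "operator": "operand"}
WHITESPACE = re.compile(r"\s+")


def tokenize(text):
    """Longest match among the terminals allowed in the current state."""
    tokens, pos, state = [], 0, "operand"
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        best = None
        for name, pattern in ALLOWED[state]:
            m = re.compile(pattern).match(text, pos)
            # Keep the longest candidate that the current state permits.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise SyntaxError(f"no terminal allowed at offset {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
        state = NEXT_STATE[state]
    return tokens


print(tokenize("12 div3"))
```

In this model an NCName is never a candidate in operator position, so "div3" is not even considered after "12", and the longest permitted match at the 'd' is "div".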
Post by Michael Kay
so it should be tokenized as two tokens.
Well, that's less clear, but I think it's one valid interpretation.
Post by Michael Kay
I don't think that has ever been the intent, and I guess section A.2.2 on
delimiting and non-delimiting terminals was added to eliminate this
interpretation.
I don't think there's a problem with saying it's tokenized as two tokens.
Just because a text can be tokenized doesn't mean it's free of syntax
errors. And section A.2.2 gives just one of the many requirements that a
sequence of tokens must satisfy in order to be error-free. (Specifically,
"div" and "3" are adjacent non-delimiting terminal symbols, and so must be
separated by Whitespace and/or Comments.)
So, in that view, A.2.2 wasn't added to modify the interpretation of the LPM
rule, it was added to flag some of the cases that the LPM rule "lets through".
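As a sketch of that view, the A.2.2 requirement can be pictured as a separate check run over the token sequence after tokenization. The Python below is only an illustration under assumptions: the token layout (kind, text, start, end), the NON_DELIMITING set (a small hand-picked subset), and the function name separation_errors are all hypothetical.

```python
# Hypothetical subset of the non-delimiting terminals; the real list is
# the one given in A.2.2.
NON_DELIMITING = {"IntegerLiteral", "NCName", "div", "mod"}


def separation_errors(tokens):
    """Return pairs of adjacent non-delimiting terminals with nothing between.

    tokens: list of (kind, text, start, end) tuples in source order,
    where start/end are character offsets into the query text.
    """
    errors = []
    for left, right in zip(tokens, tokens[1:]):
        touching = left[3] == right[2]  # no whitespace or comment between
        if touching and left[0] in NON_DELIMITING and right[0] in NON_DELIMITING:
            errors.append((left[1], right[1]))
    return errors


# Token sequence for "12 div3" as tokenized under the LPM reading above.
tokens = [
    ("IntegerLiteral", "12", 0, 2),
    ("div", "div", 3, 6),
    ("IntegerLiteral", "3", 6, 7),
]
print(separation_errors(tokens))
```

Here the text tokenizes without complaint, and the error is reported afterwards for the "div" / "3" pair.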
-Michael