Discussion:
[xquery-talk] Function for determining one XPath as subset of another
Adam Retter
2016-01-26 16:26:17 UTC
Permalink
Given two simple XPaths, say:

1. //w

2. /x/y/z/w[@a = 'v']

As a human I can very easily tell without evaluating the expressions
that (2) will return a subset (or the same set) of the results that
(1) would return *should* they both be evaluated.

My goal here is given any two simple arbitrary XPaths expressed as
strings, and without evaluating them against a context, to determine
whether one would return a subset of the results of the other.

I wondered if there might be an algorithm or library that someone
already had or has written which might be able to give me the answer?

I realise that I can only probably cover a subset of XPath itself, but
it is only the path steps with predicates which I am interested in.

Ideally I am looking for something in Java.
--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
_______________________________________________
***@x-query.com
http://x-query.com/mailman/listinfo/talk
W.S. Hager
2016-01-26 16:40:52 UTC
Permalink
Hi Adam,

Perhaps it helps to start with rewriting the xpath expressions as pure
lambda expressions. Maybe that way you could apply lambda calculus?

Cheers,
Wouter
Post by Adam Retter
1. //w
As a human I can very easily tell without evaluating the expressions
that (2) will return a subset (or the same set) of the results that
(1) would return *should* they both be evaluated.
My goal here is given any two simple arbitrary XPaths expressed as
strings, and without evaluating them against a context, to determine
whether one would return a subset of the results of the other.
I wondered if there might be an algorithm or library that someone
already had or has written which might be able to give me the answer?
I realise that I can only probably cover a subset of XPath itself, but
it is only the path steps with predicates which I am interested in.
Ideally I am looking for something in Java.
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
Adam Retter
2016-01-26 16:41:50 UTC
Permalink
Any chance you could offer me an example? ;-)
Post by W.S. Hager
Hi Adam,
Perhaps it helps to start with rewriting the xpath expressions as pure
lambda expressions. Maybe that way you could apply lambda calculus?
Cheers,
Wouter
Post by Adam Retter
1. //w
As a human I can very easily tell without evaluating the expressions
that (2) will return a subset (or the same set) of the results that
(1) would return *should* they both be evaluated.
My goal here is given any two simple arbitrary XPaths expressed as
strings, and without evaluating them against a context, to determine
whether one would return a subset of the results of the other.
I wondered if there might be an algorithm or library that someone
already had or has written which might be able to give me the answer?
I realise that I can only probably cover a subset of XPath itself, but
it is only the path steps with predicates which I am interested in.
Ideally I am looking for something in Java.
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
_______________________________________________
***@x-query.com
http://x-query.com/mailman/listinfo/talk
W.S. Hager
2016-01-26 17:06:44 UTC
Permalink
Not really, no. But my guess is that the path is not relevant, as its steps
simply contain variable names. The proof you want to have is a formal one,
and I think it doesn't have to do with the path expression, but rather the
formalism it implements.
Post by Adam Retter
Any chance you could offer me an example? ;-)
Post by W.S. Hager
Hi Adam,
Perhaps it helps to start with rewriting the xpath expressions as pure
lambda expressions. Maybe that way you could apply lambda calculus?
Cheers,
Wouter
Post by Adam Retter
1. //w
As a human I can very easily tell without evaluating the expressions
that (2) will return a subset (or the same set) of the results that
(1) would return *should* they both be evaluated.
My goal here is given any two simple arbitrary XPaths expressed as
strings, and without evaluating them against a context, to determine
whether one would return a subset of the results of the other.
I wondered if there might be an algorithm or library that someone
already had or has written which might be able to give me the answer?
I realise that I can only probably cover a subset of XPath itself, but
it is only the path steps with predicates which I am interested in.
Ideally I am looking for something in Java.
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
W.S. Hager
2016-01-26 21:12:07 UTC
Permalink
Hi Adam,

I'm looking at the formal specification of xpath/xquery:

http://www.w3.org/TR/xquery-semantics

It would really help to start with a function that implements the actual
selection in steps. Do you know such a function?

Thanks,

Wouter
Post by W.S. Hager
Not really, no. But my guess is that the path is not relevant, as its
steps simply contain variable names. The proof you want to have is a formal
one, and I think it doesn't have to do with the path expression, but rather
the formalism it implements.
Post by Adam Retter
Any chance you could offer me an example? ;-)
Post by W.S. Hager
Hi Adam,
Perhaps it helps to start with rewriting the xpath expressions as pure
lambda expressions. Maybe that way you could apply lambda calculus?
Cheers,
Wouter
Post by Adam Retter
1. //w
As a human I can very easily tell without evaluating the expressions
that (2) will return a subset (or the same set) of the results that
(1) would return *should* they both be evaluated.
My goal here is given any two simple arbitrary XPaths expressed as
strings, and without evaluating them against a context, to determine
whether one would return a subset of the results of the other.
I wondered if there might be an algorithm or library that someone
already had or has written which might be able to give me the answer?
I realise that I can only probably cover a subset of XPath itself, but
it is only the path steps with predicates which I am interested in.
Ideally I am looking for something in Java.
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
W.S. Hager
2016-01-26 21:15:18 UTC
Permalink
Here's the entry on steps:

http://www.w3.org/TR/xquery-semantics/#id-axis-steps

I missed it before. To be continued I guess. I'd love the idea of a sound
formal proof.
Post by W.S. Hager
Hi Adam,
http://www.w3.org/TR/xquery-semantics
It would really help to start with a function that implements the actual
selection in steps. Do you know such a function?
Thanks,
Wouter
Post by W.S. Hager
Not really, no. But my guess is that the path is not relevant, as its
steps simply contain variable names. The proof you want to have is a formal
one, and I think it doesn't have to do with the path expression, but rather
the formalism it implements.
Post by Adam Retter
Any chance you could offer me an example? ;-)
Post by W.S. Hager
Hi Adam,
Perhaps it helps to start with rewriting the xpath expressions as pure
lambda expressions. Maybe that way you could apply lambda calculus?
Cheers,
Wouter
Post by Adam Retter
1. //w
As a human I can very easily tell without evaluating the expressions
that (2) will return a subset (or the same set) of the results that
(1) would return *should* they both be evaluated.
My goal here is given any two simple arbitrary XPaths expressed as
strings, and without evaluating them against a context, to determine
whether one would return a subset of the results of the other.
I wondered if there might be an algorithm or library that someone
already had or has written which might be able to give me the answer?
I realise that I can only probably cover a subset of XPath itself, but
it is only the path steps with predicates which I am interested in.
Ideally I am looking for something in Java.
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
Pavel Velikhov
2016-01-26 22:06:35 UTC
Permalink
If you want to do a formal proof, you might want to use simplified notions of XML documents and path expressions. And do you need to take the schema into account?

Here is a good paper that defines labeled ordered tree objects, schemas and simple queries for proving stuff. (And they use all these definitions to do proofs):
http://db.ucsd.edu/publications/viewDTD.pdf
Post by W.S. Hager
http://www.w3.org/TR/xquery-semantics/#id-axis-steps
I missed it before. To be continued I guess. I'd love the idea of a sound formal proof.
Post by W.S. Hager
Hi Adam,
http://www.w3.org/TR/xquery-semantics
It would really help to start with a function that implements the actual selection in steps. Do you know such a function?
Thanks,
Wouter
Not really, no. But my guess is that the path is not relevant, as its steps simply contain variable names. The proof you want to have is a formal one, and I think it doesn't have to do with the path expression, but rather the formalism it implements.
Post by Adam Retter
Any chance you could offer me an example? ;-)
Post by W.S. Hager
Hi Adam,
Perhaps it helps to start with rewriting the xpath expressions as pure
lambda expressions. Maybe that way you could apply lambda calculus?
Cheers,
Wouter
Post by Adam Retter
1. //w
As a human I can very easily tell without evaluating the expressions
that (2) will return a subset (or the same set) of the results that
(1) would return *should* they both be evaluated.
My goal here is given any two simple arbitrary XPaths expressed as
strings, and without evaluating them against a context, to determine
whether one would return a subset of the results of the other.
I wondered if there might be an algorithm or library that someone
already had or has written which might be able to give me the answer?
I realise that I can only probably cover a subset of XPath itself, but
it is only the path steps with predicates which I am interested in.
Ideally I am looking for something in Java.
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Michael Kay
2016-01-27 08:27:43 UTC
Permalink
Post by Pavel Velikhov
If you want to do a formal proof, you might want to use simplified notions of XML documents and path expressions.
It seems to be a long-standing tradition that computer scientists, when asked to prove a difficult conjecture C, respond by giving a proof for a simplified conjecture C'. While this might lead to progress in the long run, and enables them to get papers published in the academic literature, it is totally useless to practical engineeers who want to know whether they can safely rely on C.

Michael Kay
Saxonica




_______________________________________________
***@x-query.com
http://x-query.com/mailman/listinfo/talk
W.S. Hager
2016-01-27 09:12:36 UTC
Permalink
I think simplifying notions of path expressions aren't necessary. IMO the
challenge here is to come to discreet formal steps. I'd start with applying
all normalisations in http://www.w3.org/TR/xquery-semantics/#id-axis-steps,
from which a single evaluable function could be derived. The function
should be rewritable as yielding the same set (for /x/w and //w). Next the
relation to the subset should become obvious.
Post by Pavel Velikhov
Post by Pavel Velikhov
If you want to do a formal proof, you might want to use simplified
notions of XML documents and path expressions.
It seems to be a long-standing tradition that computer scientists, when
asked to prove a difficult conjecture C, respond by giving a proof for a
simplified conjecture C'. While this might lead to progress in the long
run, and enables them to get papers published in the academic literature,
it is totally useless to practical engineeers who want to know whether they
can safely rely on C.
Michael Kay
Saxonica
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
Pavel Velikhov
2016-01-27 09:25:31 UTC
Permalink
Post by Michael Kay
Post by Pavel Velikhov
If you want to do a formal proof, you might want to use simplified notions of XML documents and path expressions.
It seems to be a long-standing tradition that computer scientists, when asked to prove a difficult conjecture C, respond by giving a proof for a simplified conjecture C'. While this might lead to progress in the long run, and enables them to get papers published in the academic literature, it is totally useless to practical engineeers who want to know whether they can safely rely on C.
Its a common practice for everybody, who needs to come up with formal proofs. You start with the most simplified definitions possible, that capture the essence of the problem.
Then you get the skeleton of the proof that is hopefully very simple. Then you can add details back, hoping that the proof remains simple and tractable.

So imagine starting the proof while considering all the possible variations of path expressions, all the Infoset stuff, all XML Schema details. I think its hopeless.
Post by Michael Kay
Michael Kay
Saxonica
_______________________________________________
***@x-query.com
http://x-query.com/mailman/listinfo/talk
Christian Grün
2016-01-27 09:37:21 UTC
Permalink
Post by Pavel Velikhov
Its a common practice for everybody, who needs to come up with formal proofs. You start with the most simplified definitions possible, that capture the essence of the problem.
Then you get the skeleton of the proof that is hopefully very simple. Then you can add details back, hoping that the proof remains simple and tractable.
So imagine starting the proof while considering all the possible variations of path expressions, all the Infoset stuff, all XML Schema details. I think its hopeless.
I’m completely in line with Michael’s observations. It would obviously
be nice to have proofs for the full rule sets of the discussed
languages; but as experience shows, no one will do it (the rare
exception might prove the rule), so we are stuck with the work that is
of limited practical use.

_______________________________________________
tal
Pavel Velikhov
2016-01-27 09:51:46 UTC
Permalink
Post by Christian Grün
Post by Pavel Velikhov
Its a common practice for everybody, who needs to come up with formal proofs. You start with the most simplified definitions possible, that capture the essence of the problem.
Then you get the skeleton of the proof that is hopefully very simple. Then you can add details back, hoping that the proof remains simple and tractable.
So imagine starting the proof while considering all the possible variations of path expressions, all the Infoset stuff, all XML Schema details. I think its hopeless.
I’m completely in line with Michael’s observations. It would obviously
be nice to have proofs for the full rule sets of the discussed
languages; but as experience shows, no one will do it (the rare
exception might prove the rule), so we are stuck with the work that is
of limited practical use.
I can’t agree with you. There are a lot of results that automatically hold for the full specification, even though they are
proved on a clean and easy-to-use subset: undecidability and np-completeness or np-hardness for instance.

For the full specifications its sometimes hard to grasp the semantics, so proving anything serious is impossible.
Example: try proving that SQL-2003 queries are equivalent to some superset of relational algebra.


_______________________________________________
***@x-query.com
http://x
W.S. Hager
2016-01-27 09:54:37 UTC
Permalink
Can't we formally proof something as obvious Adam's case?
Post by Pavel Velikhov
Post by Pavel Velikhov
Its a common practice for everybody, who needs to come up with formal
proofs. You start with the most simplified definitions possible, that
capture the essence of the problem.
Post by Pavel Velikhov
Then you get the skeleton of the proof that is hopefully very simple.
Then you can add details back, hoping that the proof remains simple and
tractable.
Post by Pavel Velikhov
So imagine starting the proof while considering all the possible
variations of path expressions, all the Infoset stuff, all XML Schema
details. I think its hopeless.
I’m completely in line with Michael’s observations. It would obviously
be nice to have proofs for the full rule sets of the discussed
languages; but as experience shows, no one will do it (the rare
exception might prove the rule), so we are stuck with the work that is
of limited practical use.
I can’t agree with you. There are a lot of results that automatically hold
for the full specification, even though they are
proved on a clean and easy-to-use subset: undecidability and
np-completeness or np-hardness for instance.
For the full specifications its sometimes hard to grasp the semantics, so
proving anything serious is impossible.
Example: try proving that SQL-2003 queries are equivalent to some superset
of relational algebra.
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
Pavel Velikhov
2016-01-27 10:04:06 UTC
Permalink
Post by W.S. Hager
Can't we formally proof something as obvious Adam's case?
In Adam’s case we want to test whether path expression p1 subsumes path expression p2.
If we don’t put any conditions on p1 and p2, the problem is undecidable: p1 and p2 may include
function calls, so the expressive power of p1 and p2 are that of a Turing Machine.
_______________________________________________
***@x-q
W.S. Hager
2016-01-27 10:09:01 UTC
Permalink
Isn't the constraint in this case the test: is p2 a subset of p1?
Post by W.S. Hager
Can't we formally proof something as obvious Adam's case?
In Adam’s case we want to test whether path expression p1 subsumes path
expression p2.
p1 and p2 may include
function calls, so the expressive power of p1 and p2 are that of a Turing Machine.
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
W.S. Hager
2016-01-27 10:10:10 UTC
Permalink
Or simplified: is the set selected by p1 equal to the set selected by p2?
Post by W.S. Hager
Isn't the constraint in this case the test: is p2 a subset of p1?
Post by W.S. Hager
Can't we formally proof something as obvious Adam's case?
In Adam’s case we want to test whether path expression p1 subsumes path
expression p2.
p1 and p2 may include
function calls, so the expressive power of p1 and p2 are that of a Turing Machine.
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
Pavel Velikhov
2016-01-27 10:11:36 UTC
Permalink
Post by W.S. Hager
Or simplified: is the set selected by p1 equal to the set selected by p2?
If we allow p1 and p2 to be arbitrary XQuery path expressions, then its undecidable.
I can reduce the halting problem of Turing machines to this test.
Post by W.S. Hager
Isn't the constraint in this case the test: is p2 a subset of p1?
Post by W.S. Hager
Can't we formally proof something as obvious Adam's case?
In Adam’s case we want to test whether path expression p1 subsumes path expression p2.
If we don’t put any conditions on p1 and p2, the problem is undecidable: p1 and p2 may include
function calls, so the expressive power of p1 and p2 are that of a Turing Machine.
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl <http://lagua.nl/>
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl <http://lagua.nl/>
W.S. Hager
2016-01-27 10:17:31 UTC
Permalink
Indeed, the expressions should terminate.
Post by W.S. Hager
Or simplified: is the set selected by p1 equal to the set selected by p2?
If we allow p1 and p2 to be arbitrary XQuery path expressions, then its undecidable.
I can reduce the halting problem of Turing machines to this test.
Post by W.S. Hager
Isn't the constraint in this case the test: is p2 a subset of p1?
Post by W.S. Hager
Can't we formally proof something as obvious Adam's case?
In Adam’s case we want to test whether path expression p1 subsumes path
expression p2.
p1 and p2 may include
function calls, so the expressive power of p1 and p2 are that of a Turing Machine.
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
Pavel Velikhov
2016-01-27 10:24:12 UTC
Permalink
Post by W.S. Hager
Indeed, the expressions should terminate.
Since you can include a function call in your path expression, and you can include your own custom function, there is
no guarantee that the expression will terminate.

I.e. you need to take a subset of possible path expression already, in general case the problem is undecidable.
Post by W.S. Hager
Post by W.S. Hager
Or simplified: is the set selected by p1 equal to the set selected by p2?
If we allow p1 and p2 to be arbitrary XQuery path expressions, then its undecidable.
I can reduce the halting problem of Turing machines to this test.
Post by W.S. Hager
Isn't the constraint in this case the test: is p2 a subset of p1?
Post by W.S. Hager
Can't we formally proof something as obvious Adam's case?
In Adam’s case we want to test whether path expression p1 subsumes path expression p2.
If we don’t put any conditions on p1 and p2, the problem is undecidable: p1 and p2 may include
function calls, so the expressive power of p1 and p2 are that of a Turing Machine.
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl <http://lagua.nl/>
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl <http://lagua.nl/>
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl <http://lagua.nl/>
W.S. Hager
2016-01-27 10:31:38 UTC
Permalink
There's the notion of co-inductive types. I think in Haskell you could
proof the case, so why not in xquery?
Post by W.S. Hager
Indeed, the expressions should terminate.
Since you can include a function call in your path expression, and you can
include your own custom function, there is
no guarantee that the expression will terminate.
I.e. you need to take a subset of possible path expression already, in
general case the problem is undecidable.
Post by W.S. Hager
Or simplified: is the set selected by p1 equal to the set selected by p2?
If we allow p1 and p2 to be arbitrary XQuery path expressions, then its undecidable.
I can reduce the halting problem of Turing machines to this test.
Post by W.S. Hager
Isn't the constraint in this case the test: is p2 a subset of p1?
Post by W.S. Hager
Can't we formally proof something as obvious Adam's case?
In Adam’s case we want to test whether path expression p1 subsumes path
expression p2.
If we don’t put any conditions on p1 and p2, the problem is
undecidable: p1 and p2 may include
function calls, so the expressive power of p1 and p2 are that of a Turing Machine.
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
Pavel Velikhov
2016-01-27 10:39:37 UTC
Permalink
A user-defined function in any reasonable language (including Haskell and XQuery) is Turing-complete.

So you cannot prove that:
- two functions are equivalent
- the functions terminate
- basically any non-trivial fact

Unless you make some strong assumptions about these functions.
There's the notion of co-inductive types. I think in Haskell you could proof the case, so why not in xquery?
Post by W.S. Hager
Indeed, the expressions should terminate.
Since you can include a function call in your path expression, and you can include your own custom function, there is
no guarantee that the expression will terminate.
I.e. you need to take a subset of possible path expression already, in general case the problem is undecidable.
Post by W.S. Hager
Post by W.S. Hager
Or simplified: is the set selected by p1 equal to the set selected by p2?
If we allow p1 and p2 to be arbitrary XQuery path expressions, then its undecidable.
I can reduce the halting problem of Turing machines to this test.
Post by W.S. Hager
Isn't the constraint in this case the test: is p2 a subset of p1?
Post by W.S. Hager
Can't we formally proof something as obvious Adam's case?
In Adam’s case we want to test whether path expression p1 subsumes path expression p2.
If we don’t put any conditions on p1 and p2, the problem is undecidable: p1 and p2 may include
function calls, so the expressive power of p1 and p2 are that of a Turing Machine.
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl <http://lagua.nl/>
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl <http://lagua.nl/>
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl <http://lagua.nl/>
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl <http://lagua.nl/>
Michael Kay
2016-01-27 10:50:39 UTC
Permalink
To take an example, if someone proves that a particular equivalence holds on a simplified data model with no attribute, namespace, comment, or processing instruction nodes, then that is of absolutely no use to someone writing an optimizer for a product in which such nodes can be encountered.

Michael Kay
Saxonica
Post by Pavel Velikhov
Post by Christian Grün
Post by Pavel Velikhov
Its a common practice for everybody, who needs to come up with formal proofs. You start with the most simplified definitions possible, that capture the essence of the problem.
Then you get the skeleton of the proof that is hopefully very simple. Then you can add details back, hoping that the proof remains simple and tractable.
So imagine starting the proof while considering all the possible variations of path expressions, all the Infoset stuff, all XML Schema details. I think its hopeless.
I’m completely in line with Michael’s observations. It would obviously
be nice to have proofs for the full rule sets of the discussed
languages; but as experience shows, no one will do it (the rare
exception might prove the rule), so we are stuck with the work that is
of limited practical use.
I can’t agree with you. There are a lot of results that automatically hold for the full specification, even though they are
proved on a clean and easy-to-use subset: undecidability and np-completeness or np-hardness for instance.
For the full specifications its sometimes hard to grasp the semantics, so proving anything serious is impossible.
Example: try proving that SQL-2003 queries are equivalent to some superset of relational algebra.
_______________________________________________
***@x-query.com
http://x-query.com/mailman/listinfo/t
Pavel Velikhov
2016-01-27 11:03:53 UTC
Permalink
Post by Michael Kay
To take an example, if someone proves that a particular equivalence holds on a simplified data model with no attribute, namespace, comment, or processing instruction nodes, then that is of absolutely no use to someone writing an optimizer for a product in which such nodes can be encountered.
Michael Kay
Saxonica
Michael, even in this case, with these simplifying assumptions, there is a very good chance that the equivalence will hold on the full data model. You can also do a 1-to-1 mapping of the full data model to the simplifed one. And an optimiser developer can read the proof procedure and extend it pretty easily with all the gory details.

But if somebody proves that the equivalence is undecidable - it definitely will hold for the full data model. So you can forget about a general method in the optimiser and immediately go for special cases.

So I don’t agree that all proofs are useless. And also we see that a lot of functional languages cleaned up their semantics precisely because they wanted to prove things.
Post by Michael Kay
Post by Pavel Velikhov
Post by Christian Grün
Post by Pavel Velikhov
Its a common practice for everybody, who needs to come up with formal proofs. You start with the most simplified definitions possible, that capture the essence of the problem.
Then you get the skeleton of the proof that is hopefully very simple. Then you can add details back, hoping that the proof remains simple and tractable.
So imagine starting the proof while considering all the possible variations of path expressions, all the Infoset stuff, all XML Schema details. I think its hopeless.
I’m completely in line with Michael’s observations. It would obviously
be nice to have proofs for the full rule sets of the discussed
languages; but as experience shows, no one will do it (the rare
exception might prove the rule), so we are stuck with the work that is
of limited practical use.
I can’t agree with you. There are a lot of results that automatically hold for the full specification, even though they are
proved on a clean and easy-to-use subset: undecidability and np-completeness or np-hardness for instance.
For the full specifications its sometimes hard to grasp the semantics, so proving anything serious is impossible.
Example: try proving that SQL-2003 queries are equivalent to some superset of relational algebra.
_______________________________________________
***@x-query.com
http://x-query.c
Christian Grün
2016-01-27 11:08:04 UTC
Permalink
Post by Pavel Velikhov
So I don’t agree that all proofs are useless. And also we see that a lot of functional languages cleaned up their semantics precisely because they wanted to prove things.
…absolutely. If theory is applied at the very beginning, it may
prevent you from doing things that no one wants to prove later on.
Post by Pavel Velikhov
Post by Pavel Velikhov
Post by Christian Grün
Post by Pavel Velikhov
Its a common practice for everybody, who needs to come up with formal proofs. You start with the most simplified definitions possible, that capture the essence of the problem.
Then you get the skeleton of the proof that is hopefully very simple. Then you can add details back, hoping that the proof remains simple and tractable.
So imagine starting the proof while considering all the possible variations of path expressions, all the Infoset stuff, all XML Schema details. I think its hopeless.
I’m completely in line with Michael’s observations. It would obviously
be nice to have proofs for the full rule sets of the discussed
languages; but as experience shows, no one will do it (the rare
exception might prove the rule), so we are stuck with the work that is
of limited practical use.
I can’t agree with you. There are a lot of results that automatically hold for the full specification, even though they are
proved on a clean and easy-to-use subset: undecidability and np-completeness or np-hardness for instance.
For the full specifications its sometimes hard to grasp the semantics, so proving anything serious is impossible.
Example: try proving that SQL-2003 queries are equivalent to some superset of relational algebra.
_______________________________________________
***@x-query.com
h
daniela florescu
2016-01-27 16:10:28 UTC
Permalink
Post by Michael Kay
It seems to be a long-standing tradition that computer scientists, when asked to prove a difficult conjecture C, respond by giving a proof for a simplified conjecture C'. While this might lead to progress in the long run, and enables them to get papers published in the academic literature, it is totally useless to practical engineeers who want to know whether they can safely rely on C.
Michael,


what you say is nice and true.

However given that:
1. path expressions point (syntactically hence esemantically) into XQuery’s expressions
2. XQuery expression language is Turing complete
3. Subsumption for a Turing complete language is undecidable.

Well, I can hardly see a way to decide this problem other then by introducing SOME restrictions
of some sort… but of course some restrictions that would not nullify the original problem all together
and make the solution useless.

Best,
Dana
_______________________________________________
***@x-query.com
W.S. Hager
2016-01-27 16:41:55 UTC
Permalink
Well, so, to continue, let's assume that there are no user-defined
functions, and in fact the only thing we want to proof is select+filter,
where a filter is limited to the default operators. From that is it follows
that

-path1:
select-child-nodes-by-name(select-child-nodes-by-name($context,'x'),'w')
-path2: select-descendant-nodes-by-name($context,'w')

And we need to proof that those functions are somehow equivalent. Can it be
done?
Post by Michael Kay
Post by Michael Kay
It seems to be a long-standing tradition that computer scientists, when
asked to prove a difficult conjecture C, respond by giving a proof for a
simplified conjecture C'. While this might lead to progress in the long
run, and enables them to get papers published in the academic literature,
it is totally useless to practical engineeers who want to know whether they
can safely rely on C.
Michael,
what you say is nice and true.
1. path expressions point (syntactically hence esemantically) into
XQuery’s expressions
2. XQuery expression language is Turing complete
3. Subsumption for a Turing complete language is undecidable.
Well, I can hardly see a way to decide this problem other then by
introducing SOME restrictions
of some sort
 but of course some restrictions that would not nullify the
original problem all together
and make the solution useless.
Best,
Dana
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
Pavel Velikhov
2016-01-27 16:45:10 UTC
Permalink
Well, so, to continue, let's assume that there are no user-defined functions, and in fact the only thing we want to proof is select+filter, where a filter is limited to the default operators. From that is it follows that
-path1: select-child-nodes-by-name(select-child-nodes-by-name($context,'x'),'w')
-path2: select-descendant-nodes-by-name($context,'w')
And we need to proof that those functions are somehow equivalent. Can it be done?
Its easy to prove that they are not equivalent. You can have nodes with ‘w’ label at multiple levels, including children on $context.
_______________________________________________
***@x-query.com
http://x-query.com/mailman/
Christian Grün
2016-01-27 16:46:07 UTC
Permalink
Post by W.S. Hager
Well, so, to continue, let's assume that there are no user-defined
functions, and in fact the only thing we want to proof is select+filter,
where a filter is limited to the default operators. From that is it follows
that
select-child-nodes-by-name(select-child-nodes-by-name($context,'x'),'w')
-path2: select-descendant-nodes-by-name($context,'w')
Just to complete this: The predicate must not be numeric (//w[1] is
not equivalent to /descendant::w[1]).
Post by W.S. Hager
Post by Pavel Velikhov
Post by Michael Kay
It seems to be a long-standing tradition that computer scientists, when
asked to prove a difficult conjecture C, respond by giving a proof for a
simplified conjecture C'. While this might lead to progress in the long run,
and enables them to get papers published in the academic literature, it is
totally useless to practical engineeers who want to know whether they can
safely rely on C.
Michael,
what you say is nice and true.
1. path expressions point (syntactically hence esemantically) into XQuery’s expressions
2. XQuery expression language is Turing complete
3. Subsumption for a Turing complete language is undecidable.
Well, I can hardly see a way to decide this problem other then by
introducing SOME restrictions
of some sort… but of course some restrictions that would not nullify the
original problem all together
and make the solution useless.
Best,
Dana
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
***@x-query.com
h
W.S. Hager
2016-01-27 18:42:59 UTC
Permalink
Finally, can it be proved that /w[@a=b] is a subset of /w, taking into
account that the filter can only contain standard operators (eq, gt, lt,
etc as defined in the op namespace)?
Post by W.S. Hager
Post by W.S. Hager
Well, so, to continue, let's assume that there are no user-defined
functions, and in fact the only thing we want to proof is select+filter,
where a filter is limited to the default operators. From that is it
follows
Post by W.S. Hager
that
select-child-nodes-by-name(select-child-nodes-by-name($context,'x'),'w')
-path2: select-descendant-nodes-by-name($context,'w')
Just to complete this: The predicate must not be numeric (//w[1] is
not equivalent to /descendant::w[1]).
het
Post by W.S. Hager
Post by Pavel Velikhov
Post by Michael Kay
It seems to be a long-standing tradition that computer scientists,
when
Post by W.S. Hager
Post by Pavel Velikhov
Post by Michael Kay
asked to prove a difficult conjecture C, respond by giving a proof
for a
Post by W.S. Hager
Post by Pavel Velikhov
Post by Michael Kay
simplified conjecture C'. While this might lead to progress in the
long run,
Post by W.S. Hager
Post by Pavel Velikhov
Post by Michael Kay
and enables them to get papers published in the academic literature,
it is
Post by W.S. Hager
Post by Pavel Velikhov
Post by Michael Kay
totally useless to practical engineeers who want to know whether they
can
Post by W.S. Hager
Post by Pavel Velikhov
Post by Michael Kay
safely rely on C.
Michael,
what you say is nice and true.
1. path expressions point (syntactically hence esemantically) into
XQuery’s expressions
2. XQuery expression language is Turing complete
3. Subsumption for a Turing complete language is undecidable.
Well, I can hardly see a way to decide this problem other then by
introducing SOME restrictions
of some sort
 but of course some restrictions that would not nullify the
original problem all together
and make the solution useless.
Best,
Dana
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
_______________________________________________
http://x-query.com/mailman/listinfo/talk
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
Pavel Velikhov
2016-01-28 09:55:40 UTC
Permalink
Okay, now we are close! However now formal do you want the proof to be? I can give this informal proof:

The first path expression selects some subset of the children of the node that it applies, where the label is 'w'. The second one selects all descendants with label 'w', hence it's result contains all nodes of the first path expression.

But if you want to go formal, you need to use the semantics of XQuery path expressions, as well as the formal specification of the data model. There used to be a formal document on XQuery data model.
Post by W.S. Hager
Post by Christian Grün
Post by W.S. Hager
Well, so, to continue, let's assume that there are no user-defined
functions, and in fact the only thing we want to proof is select+filter,
where a filter is limited to the default operators. From that is it follows
that
select-child-nodes-by-name(select-child-nodes-by-name($context,'x'),'w')
-path2: select-descendant-nodes-by-name($context,'w')
Just to complete this: The predicate must not be numeric (//w[1] is
not equivalent to /descendant::w[1]).
Post by W.S. Hager
Post by Pavel Velikhov
Post by Michael Kay
It seems to be a long-standing tradition that computer scientists, when
asked to prove a difficult conjecture C, respond by giving a proof for a
simplified conjecture C'. While this might lead to progress in the long run,
and enables them to get papers published in the academic literature, it is
totally useless to practical engineeers who want to know whether they can
safely rely on C.
Michael,
what you say is nice and true.
1. path expressions point (syntactically hence esemantically) into
XQuery’s expressions
2. XQuery expression language is Turing complete
3. Subsumption for a Turing complete language is undecidable.
Well, I can hardly see a way to decide this problem other then by
introducing SOME restrictions
of some sort
 but of course some restrictions that would not nullify the
original problem all together
and make the solution useless.
Best,
Dana
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
_______________________________________________
http://x-query.com/mailman/listinfo/talk
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
Adam Retter
2016-01-28 12:18:09 UTC
Permalink
Okay thanks to everyone for the replies.

I think the scope of what I was looking for got thrown out in all the
discussion, or perhaps I wasn't clear enough early in my initial post.

I am not interested in XQuery, and I am only interested in a small
subset of XPath. I am concerned with comparing two path expressions,
these path expressions may have predicates, what I didn't make clear
is that those predicates may not contain functions, only the
comparison operators eq, ne, gt, ge, lt, le, and =, !=, >, >=, <, <=.


As Christian observed, [1] is not the same as [@a eq 'b']. In my
original post, I posed these two path expressions:

1. //w

2. /x/y/z/w[@a = 'v']

In this instance I can actually ignore the predicate, as I can see
that /x/y/z/w would already produce the subset of //w and the
predicate only serves to restrict that further.

I am not looking for a formal proof in any sense. I am looking for
something practical that I can use in code.
account that the filter can only contain standard operators (eq, gt, lt, etc
as defined in the op namespace)?
Okay, now we are close! However now formal do you want the proof to be? I
The first path expression selects some subset of the children of the node
that it applies, where the label is 'w'. The second one selects all
descendants with label 'w', hence it's result contains all nodes of the
first path expression.
But if you want to go formal, you need to use the semantics of XQuery path
expressions, as well as the formal specification of the data model. There
used to be a formal document on XQuery data model.
Post by Christian Grün
Post by W.S. Hager
Well, so, to continue, let's assume that there are no user-defined
functions, and in fact the only thing we want to proof is select+filter,
where a filter is limited to the default operators. From that is it follows
that
select-child-nodes-by-name(select-child-nodes-by-name($context,'x'),'w')
-path2: select-descendant-nodes-by-name($context,'w')
Just to complete this: The predicate must not be numeric (//w[1] is
not equivalent to /descendant::w[1]).
Post by W.S. Hager
Post by Pavel Velikhov
Post by Michael Kay
It seems to be a long-standing tradition that computer scientists, when
asked to prove a difficult conjecture C, respond by giving a proof for a
simplified conjecture C'. While this might lead to progress in the long run,
and enables them to get papers published in the academic literature, it is
totally useless to practical engineeers who want to know whether they can
safely rely on C.
Michael,
what you say is nice and true.
1. path expressions point (syntactically hence esemantically) into
XQuery’s expressions
2. XQuery expression language is Turing complete
3. Subsumption for a Turing complete language is undecidable.
Well, I can hardly see a way to decide this problem other then by
introducing SOME restrictions
of some sort… but of course some restrictions that would not nullify the
original problem all together
and make the solution useless.
Best,
Dana
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
_______________________________________________
http://x-query.com/mailman/listinfo/talk
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk

_______________________________________________
***@x-query.com
http://x-query.com/mail
Pavel Velikhov
2016-01-28 12:28:12 UTC
Permalink
Post by Adam Retter
Okay thanks to everyone for the replies.
I think the scope of what I was looking for got thrown out in all the
discussion, or perhaps I wasn't clear enough early in my initial post.
I am not interested in XQuery, and I am only interested in a small
subset of XPath. I am concerned with comparing two path expressions,
these path expressions may have predicates, what I didn't make clear
is that those predicates may not contain functions, only the
comparison operators eq, ne, gt, ge, lt, le, and =, !=, >, >=, <, <=.
So you need an algorithm to test subsumption of path expressions (with the given limitations)?
Or just that one case?
_______________________________________________
***@x-query.com
http://x-query.com/mailman/listinfo/talk
Adam Retter
2016-01-28 12:35:16 UTC
Permalink
Post by Pavel Velikhov
So you need an algorithm to test subsumption of path expressions (with the given limitations)?
Yes.
Post by Pavel Velikhov
Or just that one case?
No I want to cover any case where the set which would be selected by
the path expression is statically known (i.e. no function calls).
--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
_______________________________________________
***@x-query.com
http://x-query.com/mailman/listinfo/talk
W.S. Hager
2016-01-28 12:37:31 UTC
Permalink
@Pavel do you mean the link I send before?
http://www.w3.org/TR/xquery-semantics

What would be the difference between a proof and a test, in this case?
Post by Pavel Velikhov
Post by Pavel Velikhov
So you need an algorithm to test subsumption of path expressions (with
the given limitations)?
Yes.
Post by Pavel Velikhov
Or just that one case?
No I want to cover any case where the set which would be selected by
the path expression is statically known (i.e. no function calls).
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
Pavel Velikhov
2016-01-28 12:47:00 UTC
Permalink
@Pavel do you mean the link I send before? http://www.w3.org/TR/xquery-semantics <http://www.w3.org/TR/xquery-semantics>
Yes! That’s it.
What would be the difference between a proof and a test, in this case?
Well, proving a specific case is usually very easy, assuming the proof isn’t too formal.

A test is a general procedure. Shouldn’t be that hard, unless we try to prove its correctness
(and depending on how formal we want to proof to be).
Post by Pavel Velikhov
So you need an algorithm to test subsumption of path expressions (with the given limitations)?
Yes.
Post by Pavel Velikhov
Or just that one case?
No I want to cover any case where the set which would be selected by
the path expression is statically known (i.e. no function calls).
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk <http://www.adamretter.org.uk/>
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl <http://lagua.nl/>
daniela florescu
2016-02-03 21:37:44 UTC
Permalink
Adam,

write me directly and I’ll give you an algorithm that guarantees sufficient conditions for subsumption, given a
large enough subset of Path.


Best
Dana
Post by Pavel Velikhov
@Pavel do you mean the link I send before? http://www.w3.org/TR/xquery-semantics <http://www.w3.org/TR/xquery-semantics>
Yes! That’s it.
What would be the difference between a proof and a test, in this case?
Well, proving a specific case is usually very easy, assuming the proof isn’t too formal.
A test is a general procedure. Shouldn’t be that hard, unless we try to prove its correctness
(and depending on how formal we want to proof to be).
Post by Pavel Velikhov
So you need an algorithm to test subsumption of path expressions (with the given limitations)?
Yes.
Post by Pavel Velikhov
Or just that one case?
No I want to cover any case where the set which would be selected by
the path expression is statically known (i.e. no function calls).
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk <http://www.adamretter.org.uk/>
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl <http://lagua.nl/>
Pavel Velikhov
2016-01-28 13:09:33 UTC
Permalink
Post by Adam Retter
Post by Pavel Velikhov
So you need an algorithm to test subsumption of path expressions (with the given limitations)?
Yes.
Post by Pavel Velikhov
Or just that one case?
No I want to cover any case where the set which would be selected by
the path expression is statically known (i.e. no function calls).
I think this should work:

So suppose you have 2 path expressions, p1 and p2, and you allow only /x or //x as path steps
and each path step can have a predicate:

p1 = axis_1 pred_1, …, axis_n pred_n
p2 = axis_1 pred_1, … , axis_m pred_m

subsumes(p1, p2):
/* recursion base case */
if p1 is empty: return true
if p2 is empty: return false

a_1 = first axis of p1
a_2 = first axis of p2

if a_1 is a child axis ‘/‘ label l :
if a_2 is a child axis ‘/' with the same label l, and the predicates are equivalent:
return subsumes(p1 - a_1, p2 - a_2)
else:
return false

if a_1 is a descendants axis ‘//‘ with label l:
if p2 contains an axis with label l:
p2’ = remove all axes of p2 from the head, until you find an axis with label l
p2’ = remove the axis with label l from p2'
return subsumes( p_1 - a_1, p2’ )

return false
Post by Adam Retter
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
_______________________________________________
***@x-query.com
http://x-query.com/mailman/listinfo/ta
Pavel Velikhov
2016-01-28 13:17:01 UTC
Permalink
No, the first line in the base case should also return false, sorry
Post by Pavel Velikhov
So suppose you have 2 path expressions, p1 and p2, and you allow only /x or //x as path steps
p1 = axis_1 pred_1, …, axis_n pred_n
p2 = axis_1 pred_1, … , axis_m pred_m
/* recursion base case */
if p1 is empty: return true
if p2 is empty: return false
a_1 = first axis of p1
a_2 = first axis of p2
return subsumes(p1 - a_1, p2 - a_2)
return false
p2’ = remove all axes of p2 from the head, until you find an axis with label l
p2’ = remove the axis with label l from p2'
return subsumes( p_1 - a_1, p2’ )
return false
_______________________________________________
***@x-query.com
http://x-query.com/mailm
Ghislain Fourny
2016-01-28 13:44:47 UTC
Permalink
Hi Adam,

This may sound stupid, but: have you considered that (and would it be
relevant to you that) the second expression may raise errors that the first
one wouldn't have raised, rather than returning a subset?

Consider the case where b has a numeric type, and the attribute a cannot be
cast to double?

Kind regards,
Ghislain
Post by Pavel Velikhov
No, the first line in the base case should also return false, sorry
Post by Pavel Velikhov
So suppose you have 2 path expressions, p1 and p2, and you allow only /x
or //x as path steps
Post by Pavel Velikhov
p1 = axis_1 pred_1, 
, axis_n pred_n
p2 = axis_1 pred_1, 
 , axis_m pred_m
/* recursion base case */
if p1 is empty: return true
if p2 is empty: return false
a_1 = first axis of p1
a_2 = first axis of p2
if a_2 is a child axis ‘/' with the same label l, and the predicates
return subsumes(p1 - a_1, p2 - a_2)
return false
p2’ = remove all axes of p2 from the head, until you find an axis
with label l
Post by Pavel Velikhov
p2’ = remove the axis with label l from p2'
return subsumes( p_1 - a_1, p2’ )
return false
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Adam Retter
2016-01-28 13:56:23 UTC
Permalink
Hi Ghislain,

No I have not considered error cases, for the moment it is enough for
me to just raise an error if the path is invalid. Although in this
case I can assume that all paths I am provided are correct.
Post by W.S. Hager
Hi Adam,
This may sound stupid, but: have you considered that (and would it be
relevant to you that) the second expression may raise errors that the first
one wouldn't have raised, rather than returning a subset?
Consider the case where b has a numeric type, and the attribute a cannot be
cast to double?
Kind regards,
Ghislain
Post by Pavel Velikhov
No, the first line in the base case should also return false, sorry
Post by Pavel Velikhov
So suppose you have 2 path expressions, p1 and p2, and you allow only /x
or //x as path steps
p1 = axis_1 pred_1, …, axis_n pred_n
p2 = axis_1 pred_1, … , axis_m pred_m
/* recursion base case */
if p1 is empty: return true
if p2 is empty: return false
a_1 = first axis of p1
a_2 = first axis of p2
if a_2 is a child axis ‘/' with the same label l, and the predicates
return subsumes(p1 - a_1, p2 - a_2)
return false
p2’ = remove all axes of p2 from the head, until you find an axis
with label l
p2’ = remove the axis with label l from p2'
return subsumes( p_1 - a_1, p2’ )
return false
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk

_______________________________________________
***@x-query.com
http://x-query.com/mailm
Pavel Velikhov
2016-01-28 16:20:34 UTC
Permalink
Hmm.. testing subsumption is not as trivial as I thought. Here's a cleaner
way to do it: represent both path expressions as DFAs and then test
subsumption. You'll get DFAs A and B and then test if minimized(B-A) is
empty.
W.S. Hager
2016-01-29 17:53:36 UTC
Permalink
This is the best I (actually Google) can do.

Proof:
https://coq.inria.fr/library/Coq.MSets.MSetGenTree.html

Test:

http://www.geeksforgeeks.org/check-if-a-binary-tree-is-subtree-of-another-binary-tree/

At some point I'll look into this myself, it's a nice challenge.
Post by Pavel Velikhov
Hmm.. testing subsumption is not as trivial as I thought. Here's a cleaner
way to do it: represent both path expressions as DFAs and then test
subsumption. You'll get DFAs A and B and then test if minimized(B-A) is
empty.
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
Pavel Velikhov
2016-01-29 22:29:16 UTC
Permalink
Trees won’t help much here. The difficulty with path expression is due to the // step, which adds non-determinism.

//w/x subsumes /w/x/y/z/w/x
//w/w doesn’t subsume /w/x/w

Regular languages also have determinism and all the algorithms for them are exceptionally well-studied.
Or you can do an algorithm with backtracking, but it could get hairy.
Post by W.S. Hager
This is the best I (actually Google) can do.
https://coq.inria.fr/library/Coq.MSets.MSetGenTree.html <https://coq.inria.fr/library/Coq.MSets.MSetGenTree.html>
http://www.geeksforgeeks.org/check-if-a-binary-tree-is-subtree-of-another-binary-tree/ <http://www.geeksforgeeks.org/check-if-a-binary-tree-is-subtree-of-another-binary-tree/>
At some point I'll look into this myself, it's a nice challenge.
Hmm.. testing subsumption is not as trivial as I thought. Here's a cleaner way to do it: represent both path expressions as DFAs and then test subsumption. You'll get DFAs A and B and then test if minimized(B-A) is empty.
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl <http://lagua.nl/>
W.S. Hager
2016-02-01 12:14:27 UTC
Permalink
Like I mentioned, co-inductive types are used to deal with non-determinism
in formal proofs, although I don't know enough about the subject to give an
example.
Post by Pavel Velikhov
Trees won’t help much here. The difficulty with path expression is due to
the // step, which adds non-determinism.
//w/x subsumes /w/x/y/z/w/x
//w/w doesn’t subsume /w/x/w
Regular languages also have determinism and all the algorithms for them
are exceptionally well-studied.
Or you can do an algorithm with backtracking, but it could get hairy.
This is the best I (actually Google) can do.
https://coq.inria.fr/library/Coq.MSets.MSetGenTree.html
http://www.geeksforgeeks.org/check-if-a-binary-tree-is-subtree-of-another-binary-tree/
At some point I'll look into this myself, it's a nice challenge.
Post by Pavel Velikhov
Hmm.. testing subsumption is not as trivial as I thought. Here's a
cleaner way to do it: represent both path expressions as DFAs and then test
subsumption. You'll get DFAs A and B and then test if minimized(B-A) is
empty.
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
Adam Retter
2016-02-21 19:57:16 UTC
Permalink
Okay so for those that are interested, here is the solution that I
ended up with:

1) I wrote my own XPath 2 parser as I wanted a simplified AST to start
operating from. The project I am working on needs to be under GPL2 and
I could not find a decent Java library that was compatible with that
license - https://github.com/exquery/xpath2-parser/

2) I then wrote some utility functions for descending through the Path
expressions and making subset comparisons -
https://github.com/adamretter/TEI-Completer/blob/master/src/main/java/org/humanistika/oxygen/tei/completer/XPathUtil.java#L79
Yes, this has many limitations and is most likely not complete yet,
but it serves me well for the small subset of XPath path expressions
that I want to support.

3) You can see in my test-cases how I consider one path expression as
a subset of another path expression:
https://github.com/adamretter/TEI-Completer/blob/master/src/test/java/org/humanistika/oxygen/tei/completer/XPathUtilTest.java#L40

Maybe that is of interest to some of you...

Cheers
Post by Adam Retter
Okay thanks to everyone for the replies.
I think the scope of what I was looking for got thrown out in all the
discussion, or perhaps I wasn't clear enough early in my initial post.
I am not interested in XQuery, and I am only interested in a small
subset of XPath. I am concerned with comparing two path expressions,
these path expressions may have predicates, what I didn't make clear
is that those predicates may not contain functions, only the
comparison operators eq, ne, gt, ge, lt, le, and =, !=, >, >=, <, <=.
1. //w
In this instance I can actually ignore the predicate, as I can see
that /x/y/z/w would already produce the subset of //w and the
predicate only serves to restrict that further.
I am not looking for a formal proof in any sense. I am looking for
something practical that I can use in code.
account that the filter can only contain standard operators (eq, gt, lt, etc
as defined in the op namespace)?
Okay, now we are close! However now formal do you want the proof to be? I
The first path expression selects some subset of the children of the node
that it applies, where the label is 'w'. The second one selects all
descendants with label 'w', hence it's result contains all nodes of the
first path expression.
But if you want to go formal, you need to use the semantics of XQuery path
expressions, as well as the formal specification of the data model. There
used to be a formal document on XQuery data model.
Post by Christian Grün
Post by W.S. Hager
Well, so, to continue, let's assume that there are no user-defined
functions, and in fact the only thing we want to proof is select+filter,
where a filter is limited to the default operators. From that is it follows
that
select-child-nodes-by-name(select-child-nodes-by-name($context,'x'),'w')
-path2: select-descendant-nodes-by-name($context,'w')
Just to complete this: The predicate must not be numeric (//w[1] is
not equivalent to /descendant::w[1]).
Post by W.S. Hager
Post by Pavel Velikhov
Post by Michael Kay
It seems to be a long-standing tradition that computer scientists, when
asked to prove a difficult conjecture C, respond by giving a proof for a
simplified conjecture C'. While this might lead to progress in the long run,
and enables them to get papers published in the academic literature, it is
totally useless to practical engineeers who want to know whether they can
safely rely on C.
Michael,
what you say is nice and true.
1. path expressions point (syntactically hence esemantically) into
XQuery’s expressions
2. XQuery expression language is Turing complete
3. Subsumption for a Turing complete language is undecidable.
Well, I can hardly see a way to decide this problem other then by
introducing SOME restrictions
of some sort… but of course some restrictions that would not nullify the
original problem all together
and make the solution useless.
Best,
Dana
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
_______________________________________________
http://x-query.com/mailman/listinfo/talk
--
W.S. Hager
Lagua Web Solutions
http://lagua.nl
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk

_______________________________________________
***@x-query.com
http://x-query.com/mailman/list
daniela florescu
2016-01-27 19:40:22 UTC
Permalink
Well, so, to continue, let's assume that there are no user-defined functions,
The Turing completeness of XQuery doesn’t come from the user defined functions.

It’s intrinsic to the rest of XQuery.

Even without user defined functions, XQuery is still Turing complete.

Dana
Loading...