So after a bit more coffee and a bit of research it seems to me that
the only way you are going to get this to be fast would be if you used
a hash based looked for one of your sequences. Something like a
HashMap or BloomFilter would do the job, see:
http://stackoverflow.com/questions/4261619/fastest-set-operations-in-the-west
I have made some experimentations with the above in Scala, a HashMap
works nicely here in-terms of performance because it is simple and
available, and in your case with Integers you do not need to worry
about duplicates because the value of a number is it's identity.
So, if you don't want to implement a HashMap or BloomFilter in XQuery,
what is one to do? Well XQuery 3.1 will introduce the Map data-type
and some implementations already have done this. If you understand the
implementation (or even hazard a guess), there is a good chance that
that XQuery map may in-fact under the covers be a HashMap.
I won't say too much more about this as I have been discussing it with
Wolfgang this morning, and I think he will shortly post you a very
fast example when using XQuery 3.1 Maps in eXist...
Post by Adam RetterPost by Adam RetterI think that question is very implementation specific. If all of your
data is in RAM, as your dataset is relatively small and these are just
numbers, I would expect performance to be excellent. I am not sure
that sorting will make much of a difference, but it depends on the
implementation and how it initiates the search for a false comparison
in a large sequence.
I retract any previous statements that may have alluded to good
performance. A quick test on Saxon and eXist shows that this is a very
slow problem. I am trying to see if there is not a short-cut that
could be taken, I will come back to you...
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk