Discussion:
[xquery-talk] Random number generation : requirements
Michael Kay
2014-05-06 16:53:47 UTC
Permalink
The W3C WGs are looking at the idea of introducing a random-number function of some kind in XPath 3.1. The challenge of course is making this both usable and a pure function with no side-effects. We have various design ideas which we need to test against requirements.

If you have any applications that use or need such a function, please could I have a brief description of the way it uses random numbers, e.g. just wanting a single random number, a random permutation of 52 integers, an arbitrary sequence of random numebrs for test data generation, etc.

If you're currently using the EXSLT random-sequence() function, please share your experience with it.

Michael Kay
Saxonica
Matthias Brantner
2014-05-06 17:42:23 UTC
Permalink
Michael

Here is the documentation of a module containing functions to generate
random numbers or strings with Zorba.

http://www.zorba.io/documentation/latest/modules/zorba/xdm/atomic/random

Please note that some functions are using Zorba's nondeterministic annotation.
Also note, that there is one function that creates uuids returned as string.

The functions have proven themselves useful in plenty of scenarios like
password or web session token generation.

I hope you find this module useful.

Matthias
Post by Michael Kay
The W3C WGs are looking at the idea of introducing a random-number function of some kind in XPath 3.1. The challenge of course is making this both usable and a pure function with no side-effects. We have various design ideas which we need to test against requirements.
If you have any applications that use or need such a function, please could I have a brief description of the way it uses random numbers, e.g. just wanting a single random number, a random permutation of 52 integers, an arbitrary sequence of random numebrs for test data generation, etc.
If you're currently using the EXSLT random-sequence() function, please share your experience with it.
Michael Kay
Saxonica
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Christian Grün
2014-05-06 17:51:21 UTC
Permalink
And here is the Random Module of BaseX:

http://docs.basex.org/wiki/Random_Module

Similar to Zorba, most functions are non-deterministic. Is there any
particular reason why the official random functions need to be
deterministic?

Hope this helps,
Christian


On Tue, May 6, 2014 at 7:42 PM, Matthias Brantner
Post by Michael Kay
Michael
Here is the documentation of a module containing functions to generate
random numbers or strings with Zorba.
http://www.zorba.io/documentation/latest/modules/zorba/xdm/atomic/random
Please note that some functions are using Zorba's nondeterministic annotation.
Also note, that there is one function that creates uuids returned as string.
The functions have proven themselves useful in plenty of scenarios like
password or web session token generation.
I hope you find this module useful.
Matthias
Post by Michael Kay
The W3C WGs are looking at the idea of introducing a random-number function of some kind in XPath 3.1. The challenge of course is making this both usable and a pure function with no side-effects. We have various design ideas which we need to test against requirements.
If you have any applications that use or need such a function, please could I have a brief description of the way it uses random numbers, e.g. just wanting a single random number, a random permutation of 52 integers, an arbitrary sequence of random numebrs for test data generation, etc.
If you're currently using the EXSLT random-sequence() function, please share your experience with it.
Michael Kay
Saxonica
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Michael Kay
2014-05-06 20:42:29 UTC
Permalink
Post by Christian Grün
Similar to Zorba, most functions are non-deterministic. Is there any
particular reason why the official random functions need to be
deterministic?
Yes, we don't have any machinery in the language semantics for declaring something as nondeterministic (easy to solve) or defining the semantics of how it should behave if thus annotated (much harder).

For example, if f() and g() are non-deterministic functions, how do we say in the spec that in the expression (f(), g()), f should be executed before g?

Michael Kay
Saxonica
Christian Grün
2014-05-06 20:52:42 UTC
Permalink
Post by Michael Kay
Yes, we don't have any machinery in the language semantics for declaring something as nondeterministic (easy to solve) or defining the semantics of how it should behave if thus annotated (much harder).
For example, if f() and g() are non-deterministic functions, how do we say in the spec that in the expression (f(), g()), f should be executed before g?
I see; so it seems to be related to questions (like the execution
order) that have also been discussed on the File Module

Thanks for the info,
Christian
Adam Retter
2014-05-07 18:05:28 UTC
Permalink
Just to throw in the functions from eXist which are also non-deterministic -

http://www.exist-db.org/exist/apps/fundocs/index.html?action=search&type=name&q=random

Personally I would be more interested in having a good random number
source for generating strong UUIDs, at the moment we just have an
extension function which is a non-deterministic wrapper around some
Java - http://www.exist-db.org/exist/apps/fundocs/index.html?action=search&type=name&q=uuid
Post by Christian Grün
http://docs.basex.org/wiki/Random_Module
Similar to Zorba, most functions are non-deterministic. Is there any
particular reason why the official random functions need to be
deterministic?
Hope this helps,
Christian
On Tue, May 6, 2014 at 7:42 PM, Matthias Brantner
Post by Michael Kay
Michael
Here is the documentation of a module containing functions to generate
random numbers or strings with Zorba.
http://www.zorba.io/documentation/latest/modules/zorba/xdm/atomic/random
Please note that some functions are using Zorba's nondeterministic annotation.
Also note, that there is one function that creates uuids returned as string.
The functions have proven themselves useful in plenty of scenarios like
password or web session token generation.
I hope you find this module useful.
Matthias
Post by Michael Kay
The W3C WGs are looking at the idea of introducing a random-number function of some kind in XPath 3.1. The challenge of course is making this both usable and a pure function with no side-effects. We have various design ideas which we need to test against requirements.
If you have any applications that use or need such a function, please could I have a brief description of the way it uses random numbers, e.g. just wanting a single random number, a random permutation of 52 integers, an arbitrary sequence of random numebrs for test data generation, etc.
If you're currently using the EXSLT random-sequence() function, please share your experience with it.
Michael Kay
Saxonica
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
Benito van der Zander
2014-05-06 22:31:51 UTC
Permalink
Hi,
a few days ago I also added a random number function to my XQuery engine:

https://sourceforge.net/p/videlibri/code/ci/bb89762402adad7d2f6a8b819f84fdd3267816ed/

Non-determistic like the other ones...


My policy on side effects is: all expressions containing side effects
are going to be evaluated in order


Bye,
Benito
Post by Michael Kay
The W3C WGs are looking at the idea of introducing a random-number function of some kind in XPath 3.1. The challenge of course is making this both usable and a pure function with no side-effects. We have various design ideas which we need to test against requirements.
If you have any applications that use or need such a function, please could I have a brief description of the way it uses random numbers, e.g. just wanting a single random number, a random permutation of 52 integers, an arbitrary sequence of random numebrs for test data generation, etc.
If you're currently using the EXSLT random-sequence() function, please share your experience with it.
Michael Kay
Saxonica
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Michael Kay
2014-05-06 22:40:26 UTC
Permalink
My policy on side effects is: all expressions containing side effects are going to be evaluated in order
I do something like that in Saxon as well. But I don't attempt to define what "in order" means; for example, the order in which different global variables are evaluated. Doing this in the spec would be much more problematic.

Michael Kay
Saxonica
Michael Sokolov
2014-05-06 22:48:18 UTC
Permalink
Post by Michael Kay
My policy on side effects is: all expressions containing side effects are going to be evaluated in order
I do something like that in Saxon as well. But I don't attempt to define what "in order" means; for example, the order in which different global variables are evaluated. Doing this in the spec would be much more problematic.
You don't think it would be reasonable to say something to the effect
that the order in which non-deterministic expressions are evaluated is
non-deterministic (ie implementation-defined)? Certainly it would be
reasonable enough in the case of a random number generator. Although I
suppose if you are going to seed it, you would like the seed to effect
the random numbers that are generated.

-Mike
Michael Kay
2014-05-06 22:58:43 UTC
Permalink
The big problem with a nondeterministic random() function is not defining the order of execution, but preventing it being optimised out of a loop. For example, how do we ensure that

$xxx[random() gt 0.5]

doesn't select either all the values or none?

Anyway, we're not planning to do non-determinism. This exercise is about designing a deterministic way to meet the requirement.

Michael Kay
Saxonica
Post by Michael Kay
My policy on side effects is: all expressions containing side effects are going to be evaluated in order
I do something like that in Saxon as well. But I don't attempt to define what "in order" means; for example, the order in which different global variables are evaluated. Doing this in the spec would be much more problematic.
You don't think it would be reasonable to say something to the effect that the order in which non-deterministic expressions are evaluated is non-deterministic (ie implementation-defined)? Certainly it would be reasonable enough in the case of a random number generator. Although I suppose if you are going to seed it, you would like the seed to effect the random numbers that are generated.
-Mike
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Michael Sokolov
2014-05-06 23:15:56 UTC
Permalink
My 2c:

I used an XQuery function based on Dmitry's version before; it works
fine although it's a little inconvenient to have to keep passing in the
prior value.

I would say the most convenient (or at least the most familiar)
signature for a random function is random($n) returning a random number
between 0 inclusive and $n exclusive; ideally it would return integers
if $n is an integer, floating point numbers if $n is a floating point
number, empty if $n is empty ? and an error otherwise. And I would like
a seed function. Ideally this should be callable many times: I'm not
sure how that could be done non-deterministically though.

I suppose a sequence would be useful, but it isn't the first thing that
leaps to mind. What if I'm not sure how many I'll need?

For example, one use case for me was to load a huge amount of data, and
only include 1% of it, in order to generate a predictable test data
sub-set. I want to write an XSLT template that returns nothing 99% of
the time, and for the other 1% of the time it processed the content
normally. I want this to be based on an identifier in the content so
that for a given seed, the same "random" 1% are selected each time: it
should *not* be order-dependent, rather I would like to seed the random
number generator with a hash of a given seed that is a configuration
parameter, and a node-identifier, and then evaluate the next random
number to see if it is > 0.01 (say). Maybe there are other ways to do
that, but that is what I did using Java.

-Mike
Post by Michael Kay
The big problem with a nondeterministic random() function is not defining the order of execution, but preventing it being optimised out of a loop. For example, how do we ensure that
$xxx[random() gt 0.5]
doesn't select either all the values or none?
Anyway, we're not planning to do non-determinism. This exercise is about designing a deterministic way to meet the requirement.
Michael Kay
Saxonica
Post by Michael Kay
My policy on side effects is: all expressions containing side effects are going to be evaluated in order
I do something like that in Saxon as well. But I don't attempt to define what "in order" means; for example, the order in which different global variables are evaluated. Doing this in the spec would be much more problematic.
You don't think it would be reasonable to say something to the effect that the order in which non-deterministic expressions are evaluated is non-deterministic (ie implementation-defined)? Certainly it would be reasonable enough in the case of a random number generator. Although I suppose if you are going to seed it, you would like the seed to effect the random numbers that are generated.
-Mike
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Benito van der Zander
2014-05-07 20:29:26 UTC
Permalink
Hi Michael,
perhaps it is time to add monads to XPath?

They should solve this issue and all related ones



Best,
Benito
Post by Michael Kay
The big problem with a nondeterministic random() function is not defining the order of execution, but preventing it being optimised out of a loop. For example, how do we ensure that
$xxx[random() gt 0.5]
doesn't select either all the values or none?
Anyway, we're not planning to do non-determinism. This exercise is about designing a deterministic way to meet the requirement.
Michael Kay
Saxonica
Post by Michael Kay
My policy on side effects is: all expressions containing side effects are going to be evaluated in order
I do something like that in Saxon as well. But I don't attempt to define what "in order" means; for example, the order in which different global variables are evaluated. Doing this in the spec would be much more problematic.
You don't think it would be reasonable to say something to the effect that the order in which non-deterministic expressions are evaluated is non-deterministic (ie implementation-defined)? Certainly it would be reasonable enough in the case of a random number generator. Although I suppose if you are going to seed it, you would like the seed to effect the random numbers that are generated.
-Mike
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Ghislain Fourny
2014-05-08 13:34:14 UTC
Permalink
Hi Benito,

We had a discussion on monads not so long ago in the context of update
and scripting. I like the elegance of monads. It is my understanding.
though, that monads only defer side effects to until they are applied
-- and then you are back to the need for side effects semantics.

Somehow, side effects and their visibility to subsequent expressions
bring an irreducible amount of complexity to the design of the
language (evaluation order, snapshots, etc). I don' t think there is
an easy way to introduce them -- clean and elegant, maybe, but not
easy, or at least, I believe, not easier than what was done in the
current scripting draft or alternate implementations.

That said, I think it is an important point that non-determinism is
distinct from side effects. You can have deterministic side effects,
and you can have non-determinism without side effects (as I tried to
illustrate in my former e-mail). So far, I am not convinced that
introducing side effects would address non-determinism.

Does it make sense?

Kind regards,
Ghislain


On Wed, May 7, 2014 at 10:29 PM, Benito van der Zander
Post by Benito van der Zander
Hi Michael,
perhaps it is time to add monads to XPath?
They should solve this issue and all related ones
Best,
Benito
The big problem with a nondeterministic random() function is not defining
the order of execution, but preventing it being optimised out of a loop. For
example, how do we ensure that
$xxx[random() gt 0.5]
doesn't select either all the values or none?
Anyway, we're not planning to do non-determinism. This exercise is about
designing a deterministic way to meet the requirement.
Michael Kay
Saxonica
My policy on side effects is: all expressions containing side effects are
going to be evaluated in order
I do something like that in Saxon as well. But I don't attempt to define
what "in order" means; for example, the order in which different global
variables are evaluated. Doing this in the spec would be much more
problematic.
You don't think it would be reasonable to say something to the effect that
the order in which non-deterministic expressions are evaluated is
non-deterministic (ie implementation-defined)? Certainly it would be
reasonable enough in the case of a random number generator. Although I
suppose if you are going to seed it, you would like the seed to effect the
random numbers that are generated.
-Mike
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Leonard Wörteler
2014-05-06 23:20:13 UTC
Permalink
Hello Michael,
Post by Michael Kay
If you have any applications that use or need such a function, please could I have a brief description of the way it uses random numbers, e.g. just wanting a single random number, a random permutation of 52 integers, an arbitrary sequence of random numebrs for test data generation, etc.
in contrast to everyone else (it seems) I implemented a *deterministic*
random-number generator in XQuery [1] in order to generate reproducible
fuzzy tests for my higher-order xq-modules project [2].

I used a simple Linear Congruential Generator [3] and implemented the
following functions:

rng:with-random-ints(
$seed as xs:integer,
$n as xs:integer,
$start as item()*,
$f as function(item()*, item()*) as item()*
)

realizes a constant-space fold over `$n` randomly generated `xs:int`s
with initial seed `$seed`. This is used by

rng:random-ints(
$seed as xs:integer,
$n as xs:integer
) as xs:int* {
rng:with-random-ints($seed, $n, (), function($seq, $i) {
($seq, $i)
})
};

to generate a sequence of random `xs:int` values. The module is used in
some tests [4,5].

You could obviously also implement `rng:with-random-ints(...)` in terms
of `rng:random-ints(...)`:

rng:with-random-ints($seed, $n, $start, $f) {
fn:fold-left(rng:random-ints($seed, $n), $start, $f)
};

-- Leo

[1]
https://github.com/LeoWoerteler/xq-modules/blob/master/src/main/xquery/modules/rng.xqm
[2] https://github.com/LeoWoerteler/xq-modules
[3] http://en.wikipedia.org/wiki/Linear_congruential_generator
[4]
https://github.com/LeoWoerteler/xq-modules/blob/master/src/test/xquery/int-set-test.xq#L33-51
[5]
https://github.com/LeoWoerteler/xq-modules/blob/master/src/test/xquery/heap-test.xq#L10
Ghislain Fourny
2014-05-07 11:13:29 UTC
Permalink
Hi,

I think that Leo' s proposal, on a first read, is the most elegant way
I have seen so far that (pseudo-)randomness can be brought to XQuery
in a deterministic way: by exposing that it is actually
pseudo-randomness (seed) to the user.

In Zorba (not 3.0, but the latest, trunk revision, used by us), we
designed and implemented nondeterminism in a way fully integrated with
side effects. If this is of interest to the list(s), I can say a few
words about what we did. Of course, feedback is always appreciated.

The abstract model is as follows. Basically, a program is executed
against a sequence of snapshots. Each side effect jumps to a new
snapshot. Between snapshots, evaluation order is very important
("sequential"). Between side effects and within a snapshot though, it
is as if the engine were so fast that time did not elapse at all. Of
course no engine is that fast -- even though I could imagine Mike
proving me wrong here :-) -- but, joke aside. it can hide the elapse
of time from the user with, say, caching. This idea is crucial in our
design and was largely inspired by Michael Sperberg-McQueen's vision
in a discussion on functional languages a while ago.

For everything deterministic, this means that identical expressions
within the same snapshot and against the same dynamic context (mostly
variable bindings) always evaluate to the same result (except in the
case of newly created nodes where identity is new every time). Once a
side effect occurs and a new snapshot is reached, the result can again
be different (the cache gets reset).

By default, Zorba automatically caches every function call involving
atomics only, within a snapshot. If you ask many times for the current
time without side effects, it should give you the same answer. You can
also ask Zorba to "simulate" or "force" full determinism with XML or
JSON as well.

If a function, or transitively an expression, is stamped as
nondeterministic, it means in Zorba that it has to be evaluated every
time the specification semantics says it has to be. Within a snapshot
though, there is no evaluation order constraint and the implementation
is free to pick any order it wants.
Post by Michael Kay
For example, if f() and g() are non-deterministic functions, how do we say in the spec that in the expression (f(), g()), f should be executed before g?
In Zorba, we don' t. It' s fine to evaluate g before f.
Post by Michael Kay
$xxx[random() gt 0.5]
With our semantics, this should produce a new number every time,
because the spec says the predicate is evaluated for each item in the
lhs sequence. Evaluating the predicate only once is an optimization
that cannot be done here because of non-determinism.

On an abstract level, nondeterministim, to us (Zorba), is synonym with
the presence of (true/quantum) randomness, nothing less but also
nothing more.

Does it make sense? I hope I got the summary right.

Kind regards,
Ghislain
Post by Michael Kay
Hello Michael,
Post by Michael Kay
If you have any applications that use or need such a function, please
could I have a brief description of the way it uses random numbers, e.g.
just wanting a single random number, a random permutation of 52 integers, an
arbitrary sequence of random numebrs for test data generation, etc.
in contrast to everyone else (it seems) I implemented a *deterministic*
random-number generator in XQuery [1] in order to generate reproducible
fuzzy tests for my higher-order xq-modules project [2].
I used a simple Linear Congruential Generator [3] and implemented the
rng:with-random-ints(
$seed as xs:integer,
$n as xs:integer,
$start as item()*,
$f as function(item()*, item()*) as item()*
)
realizes a constant-space fold over `$n` randomly generated `xs:int`s with
initial seed `$seed`. This is used by
rng:random-ints(
$seed as xs:integer,
$n as xs:integer
) as xs:int* {
rng:with-random-ints($seed, $n, (), function($seq, $i) {
($seq, $i)
})
};
to generate a sequence of random `xs:int` values. The module is used in some
tests [4,5].
You could obviously also implement `rng:with-random-ints(...)` in terms of
rng:with-random-ints($seed, $n, $start, $f) {
fn:fold-left(rng:random-ints($seed, $n), $start, $f)
};
-- Leo
[1]
https://github.com/LeoWoerteler/xq-modules/blob/master/src/main/xquery/modules/rng.xqm
[2] https://github.com/LeoWoerteler/xq-modules
[3] http://en.wikipedia.org/wiki/Linear_congruential_generator
[4]
https://github.com/LeoWoerteler/xq-modules/blob/master/src/test/xquery/int-set-test.xq#L33-51
[5]
https://github.com/LeoWoerteler/xq-modules/blob/master/src/test/xquery/heap-test.xq#L10
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Loading...