Discussion:
[xquery-talk] Multiple output via a stream of filters
Ihe Onwuka
2014-01-13 19:11:22 UTC
I am running through about a gigabyte's worth of XML documents.

The ideal processing scenario is to offer each node in the sequence to
a list of filters and augment different XML documents (or different
branches of one encompassing document) based on the outcome of the
filter.

If anyone has seen the example used to illustrate continuation-passing
style in Chapter 8 of The Little Schemer, that is exactly what I have
in mind (albeit not necessarily in continuation-passing style).

What I am doing at the moment is cycling through the nodes n times,
where n is the number of filters I am applying. Clearly sub-optimal.
However, performance is not a priority for what I am actually doing
(which is simply to get the result), so I am not quite motivated
enough to figure out how to do this myself.

Hence I am asking instead what others have done in a similar scenario.
I suppose some sort of customised HOF (higher-order function), entailing
head/tail recursion over the sequence and accepting a list of filter
functions, would be the likely form a solution would take.
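
A minimal sketch of the kind of HOF described above, assuming an XQuery 3.0 processor with inline functions and fn:head/fn:tail. The function name local:route, the <hit> wrapper element and the sample <applicant> data are hypothetical, chosen only for illustration.

```xquery
xquery version "3.0";

(: Hypothetical helper: walks $nodes head-first and, for every predicate
   that accepts the head node, emits a <hit> element whose @filter is the
   position of that predicate in $preds. :)
declare function local:route(
  $nodes as node()*,
  $preds as (function(node()) as xs:boolean)*
) as element(hit)* {
  if (empty($nodes)) then ()
  else (
    let $n := head($nodes)
    for $i in 1 to count($preds)
    where $preds[$i]($n)
    return <hit filter="{$i}">{ $n }</hit>,
    (: recurse on the rest of the sequence :)
    local:route(tail($nodes), $preds)
  )
};

(: Illustrative call with made-up data and two made-up predicates. :)
local:route(
  (<applicant age="20"/>, <applicant age="40"/>),
  (
    function($n as node()) as xs:boolean { $n/@age < 25 },
    function($n as node()) as xs:boolean { $n/@age >= 18 }
  )
)
```

The recursive call is not in tail position, so on a very long sequence this may exhaust the stack, depending on the processor. Replacing the head/tail recursion with a plain "for $n in $nodes" gives the same result in a single pass.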
David Lee
2014-01-13 20:40:11 UTC
This is the type of problem xmlsh and XProc were designed for.
What engine are you using? I personally prefer designing with lots of small programs instead of a monolith. This is practical only if the startup overhead for each is small and, preferably, if in-memory data can be passed between steps. XProc, xmlsh, and most XQuery database engines support this model efficiently. I find it much easier to write and debug if I can work in small transformations and let the framework do the plumbing for me.


Sent from my iPad (excuse the terseness)
David A Lee
***@calldei.com


David Lee
2014-01-13 20:43:24 UTC
FYI working on build machine and store 2 today ... Let me know when you need it back.
Maybe we need a store3;)


David Lee
2014-01-13 20:48:49 UTC
Sorry replied to wrong email
Ihe Onwuka
2014-01-13 21:12:39 UTC
The documents are in an eXist database, hence I was expecting (and think
I need) an XQuery solution, but I am open to other approaches.

Adam Retter
2014-01-13 23:26:24 UTC
Ihe,

I think the XQuery solution is exactly as you described it. A
recursive descent, most likely starting with an identity transform,
and a sequence of functions that can be combined and applied at each
level of the descent.
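
As a rough illustration of that shape (not a definitive implementation), the sketch below does an identity transform by recursive descent and applies a sequence of element-to-element functions at each level. It assumes XQuery 3.0 inline functions; the names local:transform, local:apply-steps and $mark are hypothetical.

```xquery
xquery version "3.0";

(: Applies each function in $steps, in order, to the element $e. :)
declare function local:apply-steps(
  $e as element(),
  $steps as (function(element()) as element())*
) as element() {
  if (empty($steps)) then $e
  else local:apply-steps(head($steps)($e), tail($steps))
};

(: Identity transform: copies every node, running $steps on each element
   before descending into its children. :)
declare function local:transform(
  $node as node(),
  $steps as (function(element()) as element())*
) as node() {
  typeswitch ($node)
    case document-node() return
      document {
        for $c in $node/node() return local:transform($c, $steps)
      }
    case element() return
      let $e := local:apply-steps($node, $steps)
      return element { node-name($e) } {
        $e/@*,
        for $c in $e/node() return local:transform($c, $steps)
      }
    default return $node
};

(: Illustrative step: stamp every element with a made-up attribute. :)
let $mark := function($e as element()) as element() {
  element { node-name($e) } { $e/@*, attribute seen { "yes" }, $e/node() }
}
return local:transform(<a><b/></a>, $mark)
(: yields <a seen="yes"><b seen="yes"/></a> :)
```

With an empty $steps sequence this reduces to a plain copy, which is the usual starting point before adding steps one at a time.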




--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
David Lee
2014-01-14 00:03:15 UTC
XML databases tend to be good at this (keeping things cached, avoiding unnecessary parsing and serialization, etc.).
If it is an N x M problem, that may be fine unless it goes too slowly. Then you might want to look at ways of optimizing.
Generally though (I am not sure about eXist, but I suspect it falls within the general guidelines of XML DBs), taking a document and applying multiple functions/XQuery/XSLT to it is just fine. The system will keep the document in memory, and usually the setup time to exit one function and apply the next is small compared to the functions themselves. It is also much easier than one big, complicated function/transform. XML DBs also tend to cache the functions or XQueries after compilation, so they don't need to recompile them on every iteration.



----------------------------------------
David A. Lee
***@calldei.com
http://www.xmlsh.org

David Lee
2014-01-13 23:54:45 UTC
If you're running in eXist then pure XQuery is probably as good as or better than anything else.
Could you expand on your problem? I don't know eXist that well, but I can't think offhand of a better solution,
unless there is a shortcut to know ahead of time what transforms to apply.
Do make sure you iterate over the docs *once*, then the transforms N times.
E.g.

for $d in $docs
for $transform in $transforms ...

Not the other way round
(i.e. DON'T do
for $transform in $transforms
for $d in $docs
...
)

I don't know eXist that well, but typically once a document is fetched into memory in an XML DB it can stay cached.
But if you are loading too many docs the cache will fill up and it will have to reload the docs.

That is assuming that the size of your documents is bigger than the transforms.


Adam Retter
2014-01-14 00:04:50 UTC
>
> I don't know exist that well but typically once a document is fetched into memory in a XML DB it can stay cached,
> But if you are loading too many docs the cache will get full and it will have to reload the docs.
>
> That is assuming that the size of your documents is bigger than the transforms.

Just for clarity. In eXist we do not load entire documents into
memory, rather we load proxies to only those nodes that you address in
your query. The nodes themselves are only really read during
serialization, and that should be done in a streaming manner, so
memory should not be an issue.

However, even then it is still true in eXist that iterating over the
documents instead of the transforms, just as you suggested, would be
more efficient.

Ihe Onwuka
2014-01-14 00:19:21 UTC
On Mon, Jan 13, 2014 at 11:54 PM, David Lee <***@calldei.com> wrote:
> If your running in exist then pure XQuery is probably as good or better then anything else.
> Could you expand on your problem ?
>

I have a collection. Suppose I have 10 different outputs I want, and
the decision as to which output a node is routed to (not necessarily
mutually exclusive, though) can be encapsulated in a predicate (hence
amenable to a filter HOF).

I only want to iterate over the collection once but I wish to apply
the 10 predicates to each node in the sequence so as to determine
which of the 10 outputs (again not necessarily mutually exclusive) the
node will feature in.

The algorithm is straightforward using structural recursion (or tail
recursion if tail call elimination is an issue). I am asking whether
there is an idiomatic XQuery solution that is different from this.
David Lee
2014-01-14 00:38:54 UTC
Could you define what you mean by "output"?

Ihe Onwuka
2014-01-14 00:41:13 UTC
Take the simplest case: a straight copy of the node, provided the
predicate passes.

Ihe Onwuka
2014-01-14 00:55:59 UTC
If it helps, think of an entity that entails a bazillion fields, like a
credit card application, and you have multiple ways you wish to
classify it. So you have a filter for each classification scheme and
the application "filters" through to the output stream of whichever
predicates return true.



David Lee
2014-01-14 03:42:09 UTC
A single XQuery invocation has only one "output stream", but when you're working in a database you often don't need any; you want to put the data back into the database.
Functions don't really have "streams" either, although the word is overloaded.
One way is to produce a "sequence" of results which you then do something with: either output it in the main output or store it into the database (as separate documents).

Also, I believe eXist has functions for writing to files, so your question may have more relevance on the eXist mailing list.

But if what you mean by "stream" is a long result, and you're dealing with GB+ data, you need to be very careful.
True "streaming" is very hard to achieve in XQuery. Functions don't "stream"; they return values.
XPath doesn't "stream" either. There are no real "streams" in XQuery, so if you're not careful you end up having to hold the entire result set in memory.
But if your output is to the 'main' output (is this an HTTP call?) you might be able to stream to it; it all depends on the vendor implementation.

I suspect what you're really after is to write multiple results to either documents or files.
XQuery doesn't support this generically, but XML databases like eXist do, so you might want to ask there.

Or I may have completely misunderstood your question.
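
For the "store into the database (as separate documents)" option, a hedged sketch using eXist's xmldb module might look like the following. It assumes the target collection /db/results already exists and that the user is allowed to write to it; the collection path, the <outputs>/<output> structure and the filter names are hypothetical.

```xquery
xquery version "3.0";

import module namespace xmldb = "http://exist-db.org/xquery/xmldb";

(: Made-up result buckets standing in for the real filter output. :)
let $buckets :=
  <outputs>
    <output filter="under-25"><application id="1"/></output>
    <output filter="bankrupt"><application id="2"/></output>
  </outputs>
for $o in $buckets/output
(: one stored document per bucket, named after the filter :)
return xmldb:store("/db/results", concat($o/@filter, ".xml"), $o)
```

This is eXist-specific rather than portable XQuery, and each bucket is still built in memory before it is stored.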



Ihe Onwuka
2014-01-14 03:50:19 UTC
Input: a ginormous XML document containing data relating to credit
card applications, which I only want to read once.

Filters: under-25s, college educated, single mothers, bankrupts, six-figure
income, repeat applicants, self-employed.

1 input, 7 outputs.

All applicants that meet the criteria of a given filter are copied to
the output corresponding to that filter.

We could camouflage the 7 outputs into one by wrapping an element
around the whole 7.
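
One way this could be written without explicit recursion is a FLWOR with "group by", sketched below under the assumption of an XQuery 3.0 processor. The document path, the <application>, <age> and <income> element names, the filter names and the <outputs>/<output> wrappers are all hypothetical, and only two of the seven filters are shown.

```xquery
xquery version "3.0";

(: Hypothetical filter names and predicates, kept in parallel sequences. :)
declare variable $names := ("under-25", "six-figure-income");
declare variable $preds := (
  function($a as element()) as xs:boolean { $a/age < 25 },
  function($a as element()) as xs:boolean { $a/income >= 100000 }
);

<outputs>{
  (: outer loop reads each application once; inner loop tries every filter :)
  for $a in doc("/db/applications.xml")//application
  for $i in 1 to count($preds)
  where $preds[$i]($a)
  (: after grouping, $a holds every application accepted by filter $i :)
  group by $i
  order by $i
  return <output filter="{$names[$i]}">{ $a }</output>
}</outputs>
```

An application that passes several predicates appears in several <output> elements, so the outputs need not be mutually exclusive. A filter that matches nothing produces no <output> element, and the grouped result is held in memory before it is returned.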

On Tue, Jan 14, 2014 at 3:42 AM, David Lee <***@calldei.com> wrote:
> A Single XQuery invocation has only one "output stream" ... but when your working in a database often you don't need any, you want to put the data back into the database.
> Functions don't have "streams" really either although the word is overloaded.
> One way it to produce a "sequence" of results which you then do something with, either output in the main output or store into the database (as separate documents).
>
> Also I believe eXist has functions for writing to files so your question my have more relevance on the exists mailing list.
>
> But if what I think you mean by "stream" is a long result ... and your dealing with GB+ data you need to be very careful,
> True "streaming" is very hard to achieve in XQuery. Functions don't "stream" they return values.
> XPath doesn't "stream" ... there is no real "streams" in XQuery ... so if your not careful you end up with having to hold the entire result set in memory.
> But if your output is to the 'main' output (is this an HTTP call ?) you might be able to stream to it .. all depends on the vendor implementation.
>
> I suspect what your really after is to write multiple results to either documents or files.
> XQuery doesn't support this generically but XML Databases like eXist do so you might want to ask there.
>
> Or I may have completely misunderstood your question.
>
>
>
> ----------------------------------------
> David A. Lee
> ***@calldei.com
> http://www.xmlsh.org
>
> -----Original Message-----
> From: Ihe Onwuka [mailto:***@gmail.com]
> Sent: Monday, January 13, 2014 4:56 PM
> To: David Lee
> Cc: ***@x-query.com
> Subject: Re: [xquery-talk] Multiple output via a stream of filters
>
> If it helps think of an entity that entails a bazillion fields like a credit card application and you have multiple ways you wish to classify it. So you have a filter for each classification scheme and the application "filters" through to the output stream of whichever predicates return true.
>
>
>
> On Tue, Jan 14, 2014 at 12:38 AM, David Lee <***@calldei.com> wrote:
>> Could you define what you mean by "output" ?
>>
>> ----------------------------------------
>> David A. Lee
>> ***@calldei.com
>> http://www.xmlsh.org
>>
>>
>> -----Original Message-----
>> From: Ihe Onwuka [mailto:***@gmail.com]
>> Sent: Monday, January 13, 2014 4:19 PM
>> To: David Lee
>> Cc: ***@x-query.com
>> Subject: Re: [xquery-talk] Multiple output via a stream of filters
>>
>> On Mon, Jan 13, 2014 at 11:54 PM, David Lee <***@calldei.com> wrote:
>>> If you're running in eXist then pure XQuery is probably as good as or better than anything else.
>>> Could you expand on your problem?
>>>
>>
>> I have a collection. Suppose I have 10 different outputs I want, and the decision as to which output a node is routed to (not necessarily mutually exclusive, though) can be encapsulated in a predicate (hence amenable to a filter HOF).
>>
>> I only want to iterate over the collection once but I wish to apply the 10 predicates to each node in the sequence so as to determine which of the 10 outputs (again not necessarily mutually exclusive) the node will feature in.
>>
>> The algorithm is straightforward using structural recursion (or tail recursion if tail call elimination is an issue). Asking whether there is an idiomatic XQuery solution that is different from this.
Ihe Onwuka
2014-01-14 03:51:43 UTC
Permalink
I only read the ginormous XML once... I apply the 7 filters to each
node read and it gets allocated to one of the 7 output buckets (how's
that for a semantically neutral term).
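
For illustration only, here is a rough sketch of that single-pass routing as a fold over the applications. It assumes an XQuery 3.1 processor (maps and fold-left); the element name "application", the attributes @age and @income, and the two filter names are hypothetical stand-ins, not taken from any real schema. It is not streaming: every bucket is still held in memory.

```xquery
xquery version "3.1";

(: Hypothetical filters, keyed by bucket name. The attribute names
   @age and @income are assumptions made for this sketch. :)
declare variable $local:filters := map {
  "under-25":   function($a as element()) as xs:boolean { $a/@age < 25 },
  "six-figure": function($a as element()) as xs:boolean { $a/@income >= 100000 }
};

(: Offer one application to every filter and append it to the bucket
   of each filter whose predicate returns true. :)
declare function local:route(
  $buckets as map(xs:string, element()*),
  $app as element()
) as map(xs:string, element()*) {
  fold-left(
    map:keys($local:filters)[$local:filters(.)($app)],
    $buckets,
    function($acc, $name) { map:put($acc, $name, ($acc($name), $app)) }
  )
};

(: One iteration over the applications; buckets need not be disjoint. :)
let $buckets := fold-left(doc("huge.xml")/*/application, map {}, local:route#2)
return
  <allofthem>{
    for $name in map:keys($local:filters)
    return element { $name } { $buckets($name) }
  }</allofthem>
```

The order in which map:keys returns the bucket names is implementation-dependent, so the order of the wrapper's children may vary between processors.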

On Tue, Jan 14, 2014 at 3:50 AM, Ihe Onwuka <***@gmail.com> wrote:
> Input .... ginormous xml document containing data relating to a credit
> card application that I only want to read once.
>
> Filters. - under 25's, college educated, single mother, bankrupts, 6
> figure income, repeat applicants, self-employed.
>
> 1 input - 7 outputs
>
> All applicants that meet the criteria of a given filter are copied to
> the output corresponding to that filter.
>
> We could camouflage the 7 outputs into one by wrapping an element
> around the whole 7.
David Lee
2014-01-14 03:55:50 UTC
Permalink
In pure generic XQuery there is one output.
So if you want 7 outputs you need to produce either a "sequence" of 7 items or a single document encapsulating them.

To produce a sequence you don't actually need to collect the values; just let them evaluate:

let $doc := doc("huge.xml")
return ( filter1($doc) , filter2($doc) .... )

To produce a wrapped element is similar:

let $doc := doc("huge.xml")
return <allofthem><filter1>{ filter1($doc) }</filter1><filter2>{ filter2($doc) }</filter2> ..... </allofthem>
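
A slightly fuller sketch of the wrapped-element form, assuming an XQuery 3.0 processor with function items. The two predicates, the element name "application" and the attributes @age and @employment are hypothetical and only stand in for the real filters:

```xquery
xquery version "3.0";

(: Hypothetical predicates; the attribute names are assumptions. :)
declare function local:under-25($a as element()) as xs:boolean {
  $a/@age < 25
};

declare function local:self-employed($a as element()) as xs:boolean {
  $a/@employment = "self-employed"
};

let $applications := doc("huge.xml")/*/application
(: Bucket names and predicates are kept in parallel sequences. :)
let $names   := ("under-25", "self-employed")
let $filters := (local:under-25#1, local:self-employed#1)
return
  <allofthem>{
    for $f at $i in $filters
    return element { $names[$i] } { $applications[$f(.)] }
  }</allofthem>
```

Written this way the document is opened once, but each filter still scans the sequence of applications separately, so it is the n-passes-in-memory variant rather than a true single pass.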


----------------------------------------
David A. Lee
***@calldei.com
http://www.xmlsh.org


Michael Kay
2014-01-14 08:42:51 UTC
Permalink
> I only read the ginormous XML once... I apply the 7 filters to each
> node read and it gets allocated to one of the 7 output buckets (hows
> that for a semantically neutral term).
>

This is known within the XSL WG as the "coloured widgets" problem after a streaming use case put forward by Oliver Becker. (The problem is, given an input document containing widgets of different colours, produce N output documents, one for each colour present in the file. There are two variants of the problem, one where the set of colours is known statically, one where it is dynamic). The XSLT 3.0 streaming solution for the static case is:

<xsl:stream href="widgets.xml">
  <xsl:fork>
    <xsl:sequence>
      <xsl:result-document href="red.xml">
        <xsl:sequence select="*/widget[@colour='red']"/>
      </xsl:result-document>
    </xsl:sequence>
    <xsl:sequence>
      <xsl:result-document href="blue.xml">
        <xsl:sequence select="*/widget[@colour='blue']"/>
      </xsl:result-document>
    </xsl:sequence>
    <xsl:sequence>
      <xsl:result-document href="green.xml">
        <xsl:sequence select="*/widget[@colour='green']"/>
      </xsl:result-document>
    </xsl:sequence>
  </xsl:fork>
</xsl:stream>

A streaming processor is required to evaluate this in a single pass of the input document; the three "prongs" of the xsl:fork are effectively executed in parallel.

I mention this purely for academic interest, since there is no implementation available, unless you count the one I wrote last week.

I don't think XSLT 3.0 currently has an equivalent solution for the dynamic case, where the colours are not known in advance. The normal solution would use "group-by" but this is not streamable.
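
For comparison, a non-streaming sketch of the dynamic case using the XQuery 3.0 "group by" clause (the XQuery counterpart of grouping by key, not the XSLT instruction itself). It assumes the same widgets.xml layout as above and produces one wrapper element per colour found, rather than separate result documents; the wrapper element names "grouped" and "widgets" are made up for the example:

```xquery
xquery version "3.0";

(: Non-streaming: the processor may need to hold all widgets in
   memory in order to group them. :)
<grouped>{
  for $w in doc("widgets.xml")/*/widget
  let $colour := string($w/@colour)
  group by $colour
  return <widgets colour="{ $colour }">{ $w }</widgets>
}</grouped>
```

After the group by clause, $w is bound to all the widgets that share the current colour, so no colour needs to be known in advance.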

Michael Kay
Saxonica
William Candillon
2014-01-14 08:56:03 UTC
Permalink
Hi Ihe,

Here are a couple of resources about streaming XML and JSON documents
with Zorba:
http://www.zorba.io/blog/60027975996/xml-streaming-with-xquery
http://www.zorba.io/blog/60027825488/json-streaming-with-xquery

Reading your use case, I have a feeling that the transform.xq
library from John Snelson at https://github.com/jpcs/transform.xq
may implement the pattern you are looking for.

Kind regards,

William
