[xquery-talk] From map entry pairs to a pair of arrays

Discussion:

Joe Wicentowski

2017-07-15 15:41:22 UTC

Hi all,

Is there an efficient way to transform entries in a map into two arrays,
one containing the keys and one containing the values? Here's my attempt:

```
xquery version "3.1";

let $letters-numbers := map { "a": 1, "b": 2, "c": 3 }
return
(
array { map:keys($letters-numbers) },
array { $letters-numbers?* }
)
```

This successfully takes `map { "a": 1, "b": 2, "c": 3 }` and returns (["a",
"b", "c"], [1, 2, 3]), but the way it iterates through the map twice
strikes me as somewhat inefficient. Is there a better way to reduce this
to a single pass through the map, or is this actually the best approach?

Joe

Emmanuel Chateau

2017-07-15 16:12:39 UTC

Permalink

Hi,
I would say

```
let $letters-numbers := map { "a": 1, "b": 2, "c": 3 }
let $keys := map:keys($letters-numbers)
return (
array { $keys },
array { for $key in $keys return map:get($letters-numbers, $key) }
)
```

Post by Joe Wicentowski
Hi all,
```
xquery version "3.1";
let $letters-numbers := map { "a": 1, "b": 2, "c": 3 }
return
(
array { map:keys($letters-numbers) },
array { $letters-numbers?* }
)
```
This successfully takes `map { "a": 1, "b": 2, "c": 3 }` and returns (["a", "b", "c"], [1, 2, 3]), but the way it iterates through the map twice strikes me as somewhat inefficient. Is there a better way to reduce this to a single pass through the map, or is this actually the best approach?
Joe
_______________________________________________
http://x-query.com/mailman/listinfo/talk

_______________________________________________

Joe Wicentowski

2017-07-15 18:16:19 UTC

Permalink

Great, thank you, Emmanuel! That is much better.

I might just offer one enhancement - switching from a FLWOR to the simple
map operator:

```
xquery version "3.1";

let $letters-numbers := map { "a": 1, "b": 2, "c": 3 }
let $keys := map:keys($letters-numbers)
return (
array { $keys },
array { $keys ! map:get($letters-numbers, .) }
)
```

Joe

Post by Emmanuel Chateau
Hi,
I would say
```
let $letters-numbers := map { "a": 1, "b": 2, "c": 3 }
let $keys := map:keys($letters-numbers)
return (
array { $keys },
array { for $key in $keys return map:get($letters-numbers, $key) }
)
```

(["a", "b", "c"], [1, 2, 3]), but the way it iterates through the map twice
strikes me as somewhat inefficient. Is there a better way to reduce this
to a single pass through the map, or is this actually the best approach?

Post by Joe Wicentowski
Joe
_______________________________________________
http://x-query.com/mailman/listinfo/talk

Christian Grün

2017-07-15 19:08:46 UTC

Permalink

Hi Joe, hi Emmanuel,

Here is yet another alternative that can be efficient (but it’s not
the most compact one):

let $map := map { "a": 1, "b": 2, "c": 3 }
return (
array { map:keys($map) },
array { map:for-each($map, function($key, $value) { $value }) }
)

Last but not least, map:get($map, ...) can be replaced with $map(...):

let $map := map { "a": 1, "b": 2, "c": 3 }
let $keys := map:keys($map)
return (
array { $keys },
array { $keys ! $map(.) }
)

Depending on the XQuery processor, and even the specific version, one
or the other solution may be most efficient.

Cheers,
Christian

Post by Joe Wicentowski
Great, thank you, Emmanuel! That is much better.
I might just offer one enhancement - switching from a FLWOR to the simple
```
xquery version "3.1";
let $letters-numbers := map { "a": 1, "b": 2, "c": 3 }
let $keys := map:keys($letters-numbers)
return (
array { $keys },
array { $keys ! map:get($letters-numbers, .) }
)
```
Joe

Post by Joe Wicentowski
Hi all,
Is there an efficient way to transform entries in a map into two arrays,
```
xquery version "3.1";
let $letters-numbers := map { "a": 1, "b": 2, "c": 3 }
return
(
array { map:keys($letters-numbers) },
array { $letters-numbers?* }
)
```
This successfully takes `map { "a": 1, "b": 2, "c": 3 }` and returns
(["a", "b", "c"], [1, 2, 3]), but the way it iterates through the map twice
strikes me as somewhat inefficient. Is there a better way to reduce this to
a single pass through the map, or is this actually the best approach?
Joe
_______________________________________________
http://x-query.com/mailman/listinfo/talk

_______________________________________________
http://x-query.com/mailman/listinfo/talk

_______________________________________________
***@x-query.com
htt

W.S. Hager

2017-07-15 20:29:10 UTC

Permalink

Hi Joe,

In addition to other comments: also the size of the data matters, and not
to mention the purpose of what you're trying to achieve... Small chunks are
usually processed faster, so unless you just need to convert all data,
perhaps limiting the set could be useful. Especially when you need yet
another result, and the array would just be an intermediary result,
performance may vary across implementations.

Obviously this operation could be implemented in the host language to make
it truly efficient.

Cheers,
Wouter

Post by Christian GrÃ¼n
Hi Joe, hi Emmanuel,
Here is yet another alternative that can be efficient (but itâs not
let $map := map { "a": 1, "b": 2, "c": 3 }
return (
array { map:keys($map) },
array { map:for-each($map, function($key, $value) { $value }) }
)
let $map := map { "a": 1, "b": 2, "c": 3 }
let $keys := map:keys($map)
return (
array { $keys },
array { $keys ! $map(.) }
)
Depending on the XQuery processor, and even the specific version, one
or the other solution may be most efficient.
Cheers,
Christian

Post by Joe Wicentowski
Hi all,
Is there an efficient way to transform entries in a map into two

arrays,

Post by Joe Wicentowski

Post by Emmanuel Chateau

Post by Joe Wicentowski
one containing the keys and one containing the values? Here's my
```
xquery version "3.1";
let $letters-numbers := map { "a": 1, "b": 2, "c": 3 }
return
(
array { map:keys($letters-numbers) },
array { $letters-numbers?* }
)
```
This successfully takes `map { "a": 1, "b": 2, "c": 3 }` and returns
(["a", "b", "c"], [1, 2, 3]), but the way it iterates through the map

twice

Post by Joe Wicentowski

Post by Emmanuel Chateau

Post by Joe Wicentowski
strikes me as somewhat inefficient. Is there a better way to reduce

this to

Post by Joe Wicentowski

Post by Emmanuel Chateau

Post by Joe Wicentowski
a single pass through the map, or is this actually the best approach?
Joe
_______________________________________________
http://x-query.com/mailman/listinfo/talk

_______________________________________________
http://x-query.com/mailman/listinfo/talk

Michael Kay

2017-07-15 20:47:10 UTC

Permalink

Obviously this operation could be implemented in the host language to make it truly efficient.

Don't assume that other languages are inevitably more efficient than XQuery.

And don't assume that the cost of moving/converting data from one programming language to another is negligible.

Michael Kay
Saxonica

_______________________________________________
***@x-query.com
http://x-query.com/mailman/listinfo/talk

W.S. Hager

2017-07-16 10:08:49 UTC

Permalink

Fact of the matter is that if a low level optimization can be made, it must
be made outside of XQuery. I assume it happens all the time. But sure, it
depends on the use case.

Post by W.S. Hager
Obviously this operation could be implemented in the host language to

make it truly efficient.

Don't assume that other languages are inevitably more efficient than XQuery.

And don't assume that the cost of moving/converting data from one
programming language to another is negligible.

Michael Kay
Saxonica

Christian Grün

2017-07-16 10:52:34 UTC

Permalink

Hi Wouter,

In my experience, XQuery maps can be surprisingly efficient, even if
millions of items need to be processed. Obviously, no programming
language solves all problems best, though (even today, an assembler
language can a better choice than C).

Did you come across particular use cases in which it turned out that
XQuery maps and arrays were not the best choice? What do you believe
was the critical factor (memory consumption, runtime, …)?

Cheers,
Christian

Post by W.S. Hager
Fact of the matter is that if a low level optimization can be made, it must
be made outside of XQuery. I assume it happens all the time. But sure, it
depends on the use case.

Post by W.S. Hager
Obviously this operation could be implemented in the host language to make
it truly efficient.

_______________________________________________
***@x-query.c

W.S. Hager

2017-07-16 19:51:21 UTC

Permalink

Hi Christian,

The difference between C and Assembly is a lot smaller than between C and
XQ, and I think memory management should not be taken lightly (I'm not
saying you do). In OP's case, I believe keys could be discarded while
memory layout remains intact.

Cheers,
Wouter

Op 16 jul. 2017 12:52 schreef "Christian GrÃŒn" <***@gmail.com>:

Hi Wouter,

In my experience, XQuery maps can be surprisingly efficient, even if
millions of items need to be processed. Obviously, no programming
language solves all problems best, though (even today, an assembler
language can a better choice than C).

Did you come across particular use cases in which it turned out that
XQuery maps and arrays were not the best choice? What do you believe
was the critical factor (memory consumption, runtime, âŠ)?

Cheers,
Christian

Post by W.S. Hager
Fact of the matter is that if a low level optimization can be made, it must
be made outside of XQuery. I assume it happens all the time. But sure, it
depends on the use case.

Post by W.S. Hager
Obviously this operation could be implemented in the host language to make
it truly efficient.

Michael Kay

2017-07-16 21:58:11 UTC

Permalink

ce between C and Assembly is a lot smaller than between C and XQ, and I think memory management should not be taken lightly (I'm not saying you do). In OP's case, I believe keys could be discarded while memory layout remains intact.

As a general rule of thumb, a lower-level language performs better provided that you have the time and skills to write the code efficiently.

And as a general rule of thumb, you don't.

Even when you are quite convinced that you do.

It's a long time since I wrote in anything as low-level as C or assembler, but if you compare XQuery and Java, the level of abstraction of the API for maps is very similar, so there is no intrinsic reason to believe one should perform better than the other. The main difference is that maps in XQuery are immutable, which means you pay a little more for some operations (like adding a new entry), and you pay a lot less for other operations (llike bulk copying).

Michael Kay
Saxonica
_______________________________________________
***@x-query.com
http://x-query.com/mailman/listinfo/talk

Ghislain Fourny

2017-07-17 06:31:17 UTC

Permalink

Hi,

My first (naive) thought reading this thread is that, from a theoretical complexity perspective -- unless I missed something -- the first version in the original message was already asymptotically optimal (linear). The time it takes to compute an output always has, as a lower bound, the time it takes to output it ("print it out"), and the output is made of two "passes of the input". The improvements suggested should change the constant, but asymptotically on larger inputs, I do not expect a significant difference.

Having said this, of course, in practice, the constant may make a big difference.

Just my two cents!

Kind regards,
Ghislain

Post by Michael Kay
ce between C and Assembly is a lot smaller than between C and XQ, and I think memory management should not be taken lightly (I'm not saying you do). In OP's case, I believe keys could be discarded while memory layout remains intact.
As a general rule of thumb, a lower-level language performs better provided that you have the time and skills to write the code efficiently.
And as a general rule of thumb, you don't.
Even when you are quite convinced that you do.
It's a long time since I wrote in anything as low-level as C or assembler, but if you compare XQuery and Java, the level of abstraction of the API for maps is very similar, so there is no intrinsic reason to believe one should perform better than the other. The main difference is that maps in XQuery are immutable, which means you pay a little more for some operations (like adding a new entry), and you pay a lot less for other operations (llike bulk copying).
Michael Kay
Saxonica
_______________________________________________
http://x-query.com/mailman/listinfo/talk

_______________________________________________
***@x-query.com
http://x-query.com/mailman/listinfo/talk

Christian Grün

2017-07-17 09:23:07 UTC

Permalink

Hi Ghislain,

I completely agree with your assessment. I would expect the two writings…

$map?*
map:for-each($map, function($key, $value) { $value })

…to be faster than…

$key ! map:get($map, $key)
$key ! $map($key)

…because the latter ones require one additional lookup per key.
However, from a practical perspective, this is mostly up to the
implementation, which may choose different evaluation strategies for
each of the alternatives. For example, the evaluation of $map?* or
$map($key) could possibly be sped if it can statically be detected
that $map will always be of type map(*).

All the best,
Christian

Post by Ghislain Fourny
Hi,
My first (naive) thought reading this thread is that, from a theoretical complexity perspective -- unless I missed something -- the first version in the original message was already asymptotically optimal (linear). The time it takes to compute an output always has, as a lower bound, the time it takes to output it ("print it out"), and the output is made of two "passes of the input". The improvements suggested should change the constant, but asymptotically on larger inputs, I do not expect a significant difference.
Having said this, of course, in practice, the constant may make a big difference.
Just my two cents!
Kind regards,
Ghislain

_______________________________________________
http://x-query.com/mailman/listinfo/talk

_______________________________________________
***@x-query.c

W.S. Hager

2017-07-17 09:10:01 UTC

Permalink

What I meant is that the cost of immutability can be avoided in some cases.
That's primarily a matter of low level short-circuiting, not so much of
writing something new. I don't know if this can be done in Java, I know the
Clojure compiler does this, and I'm not aware of XQ engines that do
transient operations on persistent collections.

Op 16 jul. 2017 23:58 schreef "Michael Kay" <***@saxonica.com>:

ce between C and Assembly is a lot smaller than between C and XQ, and I
think memory management should not be taken lightly (I'm not saying you
do). In OP's case, I believe keys could be discarded while memory layout
remains intact.

As a general rule of thumb, a lower-level language performs better provided
that you have the time and skills to write the code efficiently.

And as a general rule of thumb, you don't.

Even when you are quite convinced that you do.

It's a long time since I wrote in anything as low-level as C or assembler,
but if you compare XQuery and Java, the level of abstraction of the API for
maps is very similar, so there is no intrinsic reason to believe one should
perform better than the other. The main difference is that maps in XQuery
are immutable, which means you pay a little more for some operations (like
adding a new entry), and you pay a lot less for other operations (llike
bulk copying).

Michael Kay
Saxonica

Christian Grün

2017-07-17 10:49:31 UTC

Permalink

Hi Wouter,

Post by W.S. Hager
What I meant is that the cost of immutability can be avoided in some cases.
That's primarily a matter of low level short-circuiting, not so much of
writing something new. I don't know if this can be done in Java, I know the
Clojure compiler does this, and I'm not aware of XQ engines that do
transient operations on persistent collections.

I see, thanks.

In most cases, I would doubt that it’s the immutability of XQuery maps
that slows down queries. The performance of hash tries (which I assume
are used in most implementations of XQuery) is close to mutable hash
maps. As Michael indicated, if data is repeatedly copied, immutable
maps can outperform classical hash maps, so it can even be a wise
choice to use them deliberately.

I have no idea about transient operations in Clojure. Do you know if
they are they based on a general concept, or are they closely related
to the semantics of Clojure? If they are intransparent for the user,
as it seems, I would assume they are comparable to other internal
optimizations that can be applied on a map (such as e.g. resorting to
intermediary mutable data structures if it is known in advance that a
bulk operation takes place).

It would be interesting to have a closer look at some real queries in
which maps seem to be insufficient, and find out if the bottleneck is
really the immutability of the maps, the usage of maps, or the usage
of XQuery in general.

Christian

Post by W.S. Hager
ce between C and Assembly is a lot smaller than between C and XQ, and I
think memory management should not be taken lightly (I'm not saying you do).
In OP's case, I believe keys could be discarded while memory layout remains
intact.
As a general rule of thumb, a lower-level language performs better provided
that you have the time and skills to write the code efficiently.
And as a general rule of thumb, you don't.
Even when you are quite convinced that you do.
It's a long time since I wrote in anything as low-level as C or assembler,
but if you compare XQuery and Java, the level of abstraction of the API for
maps is very similar, so there is no intrinsic reason to believe one should
perform better than the other. The main difference is that maps in XQuery
are immutable, which means you pay a little more for some operations (like
adding a new entry), and you pay a lot less for other operations (llike bulk
copying).
Michael Kay
Saxonica

_______________________________________________
t

Michael Kay

2017-07-15 20:44:51 UTC

Permalink

Another possibility:

declare namespace map = "http://www.w3.org/2005/xpath-functions/map";
let $map := map { "a": 1, "b": 2, "c": 3 }
let $entries := map:for-each($map, function($k, $v){ [$k, $v] })
return (array{$entries!?1}, array{$entries!?2})

Post by Christian GrÃ¼n
Here is yet another alternative that can be efficient (but it’s not
let $map := map { "a": 1, "b": 2, "c": 3 }
return (
array { map:keys($map) },
array { map:for-each($map, function($key, $value) { $value }) }
)

A disadvantage with this, and some other proposed solutions, is that the spec offers no guarantee that map:keys() and map:for-each() will iterate the entries of the map in the same order.

Note also that map:for-each($map, function($key, $value) { $value }) can be written $map?*

Michael Kay
Saxonica

_______________________________________________
***@x-query.com
http://x-query.com/mailm