Discussion:
[xquery-talk] Matrix Multiplication
Ihe Onwuka
2013-12-31 11:04:43 UTC
Permalink
Has anybody tried this in XQuery or if I am so foolish (not yet but give me
time) would I be the courageous <culturalReference>
early
adopter.
Andrew Welch
2013-12-31 11:31:32 UTC
Permalink
It should be pretty trivial...
Post by Ihe Onwuka
Has anybody tried this in XQuery or if I am so foolish (not yet but give
me time) would I be the courageous <culturalReference>
http://youtu.be/ik8JT2S-kBE early
adopter.
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Ihe Onwuka
2013-12-31 12:13:46 UTC
Permalink
Assuming a sparse representation it is about 4 lines of SQL. This is known
not least because you can read enough articles and papers that discuss it
and it optimises well. The obvious google search does not reveal any
corresponding XQuery discussion, neither does it appear to have surfaced on
this or the eXist mailing list (allowing for my deficient search skills).
For something so "trivial" I thought that was rather strange. Hence I
thought it would be prudent to ask before naively embarking on a 600k X
40k matrix multiplication.
Post by Andrew Welch
It should be pretty trivial...
Post by Ihe Onwuka
Has anybody tried this in XQuery or if I am so foolish (not yet but give
me time) would I be the courageous <culturalReference>
http://youtu.be/ik8JT2S-kBE early
adopter.
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk
jean-marc Mercier
2013-12-31 14:27:36 UTC
Permalink
As far as I understand, you want to write a linear algebra module using
XQUERY ?
If so, I opened a thread some months ago about this idea. My opinion today
is that this is a false good idea at present time.

1) XQUERY would be really good for writing concise, efficient linear
algebra modulus.
2) However, I strongly recommend to wait a little bit for starting coding :
the current version of XQUERY (3.0) suffers from performance issues. A
linear algebra modulus written in XQUERY is expected to have performances
performances 1000 X slower than its corresponding C++ or JAVA (you can
measure it precisely). Any mathematician linear algebra modulus would
probably trashed your modulus after the first test.

Hope this helps
Post by Ihe Onwuka
Assuming a sparse representation it is about 4 lines of SQL. This is known
not least because you can read enough articles and papers that discuss it
and it optimises well. The obvious google search does not reveal any
corresponding XQuery discussion, neither does it appear to have surfaced on
this or the eXist mailing list (allowing for my deficient search skills).
For something so "trivial" I thought that was rather strange. Hence I
thought it would be prudent to ask before naively embarking on a 600k X
40k matrix multiplication.
Post by Andrew Welch
It should be pretty trivial...
Post by Ihe Onwuka
Has anybody tried this in XQuery or if I am so foolish (not yet but
give me time) would I be the courageous <culturalReference>
http://youtu.be/ik8JT2S-kBE early
adopter.
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Andrew Welch
2013-12-31 14:38:34 UTC
Permalink
Are you saying the spec as it stands somehow forces all implementations to
be 1000x slower, or just what you have observed in some particular
implementation?
Post by jean-marc Mercier
As far as I understand, you want to write a linear algebra module using
XQUERY ?
Post by jean-marc Mercier
If so, I opened a thread some months ago about this idea. My opinion
today is that this is a false good idea at present time.
Post by jean-marc Mercier
1) XQUERY would be really good for writing concise, efficient linear
algebra modulus.
Post by jean-marc Mercier
2) However, I strongly recommend to wait a little bit for starting coding
: the current version of XQUERY (3.0) suffers from performance issues. A
linear algebra modulus written in XQUERY is expected to have performances
performances 1000 X slower than its corresponding C++ or JAVA (you can
measure it precisely). Any mathematician linear algebra modulus would
probably trashed your modulus after the first test.
Post by jean-marc Mercier
Hope this helps
Post by Ihe Onwuka
Assuming a sparse representation it is about 4 lines of SQL. This is
known not least because you can read enough articles and papers that
discuss it and it optimises well. The obvious google search does not reveal
any corresponding XQuery discussion, neither does it appear to have
surfaced on this or the eXist mailing list (allowing for my deficient
search skills). For something so "trivial" I thought that was rather
strange. Hence I thought it would be prudent to ask before naively
embarking on a 600k X 40k matrix multiplication.
Post by jean-marc Mercier
Post by Ihe Onwuka
Post by Andrew Welch
It should be pretty trivial...
Post by Ihe Onwuka
Has anybody tried this in XQuery or if I am so foolish (not yet but
give me time) would I be the courageous <culturalReference>
http://youtu.be/ik8JT2S-kBE early
adopter.
Post by jean-marc Mercier
Post by Ihe Onwuka
Post by Andrew Welch
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
jean-marc Mercier
2013-12-31 14:55:48 UTC
Permalink
It is not due to the spec. It is rather due to the common usage of XQUERY,
forcing vendor solutions (as BaseX) to focus primarily on XML Data Base
requests more than algorithmic performances.

There are actually some threads that are discussing these performance
issues in the context of maps (maps are for instance used for sparse matrix
representations) : look for instance to ""map module for XQUERY ?" that I
initiated or "Higher-order XQuery Modules" by Leo from BaseX, on talk@
x-query.com mailing list.
Anyhow, to write a serious linear algebra modulus, the basic need is to
have a vector containers. Unfortunately, XQUERY does not provide any
performant vector containers at present time, and it is not possible to
code them in pure XQUERY : I have tried, and more experienced xquery
developpers than me have also tried, without success.

We have to wait for the XQUERY version that will give us these containers,
a decision to be taken by the W3C.
Post by Andrew Welch
Are you saying the spec as it stands somehow forces all implementations to
be 1000x slower, or just what you have observed in some particular
implementation?
Post by jean-marc Mercier
As far as I understand, you want to write a linear algebra module using
XQUERY ?
Post by jean-marc Mercier
If so, I opened a thread some months ago about this idea. My opinion
today is that this is a false good idea at present time.
Post by jean-marc Mercier
1) XQUERY would be really good for writing concise, efficient linear
algebra modulus.
Post by jean-marc Mercier
2) However, I strongly recommend to wait a little bit for starting
coding : the current version of XQUERY (3.0) suffers from performance
issues. A linear algebra modulus written in XQUERY is expected to have
performances performances 1000 X slower than its corresponding C++ or JAVA
(you can measure it precisely). Any mathematician linear algebra modulus
would probably trashed your modulus after the first test.
Post by jean-marc Mercier
Hope this helps
Post by Ihe Onwuka
Assuming a sparse representation it is about 4 lines of SQL. This is
known not least because you can read enough articles and papers that
discuss it and it optimises well. The obvious google search does not reveal
any corresponding XQuery discussion, neither does it appear to have
surfaced on this or the eXist mailing list (allowing for my deficient
search skills). For something so "trivial" I thought that was rather
strange. Hence I thought it would be prudent to ask before naively
embarking on a 600k X 40k matrix multiplication.
Post by jean-marc Mercier
Post by Ihe Onwuka
On Tue, Dec 31, 2013 at 11:31 AM, Andrew Welch <
Post by Andrew Welch
It should be pretty trivial...
Post by Ihe Onwuka
Has anybody tried this in XQuery or if I am so foolish (not yet but
give me time) would I be the courageous <culturalReference>
http://youtu.be/ik8JT2S-kBE early
adopter.
Post by jean-marc Mercier
Post by Ihe Onwuka
Post by Andrew Welch
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Michael Sokolov
2013-12-31 15:48:55 UTC
Permalink
I would love to see some tests of pure XQuery implementations of both
sparse and dense operations. I suspect performance of matrix multiply,
inversion, etc, will be poorer than in C++ or Java, but I would expect
performance comparable to Perl or Python (w/o its numpy extension) -
just a wild guess. I'd also expect it might be easier to get good
sparse performance. I don't know why, just a hunch.

But the more interesting question for me is whether language changes are
really needed to support this. I would have thought that proper
optimization of operations on sequences would be enough? So for
example, an extension module using sequences as matrix datatypes could
possibly optimize performance using a lower-level implementation. Does
anyone see any reason why that wouldn't be possible?

-Mike

PS:
I reviewed the discussion you referred to, jean-marc, but it seems to
have more to do with using functions as map keys, and I don't see any
direct connection to linear algebra?
Post by jean-marc Mercier
It is not due to the spec. It is rather due to the common usage of
XQUERY, forcing vendor solutions (as BaseX) to focus primarily on XML
Data Base requests more than algorithmic performances.
There are actually some threads that are discussing these performance
issues in the context of maps (maps are for instance used for sparse
matrix representations) : look for instance to ""map module for XQUERY
?" that I initiated or "Higher-order XQuery Modules" by Leo from
Anyhow, to write a serious linear algebra modulus, the basic need is
to have a vector containers. Unfortunately, XQUERY does not provide
any performant vector containers at present time, and it is not
possible to code them in pure XQUERY : I have tried, and more
experienced xquery developpers than me have also tried, without success.
We have to wait for the XQUERY version that will give us these
containers, a decision to be taken by the W3C.
Are you saying the spec as it stands somehow forces all
implementations to be 1000x slower, or just what you have observed
in some particular implementation?
On 31 Dec 2013 14:27, "jean-marc Mercier"
Post by jean-marc Mercier
As far as I understand, you want to write a linear algebra
module using XQUERY ?
Post by jean-marc Mercier
If so, I opened a thread some months ago about this idea. My
opinion today is that this is a false good idea at present time.
Post by jean-marc Mercier
1) XQUERY would be really good for writing concise, efficient
linear algebra modulus.
Post by jean-marc Mercier
2) However, I strongly recommend to wait a little bit for
starting coding : the current version of XQUERY (3.0) suffers from
performance issues. A linear algebra modulus written in XQUERY is
expected to have performances performances 1000 X slower than its
corresponding C++ or JAVA (you can measure it precisely). Any
mathematician linear algebra modulus would probably trashed your
modulus after the first test.
Post by jean-marc Mercier
Hope this helps
Post by Ihe Onwuka
Assuming a sparse representation it is about 4 lines of SQL.
This is known not least because you can read enough articles and
papers that discuss it and it optimises well. The obvious google
search does not reveal any corresponding XQuery discussion,
neither does it appear to have surfaced on this or the eXist
mailing list (allowing for my deficient search skills). For
something so "trivial" I thought that was rather strange. Hence I
thought it would be prudent to ask before naively embarking on a
600k X 40k matrix multiplication.
Post by jean-marc Mercier
Post by Ihe Onwuka
On Tue, Dec 31, 2013 at 11:31 AM, Andrew Welch
Post by Andrew Welch
It should be pretty trivial...
Post by Ihe Onwuka
Has anybody tried this in XQuery or if I am so foolish (not
yet but give me time) would I be the courageous
http://youtu.be/ik8JT2S-kBE
early adopter.
Post by jean-marc Mercier
Post by Ihe Onwuka
Post by Andrew Welch
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Ryan Dew
2013-12-31 16:36:23 UTC
Permalink
Hopefully, I not out of bounds for throwing this out there since it isn't
technically pure XQuery, but I know MarkLogic allows for creating a custom
C++ plugin that ties into their MapReduce process on indexed fields (
http://developer.marklogic.com/learn/a-mapreduce-aggregation-function). You
can find talk about MapReduce algorithms for matrix computation out there (
http://www.norstad.org/matrix-multiply/). If you structure your data
properly in the database and setup the right indexes, upon first glance it
looks possible to me. Wasn't sure if this was a "is there a any practical
solution out there where I can use XQuery?" discussion or a strictly "is it
reasonable to achieve in the pure XQuery sense?"

-Ryan Dew


On Tue, Dec 31, 2013 at 8:48 AM, Michael Sokolov <
Post by Michael Sokolov
I would love to see some tests of pure XQuery implementations of both
sparse and dense operations. I suspect performance of matrix multiply,
inversion, etc, will be poorer than in C++ or Java, but I would expect
performance comparable to Perl or Python (w/o its numpy extension) - just a
wild guess. I'd also expect it might be easier to get good sparse
performance. I don't know why, just a hunch.
But the more interesting question for me is whether language changes are
really needed to support this. I would have thought that proper
optimization of operations on sequences would be enough? So for example,
an extension module using sequences as matrix datatypes could possibly
optimize performance using a lower-level implementation. Does anyone see
any reason why that wouldn't be possible?
-Mike
I reviewed the discussion you referred to, jean-marc, but it seems to have
more to do with using functions as map keys, and I don't see any direct
connection to linear algebra?
It is not due to the spec. It is rather due to the common usage of XQUERY,
forcing vendor solutions (as BaseX) to focus primarily on XML Data Base
requests more than algorithmic performances.
There are actually some threads that are discussing these performance
issues in the context of maps (maps are for instance used for sparse matrix
representations) : look for instance to ""map module for XQUERY ?" that I
x-query.com mailing list.
Anyhow, to write a serious linear algebra modulus, the basic need is to
have a vector containers. Unfortunately, XQUERY does not provide any
performant vector containers at present time, and it is not possible to
code them in pure XQUERY : I have tried, and more experienced xquery
developpers than me have also tried, without success.
We have to wait for the XQUERY version that will give us these
containers, a decision to be taken by the W3C.
Post by Andrew Welch
Are you saying the spec as it stands somehow forces all implementations
to be 1000x slower, or just what you have observed in some particular
implementation?
Post by jean-marc Mercier
As far as I understand, you want to write a linear algebra module using
XQUERY ?
Post by jean-marc Mercier
If so, I opened a thread some months ago about this idea. My opinion
today is that this is a false good idea at present time.
Post by jean-marc Mercier
1) XQUERY would be really good for writing concise, efficient linear
algebra modulus.
Post by jean-marc Mercier
2) However, I strongly recommend to wait a little bit for starting
coding : the current version of XQUERY (3.0) suffers from performance
issues. A linear algebra modulus written in XQUERY is expected to have
performances performances 1000 X slower than its corresponding C++ or JAVA
(you can measure it precisely). Any mathematician linear algebra modulus
would probably trashed your modulus after the first test.
Post by jean-marc Mercier
Hope this helps
Post by Ihe Onwuka
Assuming a sparse representation it is about 4 lines of SQL. This is
known not least because you can read enough articles and papers that
discuss it and it optimises well. The obvious google search does not reveal
any corresponding XQuery discussion, neither does it appear to have
surfaced on this or the eXist mailing list (allowing for my deficient
search skills). For something so "trivial" I thought that was rather
strange. Hence I thought it would be prudent to ask before naively
embarking on a 600k X 40k matrix multiplication.
Post by jean-marc Mercier
Post by Ihe Onwuka
On Tue, Dec 31, 2013 at 11:31 AM, Andrew Welch <
Post by Andrew Welch
It should be pretty trivial...
Post by Ihe Onwuka
Has anybody tried this in XQuery or if I am so foolish (not yet but
give me time) would I be the courageous <culturalReference>
http://youtu.be/ik8JT2S-kBE early
adopter.
Post by jean-marc Mercier
Post by Ihe Onwuka
Post by Andrew Welch
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
David Lee
2013-12-31 16:38:25 UTC
Permalink
There is a core limitation in XQuery 3.0 data types that make 2+D matrixes painful.
That is sequences cannot nest, they automatically flatten.
e.g.
( ( 1 , 2 , 3 ) , (4 , 5 , 6) )
becomes
( 1,2,3,4,5,6)

So there is no natural datatype for matrix (> 1D)

You can implement them with elements or functions but both impose an overhead that is harder for the optimizer to get rid of.
e.g. as elements (arbitrary name chosen)

let $a := <c>
<c>
<c>1</c>
<c>2</c>
<c>3</c>
</c>
<c>
<c>4</c>
<c>5</c>
<c>6</c>
</c>
</c>
return $a/c[2]/c[2]

Or as functions

let $a := ( function () { (1,2,3) } , function () { (4,5,6) } )
return $a[2]()[2]


Both return 5


The syntax is a bit cumbersome but it works ... its also hard on the optimizer

This among other reasons (such as supporting json arrays) is why native array types that can nest are often supported as vendor extensions,
and I belive are being considered by W3C as XQuery native types.













From: talk-***@x-query.com [mailto:talk-***@x-query.com] On Behalf Of Michael Sokolov
Sent: Tuesday, December 31, 2013 10:49 AM
To: jean-marc Mercier; Andrew Welch
Cc: xquery-discuss; ihe onwuka
Subject: Re: [xquery-talk] Matrix Multiplication

I would love to see some tests of pure XQuery implementations of both sparse and dense operations. I suspect performance of matrix multiply, inversion, etc, will be poorer than in C++ or Java, but I would expect performance comparable to Perl or Python (w/o its numpy extension) - just a wild guess. I'd also expect it might be easier to get good sparse performance. I don't know why, just a hunch.

But the more interesting question for me is whether language changes are really needed to support this. I would have thought that proper optimization of operations on sequences would be enough? So for example, an extension module using sequences as matrix datatypes could possibly optimize performance using a lower-level implementation. Does anyone see any reason why that wouldn't be possible?

-Mike

PS:
I reviewed the discussion you referred to, jean-marc, but it seems to have more to do with using functions as map keys, and I don't see any direct connection to linear algebra?

On 12/31/2013 9:55 AM, jean-marc Mercier wrote:
It is not due to the spec. It is rather due to the common usage of XQUERY, forcing vendor solutions (as BaseX) to focus primarily on XML Data Base requests more than algorithmic performances.

There are actually some threads that are discussing these performance issues in the context of maps (maps are for instance used for sparse matrix representations) : look for instance to ""map module for XQUERY ?" that I initiated or "Higher-order XQuery Modules" by Leo from BaseX, on ***@x-query.com<http://x-query.com> mailing list.
Anyhow, to write a serious linear algebra modulus, the basic need is to have a vector containers. Unfortunately, XQUERY does not provide any performant vector containers at present time, and it is not possible to code them in pure XQUERY : I have tried, and more experienced xquery developpers than me have also tried, without success.

We have to wait for the XQUERY version that will give us these containers, a decision to be taken by the W3C.


2013/12/31 Andrew Welch <***@gmail.com<mailto:***@gmail.com>>

Are you saying the spec as it stands somehow forces all implementations to be 1000x slower, or just what you have observed in some particular implementation?
As far as I understand, you want to write a linear algebra module using XQUERY ?
If so, I opened a thread some months ago about this idea. My opinion today is that this is a false good idea at present time.
1) XQUERY would be really good for writing concise, efficient linear algebra modulus.
2) However, I strongly recommend to wait a little bit for starting coding : the current version of XQUERY (3.0) suffers from performance issues. A linear algebra modulus written in XQUERY is expected to have performances performances 1000 X slower than its corresponding C++ or JAVA (you can measure it precisely). Any mathematician linear algebra modulus would probably trashed your modulus after the first test.
Hope this helps
Assuming a sparse representation it is about 4 lines of SQL. This is known not least because you can read enough articles and papers that discuss it and it optimises well. The obvious google search does not reveal any corresponding XQuery discussion, neither does it appear to have surfaced on this or the eXist mailing list (allowing for my deficient search skills). For something so "trivial" I thought that was rather strange. Hence I thought it would be prudent to ask before naively embarking on a 600k X 40k matrix multiplication.
Post by Andrew Welch
It should be pretty trivial...
Has anybody tried this in XQuery or if I am so foolish (not yet but give me time) would I be the courageous http://youtu.be/ik8JT2S-kBE early adopter.
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Ihe Onwuka
2013-12-31 17:40:15 UTC
Permalink
There is no need for the representation of a matrix in anything more than 1
dimension.

A two dimensional m x n matrix can be represented by a 1 dimensional array
of length (m * n) + 1 where the + 1 is used to recorded the size of the m
(or if you choose) n axis.

Suppose your matrix was 10 by n. If you have a representation that stores
the fact that the matrix has 10 rows fact then it follows that the item at
index position 704 has row column coordinates of 71, 4. Alternatively if it
instead you chose to store that it had say 5 columns then index 704 has
coordinates 141, 4.

This extends to n dimensions. A n-dimensional matrix can be represented by
an (m1 * m2 * ......* mn) + n-1 array.

Take the three dimensional case, with the previous example. We will take
the 3rd dimension as page. In our representation the extra numbers
represent m and m*n. Take m as 10, n as 5 so our array has 10 and 500
tagged on to it. Given an index of 3683, divide by m*n to give the page
(8) then and use the remainder to figure out you are on row 19 column 3. So
the x,y, z coordinates of that index are 19,3,8.

You can extend this principle to any number of dimensions.

Having said that I am still going to target SQL
Post by David Lee
There is a core limitation in XQuery 3.0 data types that make 2+D matrixes painful.
That is sequences cannot nest, they automatically flatten.
e.g.
( ( 1 , 2 , 3 ) , (4 , 5 , 6) )
becomes
( 1,2,3,4,5,6)
So there is no natural datatype for matrix (> 1D)
You can implement them with elements or functions but both impose an
overhead that is harder for the optimizer to get rid of.
e.g. as elements (arbitrary name chosen)
let $a := <c>
<c>
<c>1</c>
<c>2</c>
<c>3</c>
</c>
<c>
<c>4</c>
<c>5</c>
<c>6</c>
</c>
</c>
return $a/c[2]/c[2]
Or as functions
let $a := ( function () { (1,2,3) } , function () { (4,5,6) } )
return $a[2]()[2]
Both return 5
The syntax is a bit cumbersome but it works ... its also hard on the optimizer
This among other reasons (such as supporting json arrays) is why native
array types that can nest are often supported as vendor extensions,
and I belive are being considered by W3C as XQuery native types.
Behalf Of *Michael Sokolov
*Sent:* Tuesday, December 31, 2013 10:49 AM
*To:* jean-marc Mercier; Andrew Welch
*Cc:* xquery-discuss; ihe onwuka
*Subject:* Re: [xquery-talk] Matrix Multiplication
I would love to see some tests of pure XQuery implementations of both
sparse and dense operations. I suspect performance of matrix multiply,
inversion, etc, will be poorer than in C++ or Java, but I would expect
performance comparable to Perl or Python (w/o its numpy extension) - just a
wild guess. I'd also expect it might be easier to get good sparse
performance. I don't know why, just a hunch.
But the more interesting question for me is whether language changes are
really needed to support this. I would have thought that proper
optimization of operations on sequences would be enough? So for example,
an extension module using sequences as matrix datatypes could possibly
optimize performance using a lower-level implementation. Does anyone see
any reason why that wouldn't be possible?
-Mike
I reviewed the discussion you referred to, jean-marc, but it seems to have
more to do with using functions as map keys, and I don't see any direct
connection to linear algebra?
It is not due to the spec. It is rather due to the common usage of
XQUERY, forcing vendor solutions (as BaseX) to focus primarily on XML Data
Base requests more than algorithmic performances.
There are actually some threads that are discussing these performance
issues in the context of maps (maps are for instance used for sparse matrix
representations) : look for instance to ""map module for XQUERY ?" that I
x-query.com mailing list.
Anyhow, to write a serious linear algebra modulus, the basic need is to
have a vector containers. Unfortunately, XQUERY does not provide any
performant vector containers at present time, and it is not possible to
code them in pure XQUERY : I have tried, and more experienced xquery
developpers than me have also tried, without success.
We have to wait for the XQUERY version that will give us these containers,
a decision to be taken by the W3C.
Are you saying the spec as it stands somehow forces all implementations to
be 1000x slower, or just what you have observed in some particular
implementation?
Post by jean-marc Mercier
As far as I understand, you want to write a linear algebra module using
XQUERY ?
Post by jean-marc Mercier
If so, I opened a thread some months ago about this idea. My opinion
today is that this is a false good idea at present time.
Post by jean-marc Mercier
1) XQUERY would be really good for writing concise, efficient linear
algebra modulus.
Post by jean-marc Mercier
2) However, I strongly recommend to wait a little bit for starting
coding : the current version of XQUERY (3.0) suffers from performance
issues. A linear algebra modulus written in XQUERY is expected to have
performances performances 1000 X slower than its corresponding C++ or JAVA
(you can measure it precisely). Any mathematician linear algebra modulus
would probably trashed your modulus after the first test.
Post by jean-marc Mercier
Hope this helps
Post by Ihe Onwuka
Assuming a sparse representation it is about 4 lines of SQL. This is
known not least because you can read enough articles and papers that
discuss it and it optimises well. The obvious google search does not reveal
any corresponding XQuery discussion, neither does it appear to have
surfaced on this or the eXist mailing list (allowing for my deficient
search skills). For something so "trivial" I thought that was rather
strange. Hence I thought it would be prudent to ask before naively
embarking on a 600k X 40k matrix multiplication.
Post by jean-marc Mercier
Post by Ihe Onwuka
On Tue, Dec 31, 2013 at 11:31 AM, Andrew Welch <
Post by Andrew Welch
It should be pretty trivial...
Post by Ihe Onwuka
Has anybody tried this in XQuery or if I am so foolish (not yet but
give me time) would I be the courageous <culturalReference>
http://youtu.be/ik8JT2S-kBE early
adopter.
Post by jean-marc Mercier
Post by Ihe Onwuka
Post by Andrew Welch
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
jean-marc Mercier
2013-12-31 16:44:16 UTC
Permalink
Hi

@Michael Concerning your remark over the discussions I quoted, maps are the
basic block for sparse linear algebra. Without performent "maps of nodes"
(equivalent to std::map in C++) you will not be able to write any
performant linear algebra modulus for sparse matrix.

However, before even thinking to sparse matrix, operations on sequences as
concatenations are the first "show-stop" to write a linear algebra modulus,
since sequences are vectors.
Another one is the lack of constant sized vectors (needed for basic dense
matrix operations).

@Ryan thx for these links, they are very interesting.

Well, I am going to party ! Have a happy new year !
Post by Michael Sokolov
I would love to see some tests of pure XQuery implementations of both
sparse and dense operations. I suspect performance of matrix multiply,
inversion, etc, will be poorer than in C++ or Java, but I would expect
performance comparable to Perl or Python (w/o its numpy extension) - just a
wild guess. I'd also expect it might be easier to get good sparse
performance. I don't know why, just a hunch.
But the more interesting question for me is whether language changes are
really needed to support this. I would have thought that proper
optimization of operations on sequences would be enough? So for example,
an extension module using sequences as matrix datatypes could possibly
optimize performance using a lower-level implementation. Does anyone see
any reason why that wouldn't be possible?
-Mike
I reviewed the discussion you referred to, jean-marc, but it seems to have
more to do with using functions as map keys, and I don't see any direct
connection to linear algebra?
It is not due to the spec. It is rather due to the common usage of XQUERY,
forcing vendor solutions (as BaseX) to focus primarily on XML Data Base
requests more than algorithmic performances.
There are actually some threads that are discussing these performance
issues in the context of maps (maps are for instance used for sparse matrix
representations) : look for instance to ""map module for XQUERY ?" that I
x-query.com mailing list.
Anyhow, to write a serious linear algebra modulus, the basic need is to
have a vector containers. Unfortunately, XQUERY does not provide any
performant vector containers at present time, and it is not possible to
code them in pure XQUERY : I have tried, and more experienced xquery
developpers than me have also tried, without success.
We have to wait for the XQUERY version that will give us these
containers, a decision to be taken by the W3C.
Post by Andrew Welch
Are you saying the spec as it stands somehow forces all implementations
to be 1000x slower, or just what you have observed in some particular
implementation?
Post by jean-marc Mercier
As far as I understand, you want to write a linear algebra module using
XQUERY ?
Post by jean-marc Mercier
If so, I opened a thread some months ago about this idea. My opinion
today is that this is a false good idea at present time.
Post by jean-marc Mercier
1) XQUERY would be really good for writing concise, efficient linear
algebra modulus.
Post by jean-marc Mercier
2) However, I strongly recommend to wait a little bit for starting
coding : the current version of XQUERY (3.0) suffers from performance
issues. A linear algebra modulus written in XQUERY is expected to have
performances performances 1000 X slower than its corresponding C++ or JAVA
(you can measure it precisely). Any mathematician linear algebra modulus
would probably trashed your modulus after the first test.
Post by jean-marc Mercier
Hope this helps
Post by Ihe Onwuka
Assuming a sparse representation it is about 4 lines of SQL. This is
known not least because you can read enough articles and papers that
discuss it and it optimises well. The obvious google search does not reveal
any corresponding XQuery discussion, neither does it appear to have
surfaced on this or the eXist mailing list (allowing for my deficient
search skills). For something so "trivial" I thought that was rather
strange. Hence I thought it would be prudent to ask before naively
embarking on a 600k X 40k matrix multiplication.
Post by jean-marc Mercier
Post by Ihe Onwuka
On Tue, Dec 31, 2013 at 11:31 AM, Andrew Welch <
Post by Andrew Welch
It should be pretty trivial...
Post by Ihe Onwuka
Has anybody tried this in XQuery or if I am so foolish (not yet but
give me time) would I be the courageous <culturalReference>
http://youtu.be/ik8JT2S-kBE early
adopter.
Post by jean-marc Mercier
Post by Ihe Onwuka
Post by Andrew Welch
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
jean-marc Mercier
2013-12-31 16:49:27 UTC
Permalink
@David, there are tricks to overcome sequence concatenations now. See for
instance definition of a pair by Leo, John Snelson, or me : you can write
pair( ( 1 , 2 , 3 ) , (4 5 , 6) ) to avoid sequence concatenation. I ve
written also constant sized vectors using this trick : for instance
NUPLE(1,(),<toto>,5), withh associated getters.
The bad new is that these operations takes too much time with all the
interpreters I have tried, and can't be used in heavy algorithms.
Post by jean-marc Mercier
Hi
@Michael Concerning your remark over the discussions I quoted, maps are
the basic block for sparse linear algebra. Without performent "maps of
nodes" (equivalent to std::map in C++) you will not be able to write any
performant linear algebra modulus for sparse matrix.
However, before even thinking to sparse matrix, operations on sequences as
concatenations are the first "show-stop" to write a linear algebra modulus,
since sequences are vectors.
Another one is the lack of constant sized vectors (needed for basic dense
matrix operations).
@Ryan thx for these links, they are very interesting.
Well, I am going to party ! Have a happy new year !
Post by Michael Sokolov
I would love to see some tests of pure XQuery implementations of both
sparse and dense operations. I suspect performance of matrix multiply,
inversion, etc, will be poorer than in C++ or Java, but I would expect
performance comparable to Perl or Python (w/o its numpy extension) - just a
wild guess. I'd also expect it might be easier to get good sparse
performance. I don't know why, just a hunch.
But the more interesting question for me is whether language changes are
really needed to support this. I would have thought that proper
optimization of operations on sequences would be enough? So for example,
an extension module using sequences as matrix datatypes could possibly
optimize performance using a lower-level implementation. Does anyone see
any reason why that wouldn't be possible?
-Mike
I reviewed the discussion you referred to, jean-marc, but it seems to
have more to do with using functions as map keys, and I don't see any
direct connection to linear algebra?
It is not due to the spec. It is rather due to the common usage of
XQUERY, forcing vendor solutions (as BaseX) to focus primarily on XML Data
Base requests more than algorithmic performances.
There are actually some threads that are discussing these performance
issues in the context of maps (maps are for instance used for sparse matrix
representations) : look for instance to ""map module for XQUERY ?" that I
x-query.com mailing list.
Anyhow, to write a serious linear algebra modulus, the basic need is to
have a vector containers. Unfortunately, XQUERY does not provide any
performant vector containers at present time, and it is not possible to
code them in pure XQUERY : I have tried, and more experienced xquery
developpers than me have also tried, without success.
We have to wait for the XQUERY version that will give us these
containers, a decision to be taken by the W3C.
Post by Andrew Welch
Are you saying the spec as it stands somehow forces all implementations
to be 1000x slower, or just what you have observed in some particular
implementation?
Post by jean-marc Mercier
As far as I understand, you want to write a linear algebra module
using XQUERY ?
Post by jean-marc Mercier
If so, I opened a thread some months ago about this idea. My opinion
today is that this is a false good idea at present time.
Post by jean-marc Mercier
1) XQUERY would be really good for writing concise, efficient linear
algebra modulus.
Post by jean-marc Mercier
2) However, I strongly recommend to wait a little bit for starting
coding : the current version of XQUERY (3.0) suffers from performance
issues. A linear algebra modulus written in XQUERY is expected to have
performances performances 1000 X slower than its corresponding C++ or JAVA
(you can measure it precisely). Any mathematician linear algebra modulus
would probably trashed your modulus after the first test.
Post by jean-marc Mercier
Hope this helps
Post by Ihe Onwuka
Assuming a sparse representation it is about 4 lines of SQL. This is
known not least because you can read enough articles and papers that
discuss it and it optimises well. The obvious google search does not reveal
any corresponding XQuery discussion, neither does it appear to have
surfaced on this or the eXist mailing list (allowing for my deficient
search skills). For something so "trivial" I thought that was rather
strange. Hence I thought it would be prudent to ask before naively
embarking on a 600k X 40k matrix multiplication.
Post by jean-marc Mercier
Post by Ihe Onwuka
On Tue, Dec 31, 2013 at 11:31 AM, Andrew Welch <
Post by Andrew Welch
It should be pretty trivial...
Post by Ihe Onwuka
Has anybody tried this in XQuery or if I am so foolish (not yet
but give me time) would I be the courageous <culturalReference>
http://youtu.be/ik8JT2S-kBE early
adopter.
Post by jean-marc Mercier
Post by Ihe Onwuka
Post by Andrew Welch
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
David Lee
2013-12-31 16:53:41 UTC
Permalink
Indeed ... pair( (1,2,3) ) is just a function that returns a function as per my example.
But to the point, you can either use vendor extensions (such as marklogic's json:array and map:map types) which have good support for efficient operations, or look to another language (<sigh>) You may find Scala more amenable to this type of programming.



From: talk-***@x-query.com [mailto:talk-***@x-query.com] On Behalf Of jean-marc Mercier
Sent: Tuesday, December 31, 2013 11:49 AM
To: Michael Sokolov
Cc: xquery-discuss; ihe onwuka; Andrew Welch
Subject: Re: [xquery-talk] Matrix Multiplication

@David, there are tricks to overcome sequence concatenations now. See for instance definition of a pair by Leo, John Snelson, or me : you can write pair( ( 1 , 2 , 3 ) , (4 5 , 6) ) to avoid sequence concatenation. I ve written also constant sized vectors using this trick : for instance NUPLE(1,(),<toto>,5), withh associated getters.
The bad new is that these operations takes too much time with all the interpreters I have tried, and can't be used in heavy algorithms.




2013/12/31 jean-marc Mercier <***@gmail.com<mailto:***@gmail.com>>
Hi

@Michael Concerning your remark over the discussions I quoted, maps are the basic block for sparse linear algebra. Without performent "maps of nodes" (equivalent to std::map in C++) you will not be able to write any performant linear algebra modulus for sparse matrix.

However, before even thinking to sparse matrix, operations on sequences as concatenations are the first "show-stop" to write a linear algebra modulus, since sequences are vectors.
Another one is the lack of constant sized vectors (needed for basic dense matrix operations).

@Ryan thx for these links, they are very interesting.

Well, I am going to party ! Have a happy new year !




2013/12/31 Michael Sokolov <***@safaribooksonline.com<mailto:***@safaribooksonline.com>>
I would love to see some tests of pure XQuery implementations of both sparse and dense operations. I suspect performance of matrix multiply, inversion, etc, will be poorer than in C++ or Java, but I would expect performance comparable to Perl or Python (w/o its numpy extension) - just a wild guess. I'd also expect it might be easier to get good sparse performance. I don't know why, just a hunch.

But the more interesting question for me is whether language changes are really needed to support this. I would have thought that proper optimization of operations on sequences would be enough? So for example, an extension module using sequences as matrix datatypes could possibly optimize performance using a lower-level implementation. Does anyone see any reason why that wouldn't be possible?

-Mike

PS:
I reviewed the discussion you referred to, jean-marc, but it seems to have more to do with using functions as map keys, and I don't see any direct connection to linear algebra?


On 12/31/2013 9:55 AM, jean-marc Mercier wrote:
It is not due to the spec. It is rather due to the common usage of XQUERY, forcing vendor solutions (as BaseX) to focus primarily on XML Data Base requests more than algorithmic performances.

There are actually some threads that are discussing these performance issues in the context of maps (maps are for instance used for sparse matrix representations) : look for instance to ""map module for XQUERY ?" that I initiated or "Higher-order XQuery Modules" by Leo from BaseX, on ***@x-query.com<http://x-query.com> mailing list.
Anyhow, to write a serious linear algebra modulus, the basic need is to have a vector containers. Unfortunately, XQUERY does not provide any performant vector containers at present time, and it is not possible to code them in pure XQUERY : I have tried, and more experienced xquery developpers than me have also tried, without success.

We have to wait for the XQUERY version that will give us these containers, a decision to be taken by the W3C.


2013/12/31 Andrew Welch <***@gmail.com<mailto:***@gmail.com>>

Are you saying the spec as it stands somehow forces all implementations to be 1000x slower, or just what you have observed in some particular implementation?
As far as I understand, you want to write a linear algebra module using XQUERY ?
If so, I opened a thread some months ago about this idea. My opinion today is that this is a false good idea at present time.
1) XQUERY would be really good for writing concise, efficient linear algebra modulus.
2) However, I strongly recommend to wait a little bit for starting coding : the current version of XQUERY (3.0) suffers from performance issues. A linear algebra modulus written in XQUERY is expected to have performances performances 1000 X slower than its corresponding C++ or JAVA (you can measure it precisely). Any mathematician linear algebra modulus would probably trashed your modulus after the first test.
Hope this helps
Assuming a sparse representation it is about 4 lines of SQL. This is known not least because you can read enough articles and papers that discuss it and it optimises well. The obvious google search does not reveal any corresponding XQuery discussion, neither does it appear to have surfaced on this or the eXist mailing list (allowing for my deficient search skills). For something so "trivial" I thought that was rather strange. Hence I thought it would be prudent to ask before naively embarking on a 600k X 40k matrix multiplication.
Post by Andrew Welch
It should be pretty trivial...
Has anybody tried this in XQuery or if I am so foolish (not yet but give me time) would I be the courageous http://youtu.be/ik8JT2S-kBE early adopter.
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
jean-marc Mercier
2013-12-31 17:02:12 UTC
Permalink
@David pairs are also basically needed to write a linear algebra modulus,
the topic of this thread. And XQUERY don't provide any efficient pair. You
can't use Marklogic map, or any other vendor map to store vectors for
performance issues (a map is really slow).

Note that there are a lot of workaround : I am using direct JAVA binding or
C++ binding from XQUERY for linear algebra not to pay a too heavy tribute
to these issue performances.

The point is simply to notice that XQUERY could be really good to write an
efficient linear algebra modulus. But, due to these performance issues,
nobody can write it. I just hope that the next XQUERY version will give the
necessary container to write it. Meanwhile, nobody can contribute to XQUERY
through external modulus using heavy algorithms, and that's just too bad.
Post by David Lee
Indeed ... pair( (1,2,3) ) is just a function that returns a function as per my example.
But to the point, you can either use vendor extensions (such as
marklogic's json:array and map:map types) which have good support for
efficient operations, or look to another language (<sigh>) You may find
Scala more amenable to this type of programming.
Behalf Of *jean-marc Mercier
*Sent:* Tuesday, December 31, 2013 11:49 AM
*To:* Michael Sokolov
*Cc:* xquery-discuss; ihe onwuka; Andrew Welch
*Subject:* Re: [xquery-talk] Matrix Multiplication
@David, there are tricks to overcome sequence concatenations now. See for
instance definition of a pair by Leo, John Snelson, or me : you can write
pair( ( 1 , 2 , 3 ) , (4 5 , 6) ) to avoid sequence concatenation. I ve
written also constant sized vectors using this trick : for instance
NUPLE(1,(),<toto>,5), withh associated getters.
The bad new is that these operations takes too much time with all the
interpreters I have tried, and can't be used in heavy algorithms.
Hi
@Michael Concerning your remark over the discussions I quoted, maps are
the basic block for sparse linear algebra. Without performent "maps of
nodes" (equivalent to std::map in C++) you will not be able to write any
performant linear algebra modulus for sparse matrix.
However, before even thinking to sparse matrix, operations on sequences as
concatenations are the first "show-stop" to write a linear algebra modulus,
since sequences are vectors.
Another one is the lack of constant sized vectors (needed for basic dense
matrix operations).
@Ryan thx for these links, they are very interesting.
Well, I am going to party ! Have a happy new year !
I would love to see some tests of pure XQuery implementations of both
sparse and dense operations. I suspect performance of matrix multiply,
inversion, etc, will be poorer than in C++ or Java, but I would expect
performance comparable to Perl or Python (w/o its numpy extension) - just a
wild guess. I'd also expect it might be easier to get good sparse
performance. I don't know why, just a hunch.
But the more interesting question for me is whether language changes are
really needed to support this. I would have thought that proper
optimization of operations on sequences would be enough? So for example,
an extension module using sequences as matrix datatypes could possibly
optimize performance using a lower-level implementation. Does anyone see
any reason why that wouldn't be possible?
-Mike
I reviewed the discussion you referred to, jean-marc, but it seems to have
more to do with using functions as map keys, and I don't see any direct
connection to linear algebra?
It is not due to the spec. It is rather due to the common usage of
XQUERY, forcing vendor solutions (as BaseX) to focus primarily on XML Data
Base requests more than algorithmic performances.
There are actually some threads that are discussing these performance
issues in the context of maps (maps are for instance used for sparse matrix
representations) : look for instance to ""map module for XQUERY ?" that I
x-query.com mailing list.
Anyhow, to write a serious linear algebra modulus, the basic need is to
have a vector containers. Unfortunately, XQUERY does not provide any
performant vector containers at present time, and it is not possible to
code them in pure XQUERY : I have tried, and more experienced xquery
developpers than me have also tried, without success.
We have to wait for the XQUERY version that will give us these containers,
a decision to be taken by the W3C.
Are you saying the spec as it stands somehow forces all implementations to
be 1000x slower, or just what you have observed in some particular
implementation?
Post by jean-marc Mercier
As far as I understand, you want to write a linear algebra module using
XQUERY ?
Post by jean-marc Mercier
If so, I opened a thread some months ago about this idea. My opinion
today is that this is a false good idea at present time.
Post by jean-marc Mercier
1) XQUERY would be really good for writing concise, efficient linear
algebra modulus.
Post by jean-marc Mercier
2) However, I strongly recommend to wait a little bit for starting
coding : the current version of XQUERY (3.0) suffers from performance
issues. A linear algebra modulus written in XQUERY is expected to have
performances performances 1000 X slower than its corresponding C++ or JAVA
(you can measure it precisely). Any mathematician linear algebra modulus
would probably trashed your modulus after the first test.
Post by jean-marc Mercier
Hope this helps
Post by Ihe Onwuka
Assuming a sparse representation it is about 4 lines of SQL. This is
known not least because you can read enough articles and papers that
discuss it and it optimises well. The obvious google search does not reveal
any corresponding XQuery discussion, neither does it appear to have
surfaced on this or the eXist mailing list (allowing for my deficient
search skills). For something so "trivial" I thought that was rather
strange. Hence I thought it would be prudent to ask before naively
embarking on a 600k X 40k matrix multiplication.
Post by jean-marc Mercier
Post by Ihe Onwuka
On Tue, Dec 31, 2013 at 11:31 AM, Andrew Welch <
Post by Andrew Welch
It should be pretty trivial...
Post by Ihe Onwuka
Has anybody tried this in XQuery or if I am so foolish (not yet but
give me time) would I be the courageous <culturalReference>
http://youtu.be/ik8JT2S-kBE early
adopter.
Post by jean-marc Mercier
Post by Ihe Onwuka
Post by Andrew Welch
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
David Lee
2013-12-31 18:02:35 UTC
Permalink
One tiny correction
Your statement " You can't use Marklogic map, or any other vendor map to store vectors for performance issues (a map is really slow)."

So far on this thread this statement has not been substantiated.
MarkLogic maps and arrays are quite efficient, they are not the same thing as XQuery rb trees or anything else.
They are highly efficient C++ implementations of maps and arrays, implemented under the hood with the same data structures as you would in C++ stdlib.

Not suggesting your other line of reasoning is incorrect, only that so far on this thread I have not seen any evidence of testing vendor extensions for XQuery types. I suspect you don't want to do that for good reasons (for example its not portable, or its not a free product),
but I do suggest you cannot extrapolate performance from one vendors implementation to another.
As is usual in performance "it depends".



From: jean-marc Mercier [mailto:***@gmail.com]
Sent: Tuesday, December 31, 2013 12:02 PM
To: David Lee
Cc: Michael Sokolov; xquery-discuss; ihe onwuka; Andrew Welch
Subject: Re: [xquery-talk] Matrix Multiplication

@David pairs are also basically needed to write a linear algebra modulus, the topic of this thread. And XQUERY don't provide any efficient pair. You can't use Marklogic map, or any other vendor map to store vectors for performance issues (a map is really slow).

Note that there are a lot of workaround : I am using direct JAVA binding or C++ binding from XQUERY for linear algebra not to pay a too heavy tribute to these issue performances.

The point is simply to notice that XQUERY could be really good to write an efficient linear algebra modulus. But, due to these performance issues, nobody can write it. I just hope that the next XQUERY version will give the necessary container to write it. Meanwhile, nobody can contribute to XQUERY through external modulus using heavy algorithms, and that's just too bad.





2013/12/31 David Lee <***@calldei.com<mailto:***@calldei.com>>
Indeed ... pair( (1,2,3) ) is just a function that returns a function as per my example.
But to the point, you can either use vendor extensions (such as marklogic's json:array and map:map types) which have good support for efficient operations, or look to another language (<sigh>) You may find Scala more amenable to this type of programming.



From: talk-***@x-query.com<mailto:talk-***@x-query.com> [mailto:talk-***@x-query.com<mailto:talk-***@x-query.com>] On Behalf Of jean-marc Mercier
Sent: Tuesday, December 31, 2013 11:49 AM
To: Michael Sokolov
Cc: xquery-discuss; ihe onwuka; Andrew Welch

Subject: Re: [xquery-talk] Matrix Multiplication

@David, there are tricks to overcome sequence concatenations now. See for instance definition of a pair by Leo, John Snelson, or me : you can write pair( ( 1 , 2 , 3 ) , (4 5 , 6) ) to avoid sequence concatenation. I ve written also constant sized vectors using this trick : for instance NUPLE(1,(),<toto>,5), withh associated getters.
The bad new is that these operations takes too much time with all the interpreters I have tried, and can't be used in heavy algorithms.




2013/12/31 jean-marc Mercier <***@gmail.com<mailto:***@gmail.com>>
Hi

@Michael Concerning your remark over the discussions I quoted, maps are the basic block for sparse linear algebra. Without performent "maps of nodes" (equivalent to std::map in C++) you will not be able to write any performant linear algebra modulus for sparse matrix.

However, before even thinking to sparse matrix, operations on sequences as concatenations are the first "show-stop" to write a linear algebra modulus, since sequences are vectors.
Another one is the lack of constant sized vectors (needed for basic dense matrix operations).

@Ryan thx for these links, they are very interesting.

Well, I am going to party ! Have a happy new year !




2013/12/31 Michael Sokolov <***@safaribooksonline.com<mailto:***@safaribooksonline.com>>
I would love to see some tests of pure XQuery implementations of both sparse and dense operations. I suspect performance of matrix multiply, inversion, etc, will be poorer than in C++ or Java, but I would expect performance comparable to Perl or Python (w/o its numpy extension) - just a wild guess. I'd also expect it might be easier to get good sparse performance. I don't know why, just a hunch.

But the more interesting question for me is whether language changes are really needed to support this. I would have thought that proper optimization of operations on sequences would be enough? So for example, an extension module using sequences as matrix datatypes could possibly optimize performance using a lower-level implementation. Does anyone see any reason why that wouldn't be possible?

-Mike

PS:
I reviewed the discussion you referred to, jean-marc, but it seems to have more to do with using functions as map keys, and I don't see any direct connection to linear algebra?


On 12/31/2013 9:55 AM, jean-marc Mercier wrote:
It is not due to the spec. It is rather due to the common usage of XQUERY, forcing vendor solutions (as BaseX) to focus primarily on XML Data Base requests more than algorithmic performances.

There are actually some threads that are discussing these performance issues in the context of maps (maps are for instance used for sparse matrix representations) : look for instance to ""map module for XQUERY ?" that I initiated or "Higher-order XQuery Modules" by Leo from BaseX, on ***@x-query.com<http://x-query.com> mailing list.
Anyhow, to write a serious linear algebra modulus, the basic need is to have a vector containers. Unfortunately, XQUERY does not provide any performant vector containers at present time, and it is not possible to code them in pure XQUERY : I have tried, and more experienced xquery developpers than me have also tried, without success.

We have to wait for the XQUERY version that will give us these containers, a decision to be taken by the W3C.


2013/12/31 Andrew Welch <***@gmail.com<mailto:***@gmail.com>>

Are you saying the spec as it stands somehow forces all implementations to be 1000x slower, or just what you have observed in some particular implementation?
As far as I understand, you want to write a linear algebra module using XQUERY ?
If so, I opened a thread some months ago about this idea. My opinion today is that this is a false good idea at present time.
1) XQUERY would be really good for writing concise, efficient linear algebra modulus.
2) However, I strongly recommend to wait a little bit for starting coding : the current version of XQUERY (3.0) suffers from performance issues. A linear algebra modulus written in XQUERY is expected to have performances performances 1000 X slower than its corresponding C++ or JAVA (you can measure it precisely). Any mathematician linear algebra modulus would probably trashed your modulus after the first test.
Hope this helps
Assuming a sparse representation it is about 4 lines of SQL. This is known not least because you can read enough articles and papers that discuss it and it optimises well. The obvious google search does not reveal any corresponding XQuery discussion, neither does it appear to have surfaced on this or the eXist mailing list (allowing for my deficient search skills). For something so "trivial" I thought that was rather strange. Hence I thought it would be prudent to ask before naively embarking on a 600k X 40k matrix multiplication.
Post by Andrew Welch
It should be pretty trivial...
Has anybody tried this in XQuery or if I am so foolish (not yet but give me time) would I be the courageous http://youtu.be/ik8JT2S-kBE early adopter.
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
jean-marc Mercier
2014-01-01 09:51:27 UTC
Permalink
@David

Maybe my statement was not clear. Please let me do it again : I am just
saying that maps are slow in insertion, compared to vectors. This is true
for any implementation, any language : if you use a map instead of a
vector, the overhead for each insertion or reading is log(n), where n is
the size of the vector, that is huge when you look primarily for
performances. Furthermore, please check this thread: 3 of us, including me,
have tried to develop an external modulus implementing a map. The best
result, that is the one of Leo from BaseX, consists in a 100X overhead
factor compared to a vendor solution, the map from BaseX. So far we tried,
so far we failed: this failure is just telling us that we can't use XQUERY
to develop basic types as pairs, maps, vectors through external XQUERY
modulus. In this context, I would not try, and will discourage anyone else,
to develop an external linear Algebra modulus with this version of XQUERY,
because it will be too slow compared to imperative language already
existing linear algebra modulus.

David, I really try to have a scientific approach, I am not bound to any
organization, have no particular private interest in the xml industry. I am
just a xquery user. And I wish I were wrong. Please do not hesitate to
point me out any failure in my reasoning.

Meanwhile, I wish you a happy new year !!!
Post by David Lee
One tiny correction
Your statement " You can't use Marklogic map, or any other vendor map to
store vectors for performance issues (a map is really slow)."
So far on this thread this statement has not been substantiated.
MarkLogic maps and arrays are quite efficient, they are not the same thing
as XQuery rb trees or anything else.
They are highly efficient C++ implementations of maps and arrays,
implemented under the hood with the same data structures as you would in
C++ stdlib.
Not suggesting your other line of reasoning is incorrect, only that so far
on this thread I have not seen any evidence of testing vendor extensions
for XQuery types. I suspect you don't want to do that for good reasons
(for example its not portable, or its not a free product),
but I do suggest you cannot extrapolate performance from one vendors
implementation to another.
As is usual in performance "it depends".
*Sent:* Tuesday, December 31, 2013 12:02 PM
*To:* David Lee
*Cc:* Michael Sokolov; xquery-discuss; ihe onwuka; Andrew Welch
*Subject:* Re: [xquery-talk] Matrix Multiplication
@David pairs are also basically needed to write a linear algebra modulus,
the topic of this thread. And XQUERY don't provide any efficient pair. You
can't use Marklogic map, or any other vendor map to store vectors for
performance issues (a map is really slow).
Note that there are a lot of workaround : I am using direct JAVA binding
or C++ binding from XQUERY for linear algebra not to pay a too heavy
tribute to these issue performances.
The point is simply to notice that XQUERY could be really good to write an
efficient linear algebra modulus. But, due to these performance issues,
nobody can write it. I just hope that the next XQUERY version will give the
necessary container to write it. Meanwhile, nobody can contribute to XQUERY
through external modulus using heavy algorithms, and that's just too bad.
Indeed ... pair( (1,2,3) ) is just a function that returns a function as per my example.
But to the point, you can either use vendor extensions (such as
marklogic's json:array and map:map types) which have good support for
efficient operations, or look to another language (<sigh>) You may find
Scala more amenable to this type of programming.
Behalf Of *jean-marc Mercier
*Sent:* Tuesday, December 31, 2013 11:49 AM
*To:* Michael Sokolov
*Cc:* xquery-discuss; ihe onwuka; Andrew Welch
*Subject:* Re: [xquery-talk] Matrix Multiplication
@David, there are tricks to overcome sequence concatenations now. See for
instance definition of a pair by Leo, John Snelson, or me : you can write
pair( ( 1 , 2 , 3 ) , (4 5 , 6) ) to avoid sequence concatenation. I ve
written also constant sized vectors using this trick : for instance
NUPLE(1,(),<toto>,5), withh associated getters.
The bad new is that these operations takes too much time with all the
interpreters I have tried, and can't be used in heavy algorithms.
Hi
@Michael Concerning your remark over the discussions I quoted, maps are
the basic block for sparse linear algebra. Without performent "maps of
nodes" (equivalent to std::map in C++) you will not be able to write any
performant linear algebra modulus for sparse matrix.
However, before even thinking to sparse matrix, operations on sequences as
concatenations are the first "show-stop" to write a linear algebra modulus,
since sequences are vectors.
Another one is the lack of constant sized vectors (needed for basic dense
matrix operations).
@Ryan thx for these links, they are very interesting.
Well, I am going to party ! Have a happy new year !
I would love to see some tests of pure XQuery implementations of both
sparse and dense operations. I suspect performance of matrix multiply,
inversion, etc, will be poorer than in C++ or Java, but I would expect
performance comparable to Perl or Python (w/o its numpy extension) - just a
wild guess. I'd also expect it might be easier to get good sparse
performance. I don't know why, just a hunch.
But the more interesting question for me is whether language changes are
really needed to support this. I would have thought that proper
optimization of operations on sequences would be enough? So for example,
an extension module using sequences as matrix datatypes could possibly
optimize performance using a lower-level implementation. Does anyone see
any reason why that wouldn't be possible?
-Mike
I reviewed the discussion you referred to, jean-marc, but it seems to have
more to do with using functions as map keys, and I don't see any direct
connection to linear algebra?
It is not due to the spec. It is rather due to the common usage of
XQUERY, forcing vendor solutions (as BaseX) to focus primarily on XML Data
Base requests more than algorithmic performances.
There are actually some threads that are discussing these performance
issues in the context of maps (maps are for instance used for sparse matrix
representations) : look for instance to ""map module for XQUERY ?" that I
x-query.com mailing list.
Anyhow, to write a serious linear algebra modulus, the basic need is to
have a vector containers. Unfortunately, XQUERY does not provide any
performant vector containers at present time, and it is not possible to
code them in pure XQUERY : I have tried, and more experienced xquery
developpers than me have also tried, without success.
We have to wait for the XQUERY version that will give us these containers,
a decision to be taken by the W3C.
Are you saying the spec as it stands somehow forces all implementations to
be 1000x slower, or just what you have observed in some particular
implementation?
Post by jean-marc Mercier
As far as I understand, you want to write a linear algebra module using
XQUERY ?
Post by jean-marc Mercier
If so, I opened a thread some months ago about this idea. My opinion
today is that this is a false good idea at present time.
Post by jean-marc Mercier
1) XQUERY would be really good for writing concise, efficient linear
algebra modulus.
Post by jean-marc Mercier
2) However, I strongly recommend to wait a little bit for starting
coding : the current version of XQUERY (3.0) suffers from performance
issues. A linear algebra modulus written in XQUERY is expected to have
performances performances 1000 X slower than its corresponding C++ or JAVA
(you can measure it precisely). Any mathematician linear algebra modulus
would probably trashed your modulus after the first test.
Post by jean-marc Mercier
Hope this helps
Post by Ihe Onwuka
Assuming a sparse representation it is about 4 lines of SQL. This is
known not least because you can read enough articles and papers that
discuss it and it optimises well. The obvious google search does not reveal
any corresponding XQuery discussion, neither does it appear to have
surfaced on this or the eXist mailing list (allowing for my deficient
search skills). For something so "trivial" I thought that was rather
strange. Hence I thought it would be prudent to ask before naively
embarking on a 600k X 40k matrix multiplication.
Post by jean-marc Mercier
Post by Ihe Onwuka
On Tue, Dec 31, 2013 at 11:31 AM, Andrew Welch <
Post by Andrew Welch
It should be pretty trivial...
Post by Ihe Onwuka
Has anybody tried this in XQuery or if I am so foolish (not yet but
give me time) would I be the courageous <culturalReference>
http://youtu.be/ik8JT2S-kBE early
adopter.
Post by jean-marc Mercier
Post by Ihe Onwuka
Post by Andrew Welch
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
David Lee
2014-01-01 15:20:56 UTC
Permalink
@jean-marc
I am not disagreeing with your results, with pure XQuery (without vendor extensions) I believe you will not come close to code written in C or even Java
for things like maps and vectors. But you yourself say that the map from BaseX is 100X faster than one developed in pure XQuery.
Yet you *can use this map in XQuery* ... That's the whole point of vendor extensions is to overcome limitations in languages where appropriate.
This is not unique to XQuery. I have not come across a language yet that doesn't have either vendor extensions, or libraries written in "native" code (lower level code then the language itself). Even in C many standard library functions are actually optimized either by hand written assembly or compiler insertion to improve efficiencies of the pure language. Furthermore, you have not tried all vendors, I discourage you from extrapolating from some vendors to all vendor implementations. I am not suggesting you try all vendors (that would take a long time) only that generalizing statements about performance cannot be entirely accurate without doing so.

But I do agree if your goal is to avoid any vendor extensions and provide a "100% Pure XQuery" module for functions that need to make great use of maps and vectors, XQuery is likey to not be as performant as languages which have the data types you need built in.
( which in essence is saying that those languages native types are the equivalent to XQuery vendor extensions except they are defined by the language spec instead of the vendor spec. ).


Also as a tiny sub-diversion, I do disagree with this generalized statement
" This is true for any implementation, any language : if you use a map instead of a vector, the overhead for each insertion or reading is log(n), where n is the size of the vector"

This is entirely dependent on the implementation of the map and vector. I assert you cannot make such generalized assentation's.
For example, a map based on a hash instead of a tree can be linear time until it reaches some internal sizing.
Performance in real life is never as simple as on paper. In your case you created a map based on a tree. That will be roughly log(n), true.
But that is not the only possible implementation of maps.

In the end though ... for the cases you are trying for, I suggest 100% pure XQuery will not provide the performance you need.
So your choice is to use a different language, or to find an implementation of XQuery which has vendor extensions customized for
the data types and operations not in "100% pure XQuery".





From: jean-marc Mercier [mailto:***@gmail.com]
Sent: Wednesday, January 01, 2014 4:51 AM
To: David Lee
Cc: Michael Sokolov; xquery-discuss; ihe onwuka; Andrew Welch
Subject: Re: [xquery-talk] Matrix Multiplication

@David

Maybe my statement was not clear. Please let me do it again : I am just saying that maps are slow in insertion, compared to vectors. This is true for any implementation, any language : if you use a map instead of a vector, the overhead for each insertion or reading is log(n), where n is the size of the vector, that is huge when you look primarily for performances. Furthermore, please check this thread: 3 of us, including me, have tried to develop an external modulus implementing a map. The best result, that is the one of Leo from BaseX, consists in a 100X overhead factor compared to a vendor solution, the map from BaseX. So far we tried, so far we failed: this failure is just telling us that we can't use XQUERY to develop basic types as pairs, maps, vectors through external XQUERY modulus. In this context, I would not try, and will discourage anyone else, to develop an external linear Algebra modulus with this version of XQUERY, because it will be too slow compared to imperative language already existing linear algebra modulus.

David, I really try to have a scientific approach, I am not bound to any organization, have no particular private interest in the xml industry. I am just a xquery user. And I wish I were wrong. Please do not hesitate to point me out any failure in my reasoning.

Meanwhile, I wish you a happy new year !!!



2013/12/31 David Lee <***@calldei.com<mailto:***@calldei.com>>
One tiny correction
Your statement " You can't use Marklogic map, or any other vendor map to store vectors for performance issues (a map is really slow)."

So far on this thread this statement has not been substantiated.
MarkLogic maps and arrays are quite efficient, they are not the same thing as XQuery rb trees or anything else.
They are highly efficient C++ implementations of maps and arrays, implemented under the hood with the same data structures as you would in C++ stdlib.

Not suggesting your other line of reasoning is incorrect, only that so far on this thread I have not seen any evidence of testing vendor extensions for XQuery types. I suspect you don't want to do that for good reasons (for example its not portable, or its not a free product),
but I do suggest you cannot extrapolate performance from one vendors implementation to another.
As is usual in performance "it depends".



From: jean-marc Mercier [mailto:***@gmail.com<mailto:***@gmail.com>]
Sent: Tuesday, December 31, 2013 12:02 PM
To: David Lee
Cc: Michael Sokolov; xquery-discuss; ihe onwuka; Andrew Welch

Subject: Re: [xquery-talk] Matrix Multiplication

@David pairs are also basically needed to write a linear algebra modulus, the topic of this thread. And XQUERY don't provide any efficient pair. You can't use Marklogic map, or any other vendor map to store vectors for performance issues (a map is really slow).

Note that there are a lot of workaround : I am using direct JAVA binding or C++ binding from XQUERY for linear algebra not to pay a too heavy tribute to these issue performances.

The point is simply to notice that XQUERY could be really good to write an efficient linear algebra modulus. But, due to these performance issues, nobody can write it. I just hope that the next XQUERY version will give the necessary container to write it. Meanwhile, nobody can contribute to XQUERY through external modulus using heavy algorithms, and that's just too bad.





2013/12/31 David Lee <***@calldei.com<mailto:***@calldei.com>>
Indeed ... pair( (1,2,3) ) is just a function that returns a function as per my example.
But to the point, you can either use vendor extensions (such as marklogic's json:array and map:map types) which have good support for efficient operations, or look to another language (<sigh>) You may find Scala more amenable to this type of programming.



From: talk-***@x-query.com<mailto:talk-***@x-query.com> [mailto:talk-***@x-query.com<mailto:talk-***@x-query.com>] On Behalf Of jean-marc Mercier
Sent: Tuesday, December 31, 2013 11:49 AM
To: Michael Sokolov
Cc: xquery-discuss; ihe onwuka; Andrew Welch

Subject: Re: [xquery-talk] Matrix Multiplication

@David, there are tricks to overcome sequence concatenations now. See for instance definition of a pair by Leo, John Snelson, or me : you can write pair( ( 1 , 2 , 3 ) , (4 5 , 6) ) to avoid sequence concatenation. I ve written also constant sized vectors using this trick : for instance NUPLE(1,(),<toto>,5), withh associated getters.
The bad new is that these operations takes too much time with all the interpreters I have tried, and can't be used in heavy algorithms.




2013/12/31 jean-marc Mercier <***@gmail.com<mailto:***@gmail.com>>
Hi

@Michael Concerning your remark over the discussions I quoted, maps are the basic block for sparse linear algebra. Without performent "maps of nodes" (equivalent to std::map in C++) you will not be able to write any performant linear algebra modulus for sparse matrix.

However, before even thinking to sparse matrix, operations on sequences as concatenations are the first "show-stop" to write a linear algebra modulus, since sequences are vectors.
Another one is the lack of constant sized vectors (needed for basic dense matrix operations).

@Ryan thx for these links, they are very interesting.

Well, I am going to party ! Have a happy new year !




2013/12/31 Michael Sokolov <***@safaribooksonline.com<mailto:***@safaribooksonline.com>>
I would love to see some tests of pure XQuery implementations of both sparse and dense operations. I suspect performance of matrix multiply, inversion, etc, will be poorer than in C++ or Java, but I would expect performance comparable to Perl or Python (w/o its numpy extension) - just a wild guess. I'd also expect it might be easier to get good sparse performance. I don't know why, just a hunch.

But the more interesting question for me is whether language changes are really needed to support this. I would have thought that proper optimization of operations on sequences would be enough? So for example, an extension module using sequences as matrix datatypes could possibly optimize performance using a lower-level implementation. Does anyone see any reason why that wouldn't be possible?

-Mike

PS:
I reviewed the discussion you referred to, jean-marc, but it seems to have more to do with using functions as map keys, and I don't see any direct connection to linear algebra?


On 12/31/2013 9:55 AM, jean-marc Mercier wrote:
It is not due to the spec. It is rather due to the common usage of XQUERY, forcing vendor solutions (as BaseX) to focus primarily on XML Data Base requests more than algorithmic performances.

There are actually some threads that are discussing these performance issues in the context of maps (maps are for instance used for sparse matrix representations) : look for instance to ""map module for XQUERY ?" that I initiated or "Higher-order XQuery Modules" by Leo from BaseX, on ***@x-query.com<http://x-query.com> mailing list.
Anyhow, to write a serious linear algebra modulus, the basic need is to have a vector containers. Unfortunately, XQUERY does not provide any performant vector containers at present time, and it is not possible to code them in pure XQUERY : I have tried, and more experienced xquery developpers than me have also tried, without success.

We have to wait for the XQUERY version that will give us these containers, a decision to be taken by the W3C.


2013/12/31 Andrew Welch <***@gmail.com<mailto:***@gmail.com>>

Are you saying the spec as it stands somehow forces all implementations to be 1000x slower, or just what you have observed in some particular implementation?
As far as I understand, you want to write a linear algebra module using XQUERY ?
If so, I opened a thread some months ago about this idea. My opinion today is that this is a false good idea at present time.
1) XQUERY would be really good for writing concise, efficient linear algebra modulus.
2) However, I strongly recommend to wait a little bit for starting coding : the current version of XQUERY (3.0) suffers from performance issues. A linear algebra modulus written in XQUERY is expected to have performances performances 1000 X slower than its corresponding C++ or JAVA (you can measure it precisely). Any mathematician linear algebra modulus would probably trashed your modulus after the first test.
Hope this helps
Assuming a sparse representation it is about 4 lines of SQL. This is known not least because you can read enough articles and papers that discuss it and it optimises well. The obvious google search does not reveal any corresponding XQuery discussion, neither does it appear to have surfaced on this or the eXist mailing list (allowing for my deficient search skills). For something so "trivial" I thought that was rather strange. Hence I thought it would be prudent to ask before naively embarking on a 600k X 40k matrix multiplication.
Post by Andrew Welch
It should be pretty trivial...
Has anybody tried this in XQuery or if I am so foolish (not yet but give me time) would I be the courageous http://youtu.be/ik8JT2S-kBE early adopter.
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Adam Retter
2014-01-01 15:33:31 UTC
Permalink
I seriously suggest that you email the XQuery Working Group with your use
case. In particular you should contact Ghislain Fourny and Jonathan Robie,
as both have been working on and gathering use cases for arrays in XQuery
3.1. So far efficient matrix operations is not one that I have seen.

Cheers Adam.
Post by jean-marc Mercier
@David pairs are also basically needed to write a linear algebra modulus,
the topic of this thread. And XQUERY don't provide any efficient pair. You
can't use Marklogic map, or any other vendor map to store vectors for
performance issues (a map is really slow).
Note that there are a lot of workaround : I am using direct JAVA binding
or C++ binding from XQUERY for linear algebra not to pay a too heavy
tribute to these issue performances.
The point is simply to notice that XQUERY could be really good to write an
efficient linear algebra modulus. But, due to these performance issues,
nobody can write it. I just hope that the next XQUERY version will give the
necessary container to write it. Meanwhile, nobody can contribute to XQUERY
through external modulus using heavy algorithms, and that's just too bad.
Post by David Lee
Indeed ... pair( (1,2,3) ) is just a function that returns a function
as per my example.
But to the point, you can either use vendor extensions (such as
marklogic's json:array and map:map types) which have good support for
efficient operations, or look to another language (<sigh>) You may find
Scala more amenable to this type of programming.
Behalf Of *jean-marc Mercier
*Sent:* Tuesday, December 31, 2013 11:49 AM
*To:* Michael Sokolov
*Cc:* xquery-discuss; ihe onwuka; Andrew Welch
*Subject:* Re: [xquery-talk] Matrix Multiplication
@David, there are tricks to overcome sequence concatenations now. See for
instance definition of a pair by Leo, John Snelson, or me : you can write
pair( ( 1 , 2 , 3 ) , (4 5 , 6) ) to avoid sequence concatenation. I ve
written also constant sized vectors using this trick : for instance
NUPLE(1,(),<toto>,5), withh associated getters.
The bad new is that these operations takes too much time with all the
interpreters I have tried, and can't be used in heavy algorithms.
Hi
@Michael Concerning your remark over the discussions I quoted, maps are
the basic block for sparse linear algebra. Without performent "maps of
nodes" (equivalent to std::map in C++) you will not be able to write any
performant linear algebra modulus for sparse matrix.
However, before even thinking to sparse matrix, operations on sequences
as concatenations are the first "show-stop" to write a linear algebra
modulus, since sequences are vectors.
Another one is the lack of constant sized vectors (needed for basic dense
matrix operations).
@Ryan thx for these links, they are very interesting.
Well, I am going to party ! Have a happy new year !
I would love to see some tests of pure XQuery implementations of both
sparse and dense operations. I suspect performance of matrix multiply,
inversion, etc, will be poorer than in C++ or Java, but I would expect
performance comparable to Perl or Python (w/o its numpy extension) - just a
wild guess. I'd also expect it might be easier to get good sparse
performance. I don't know why, just a hunch.
But the more interesting question for me is whether language changes are
really needed to support this. I would have thought that proper
optimization of operations on sequences would be enough? So for example,
an extension module using sequences as matrix datatypes could possibly
optimize performance using a lower-level implementation. Does anyone see
any reason why that wouldn't be possible?
-Mike
I reviewed the discussion you referred to, jean-marc, but it seems to
have more to do with using functions as map keys, and I don't see any
direct connection to linear algebra?
It is not due to the spec. It is rather due to the common usage of
XQUERY, forcing vendor solutions (as BaseX) to focus primarily on XML Data
Base requests more than algorithmic performances.
There are actually some threads that are discussing these performance
issues in the context of maps (maps are for instance used for sparse matrix
representations) : look for instance to ""map module for XQUERY ?" that I
x-query.com mailing list.
Anyhow, to write a serious linear algebra modulus, the basic need is to
have a vector containers. Unfortunately, XQUERY does not provide any
performant vector containers at present time, and it is not possible to
code them in pure XQUERY : I have tried, and more experienced xquery
developpers than me have also tried, without success.
We have to wait for the XQUERY version that will give us these
containers, a decision to be taken by the W3C.
Are you saying the spec as it stands somehow forces all implementations
to be 1000x slower, or just what you have observed in some particular
implementation?
Post by jean-marc Mercier
As far as I understand, you want to write a linear algebra module using
XQUERY ?
Post by jean-marc Mercier
If so, I opened a thread some months ago about this idea. My opinion
today is that this is a false good idea at present time.
Post by jean-marc Mercier
1) XQUERY would be really good for writing concise, efficient linear
algebra modulus.
Post by jean-marc Mercier
2) However, I strongly recommend to wait a little bit for starting
coding : the current version of XQUERY (3.0) suffers from performance
issues. A linear algebra modulus written in XQUERY is expected to have
performances performances 1000 X slower than its corresponding C++ or JAVA
(you can measure it precisely). Any mathematician linear algebra modulus
would probably trashed your modulus after the first test.
Post by jean-marc Mercier
Hope this helps
Post by Ihe Onwuka
Assuming a sparse representation it is about 4 lines of SQL. This is
known not least because you can read enough articles and papers that
discuss it and it optimises well. The obvious google search does not reveal
any corresponding XQuery discussion, neither does it appear to have
surfaced on this or the eXist mailing list (allowing for my deficient
search skills). For something so "trivial" I thought that was rather
strange. Hence I thought it would be prudent to ask before naively
embarking on a 600k X 40k matrix multiplication.
Post by jean-marc Mercier
Post by Ihe Onwuka
On Tue, Dec 31, 2013 at 11:31 AM, Andrew Welch <
Post by Andrew Welch
It should be pretty trivial...
Post by Ihe Onwuka
Has anybody tried this in XQuery or if I am so foolish (not yet but
give me time) would I be the courageous <culturalReference>
http://youtu.be/ik8JT2S-kBE early
adopter.
Post by jean-marc Mercier
Post by Ihe Onwuka
Post by Andrew Welch
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
jean-marc Mercier
2014-01-01 15:41:57 UTC
Permalink
@Adam, I will, thx for this wise advise.
Post by Adam Retter
I seriously suggest that you email the XQuery Working Group with your use
case. In particular you should contact Ghislain Fourny and Jonathan Robie,
as both have been working on and gathering use cases for arrays in XQuery
3.1. So far efficient matrix operations is not one that I have seen.
Cheers Adam.
Post by jean-marc Mercier
@David pairs are also basically needed to write a linear algebra modulus,
the topic of this thread. And XQUERY don't provide any efficient pair. You
can't use Marklogic map, or any other vendor map to store vectors for
performance issues (a map is really slow).
Note that there are a lot of workaround : I am using direct JAVA binding
or C++ binding from XQUERY for linear algebra not to pay a too heavy
tribute to these issue performances.
The point is simply to notice that XQUERY could be really good to write
an efficient linear algebra modulus. But, due to these performance issues,
nobody can write it. I just hope that the next XQUERY version will give the
necessary container to write it. Meanwhile, nobody can contribute to XQUERY
through external modulus using heavy algorithms, and that's just too bad.
Post by David Lee
Indeed ... pair( (1,2,3) ) is just a function that returns a function
as per my example.
But to the point, you can either use vendor extensions (such as
marklogic's json:array and map:map types) which have good support for
efficient operations, or look to another language (<sigh>) You may find
Scala more amenable to this type of programming.
Behalf Of *jean-marc Mercier
*Sent:* Tuesday, December 31, 2013 11:49 AM
*To:* Michael Sokolov
*Cc:* xquery-discuss; ihe onwuka; Andrew Welch
*Subject:* Re: [xquery-talk] Matrix Multiplication
@David, there are tricks to overcome sequence concatenations now. See
for instance definition of a pair by Leo, John Snelson, or me : you can
write pair( ( 1 , 2 , 3 ) , (4 5 , 6) ) to avoid sequence
for instance NUPLE(1,(),<toto>,5), withh associated getters.
The bad new is that these operations takes too much time with all the
interpreters I have tried, and can't be used in heavy algorithms.
Hi
@Michael Concerning your remark over the discussions I quoted, maps are
the basic block for sparse linear algebra. Without performent "maps of
nodes" (equivalent to std::map in C++) you will not be able to write any
performant linear algebra modulus for sparse matrix.
However, before even thinking to sparse matrix, operations on sequences
as concatenations are the first "show-stop" to write a linear algebra
modulus, since sequences are vectors.
Another one is the lack of constant sized vectors (needed for basic
dense matrix operations).
@Ryan thx for these links, they are very interesting.
Well, I am going to party ! Have a happy new year !
I would love to see some tests of pure XQuery implementations of both
sparse and dense operations. I suspect performance of matrix multiply,
inversion, etc, will be poorer than in C++ or Java, but I would expect
performance comparable to Perl or Python (w/o its numpy extension) - just a
wild guess. I'd also expect it might be easier to get good sparse
performance. I don't know why, just a hunch.
But the more interesting question for me is whether language changes are
really needed to support this. I would have thought that proper
optimization of operations on sequences would be enough? So for example,
an extension module using sequences as matrix datatypes could possibly
optimize performance using a lower-level implementation. Does anyone see
any reason why that wouldn't be possible?
-Mike
I reviewed the discussion you referred to, jean-marc, but it seems to
have more to do with using functions as map keys, and I don't see any
direct connection to linear algebra?
It is not due to the spec. It is rather due to the common usage of
XQUERY, forcing vendor solutions (as BaseX) to focus primarily on XML Data
Base requests more than algorithmic performances.
There are actually some threads that are discussing these performance
issues in the context of maps (maps are for instance used for sparse matrix
representations) : look for instance to ""map module for XQUERY ?" that I
x-query.com mailing list.
Anyhow, to write a serious linear algebra modulus, the basic need is to
have a vector containers. Unfortunately, XQUERY does not provide any
performant vector containers at present time, and it is not possible to
code them in pure XQUERY : I have tried, and more experienced xquery
developpers than me have also tried, without success.
We have to wait for the XQUERY version that will give us these
containers, a decision to be taken by the W3C.
Are you saying the spec as it stands somehow forces all implementations
to be 1000x slower, or just what you have observed in some particular
implementation?
Post by jean-marc Mercier
As far as I understand, you want to write a linear algebra module
using XQUERY ?
Post by jean-marc Mercier
If so, I opened a thread some months ago about this idea. My opinion
today is that this is a false good idea at present time.
Post by jean-marc Mercier
1) XQUERY would be really good for writing concise, efficient linear
algebra modulus.
Post by jean-marc Mercier
2) However, I strongly recommend to wait a little bit for starting
coding : the current version of XQUERY (3.0) suffers from performance
issues. A linear algebra modulus written in XQUERY is expected to have
performances performances 1000 X slower than its corresponding C++ or JAVA
(you can measure it precisely). Any mathematician linear algebra modulus
would probably trashed your modulus after the first test.
Post by jean-marc Mercier
Hope this helps
Post by Ihe Onwuka
Assuming a sparse representation it is about 4 lines of SQL. This is
known not least because you can read enough articles and papers that
discuss it and it optimises well. The obvious google search does not reveal
any corresponding XQuery discussion, neither does it appear to have
surfaced on this or the eXist mailing list (allowing for my deficient
search skills). For something so "trivial" I thought that was rather
strange. Hence I thought it would be prudent to ask before naively
embarking on a 600k X 40k matrix multiplication.
Post by jean-marc Mercier
Post by Ihe Onwuka
On Tue, Dec 31, 2013 at 11:31 AM, Andrew Welch <
Post by Andrew Welch
It should be pretty trivial...
Post by Ihe Onwuka
Has anybody tried this in XQuery or if I am so foolish (not yet
but give me time) would I be the courageous <culturalReference>
http://youtu.be/ik8JT2S-kBE early
adopter.
Post by jean-marc Mercier
Post by Ihe Onwuka
Post by Andrew Welch
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
David Lee
2014-01-01 15:54:42 UTC
Permalink
On 31 Dec 2013 17:03, "jean-marc Mercier" <***@gmail.com<mailto:***@gmail.com>> wrote:
@David pairs are also basically needed to write a linear algebra modulus, the topic of this thread. And XQUERY don't provide any efficient pair. You can't use Marklogic map, or any other vendor map to store vectors for performance issues (a map is really slow).
http://x-query.com/mailman/listinfo/talk
[DAL:]
Just an FYI but MarkLogic also have native arrays (in addition to maps) which are extremely efficient (they are stored internally as C arrays).
But if you have to do a lot of iterating through the arrays - even though the accessors are very efficient ,
the surrounding FLOWR code is still XQuery which slows things down a bit. Maybe or maybe not enough to make them not useful for you.
Also the marklogic maps are not the same as XQuery 3 implemented maps, they are a hash map under the hood and typically linear access.

But back to your original issue, these are vendor extensions and hence not portable to other implementations (Until XQuery itself
standardizes on arrays and maps - then it is likely that vendor extension implementations will be used to expose the standard interface).
If what you want is pure XQuery ... it doesnt matter how fast these are if you cant use them because they dont exist in all implementations.


-David
jean-marc Mercier
2014-01-01 16:05:23 UTC
Permalink
@David, indeed, I am a little bit reluctant to use non standardized tools.
Anyhow, I am quite surprised by your sentence : there exists map containers
working in linear access ? Do you have any references to point out ?
Post by jean-marc Mercier
@David pairs are also basically needed to write a linear algebra modulus,
the topic of this thread. And XQUERY don't provide any efficient pair. You
can't use Marklogic map, or any other vendor map to store vectors for
performance issues (a map is really slow).
http://x-query.com/mailman/listinfo/talk
*[DAL:] *
*Just an FYI but MarkLogic also have native arrays (in addition to maps)
which are extremely efficient (they are stored internally as C arrays).*
*But if you have to do a lot of iterating through the arrays - even
though the accessors are very efficient ,*
*the surrounding FLOWR code is still XQuery which slows things down a
bit. Maybe or maybe not enough to make them not useful for you.*
*Also the marklogic maps are not the same as XQuery 3 implemented maps,
they are a hash map under the hood and typically linear access.*
*But back to your original issue, these are vendor extensions and hence
not portable to other implementations (Until XQuery itself *
*standardizes on arrays and maps - then it is likely that vendor extension
implementations will be used to expose the standard interface).*
*If what you want is pure XQuery ... it doesnt matter how fast these are
if you cant use them because they dont exist in all implementations.*
*-David*
jean-marc Mercier
2014-01-01 16:55:25 UTC
Permalink
@David. I did not read carefully enough your answer. You are talking about
hash maps, I was thinking about tree-based maps.
Post by jean-marc Mercier
@David, indeed, I am a little bit reluctant to use non standardized tools.
Anyhow, I am quite surprised by your sentence : there exists map containers
working in linear access ? Do you have any references to point out ?
Post by jean-marc Mercier
@David pairs are also basically needed to write a linear algebra modulus,
the topic of this thread. And XQUERY don't provide any efficient pair. You
can't use Marklogic map, or any other vendor map to store vectors for
performance issues (a map is really slow).
http://x-query.com/mailman/listinfo/talk
*[DAL:] *
*Just an FYI but MarkLogic also have native arrays (in addition to
maps) which are extremely efficient (they are stored internally as C
arrays).*
*But if you have to do a lot of iterating through the arrays - even
though the accessors are very efficient ,*
*the surrounding FLOWR code is still XQuery which slows things down a
bit. Maybe or maybe not enough to make them not useful for you.*
*Also the marklogic maps are not the same as XQuery 3 implemented maps,
they are a hash map under the hood and typically linear access.*
*But back to your original issue, these are vendor extensions and hence
not portable to other implementations (Until XQuery itself *
*standardizes on arrays and maps - then it is likely that vendor
extension implementations will be used to expose the standard interface).*
*If what you want is pure XQuery ... it doesnt matter how fast these are
if you cant use them because they dont exist in all implementations.*
*-David*
David Lee
2014-01-01 17:06:34 UTC
Permalink
@David. I did not read carefully enough your answer. You are talking about hash maps, I was thinking about tree-based maps.

[DAL:]
This is precisely my point. When you talk generically about "maps" I assume you refer to the generic concept not a particular implementation.
So when you say things like "maps are slow" or "maps are log(n)" I completely disagree. On principal.
It is not the generic container class that has a performance characteristic, it is the implementation which does.
This is true even when you start to add qualifiers like 'tree based' ... what kind of tree ? it could be a tree with say 10 branches per node,
that would have different characteristics then a binary tree. It could be a hash map with a linear list for overflows, or a tree for overflows or any number of variations. Even then, optimizations "under the hood" can make even the best educated guesses wildly wrong. I would estimate that in my years of performance analysis that my first guess as to either what the performance of an algorithm would be, once implemented in actual real optimized code, or looking at a complex system, what part was the major performance bottleneck is wrong more often than right.
Maybe that makes me very bad at guessing ... but in my opinion it means that reality is usually more subtle than it looks at first.

-David
David Lee
2014-01-01 16:56:44 UTC
Permalink
@jean-marc
"@David, indeed, I am a little bit reluctant to use non standardized tools. Anyhow, I am quite surprised by your sentence : there exists map containers working in linear access ? Do you have any references to point out ?
"

Just for reference, and to disclose my goal is not to push a particular vendor but to make the point that to make generalized statements about performance characteristics of generic data types without respect to the implementation is not wise.
Having done performance work professionally for decades I can attest this is a general rule - "It Depends".


But for reference here is one data point demonstrating linear access to both maps and arrays in XQuery (using a vendor extension).


Using the developer (free for development use) version of MarkLogic 7.0.1
Downloaded from here:
http://developer.marklogic.com/products

On my windows desktop (a fairly high end machine, but not unusual by any means).

map test:
using map:map object
http://docs.marklogic.com/map:map?q=map:map


I did not have the code return any results so that the timing was affected by the result size.
You are welcome to try for yourself to validate that the optimizer was not so incredible as to actually optimize away the code (or it would have taken constant time if it realized the result was always () ).

This was just quickly done in "Query Console" using the built in profiler.
Slightly better tests would be to invoke this as a stored XQuery module to avoid the interaction of the profiler and javascript console,
but you can see within reason the results are linear.


Note this also incurs a number to string conversion because map:map keys are strings.
also note in this case the map was not pre-sized, it grows on demand.

tested by incrementing $max from 100 to 10000000 in factors of 10

------- XQuery code
declare variable $map := map:map();
declare variable $max := 10000000;

for $i in 1 to $max return
map:put( $map , ""||$i , $i ),
for $i in 1 to $max
let $_ := map:get( $map , ""||$i )
return ()


Results

100 .000/.001 sec - limits of the time precision used
1000 .005 sec
10000 .047 sec
100000 .46 sec
1000000 4.8 sec
10000000 49.49 sec


Array test using json:array
http://docs.marklogic.com/json:array?q=json:array

Note in this case I do pre-size the array to $max to avoid a resize. Its a minor optimization but useful.

------ XQuery Code

declare variable $max := 10000000;
declare variable $array := json:to-array( () , $max );

for $i in 1 to $max return
$array[$i] = $i ,
for $i in 1 to $max
let $_ := $array[$i]
return ()

--------- Results
100 - under timer percision
1000 .002 sec
10000 .02 sec
100000 .23 sec
1000000 2.06 sec
10000000 20.65 sec









From: jean-marc Mercier [mailto:***@gmail.com]
Sent: Wednesday, January 01, 2014 11:05 AM
To: David Lee
Cc: Adam Retter; ***@x-query.com; Andrew Welch; ihe onwuka; Michael Sokolov
Subject: Re: [xquery-talk] Matrix Multiplication

@David, indeed, I am a little bit reluctant to use non standardized tools. Anyhow, I am quite surprised by your sentence : there exists map containers working in linear access ? Do you have any references to point out ?

2014/1/1 David Lee <***@calldei.com<mailto:***@calldei.com>>
On 31 Dec 2013 17:03, "jean-marc Mercier" <***@gmail.com<mailto:***@gmail.com>> wrote:
@David pairs are also basically needed to write a linear algebra modulus, the topic of this thread. And XQUERY don't provide any efficient pair. You can't use Marklogic map, or any other vendor map to store vectors for performance issues (a map is really slow).
http://x-query.com/mailman/listinfo/talk
[DAL:]
Just an FYI but MarkLogic also have native arrays (in addition to maps) which are extremely efficient (they are stored internally as C arrays).
But if you have to do a lot of iterating through the arrays - even though the accessors are very efficient ,
the surrounding FLOWR code is still XQuery which slows things down a bit. Maybe or maybe not enough to make them not useful for you.
Also the marklogic maps are not the same as XQuery 3 implemented maps, they are a hash map under the hood and typically linear access.

But back to your original issue, these are vendor extensions and hence not portable to other implementations (Until XQuery itself
standardizes on arrays and maps - then it is likely that vendor extension implementations will be used to expose the standard interface).
If what you want is pure XQuery ... it doesnt matter how fast these are if you cant use them because they dont exist in all implementations.


-David
jean-marc Mercier
2014-01-01 17:21:56 UTC
Permalink
@David,

Thx for the links. Indeed I thought that Marklogic DB was not free to use !
I will give it a try.
You are right: precising maps implementation is important, since it can
change a lot performances issues.

Concerning your profile tests for maps, a quick answer. If your numbers are
correct, the BaseX map seems to be 10x faster than MarkLogic one for
1000000 insertions, but behave as n log(n) (it is based over Phil Bagwell’s
hash tries AFAIK), I will try to profile MarkLogic map more precisely. This
might be due to the hash function ? Note that it is interesting to have
imperative language performances in mind. A standard C++ std::map takes
about 0.5 sec to insert 2^22 (i.e. 4 000 000) on my PC, single thread. As a
rule of thumb, a C++ std::map (that is a basic red/black tree AFAIK) should
be around 25 x faster for insertion than BaseX one.
Post by David Lee
@jean-marc
tools. Anyhow, I am quite surprised by your sentence : there exists map
containers working in linear access ? Do you have any references to point
out ?
"
Just for reference, and to disclose my goal is not to push a particular
vendor but to make the point that to make generalized statements about
performance characteristics of generic data types without respect to the
implementation is not wise.
Having done performance work professionally for decades I can attest this
is a general rule - "It Depends".
But for reference here is one data point demonstrating linear access to
both maps and arrays in XQuery (using a vendor extension).
Using the developer (free for development use) version of MarkLogic 7.0.1
http://developer.marklogic.com/products
On my windows desktop (a fairly high end machine, but not unusual by any means).
using map:map object
http://docs.marklogic.com/map:map?q=map:map
I did not have the code return any results so that the timing was affected
by the result size.
You are welcome to try for yourself to validate that the optimizer was not
so incredible as to actually optimize away the code (or it would have taken
constant time if it realized the result was always () ).
This was just quickly done in "Query Console" using the built in profiler.
Slightly better tests would be to invoke this as a stored XQuery module to
avoid the interaction of the profiler and javascript console,
but you can see within reason the results are linear.
Note this also incurs a number to string conversion because map:map keys are strings.
also note in this case the map was not pre-sized, it grows on demand.
tested by incrementing $max from 100 to 10000000 in factors of 10
------- XQuery code
declare variable $map := map:map();
declare variable $max := 10000000;
for $i in 1 to $max return
map:put( $map , ""||$i , $i ),
for $i in 1 to $max
let $_ := map:get( $map , ""||$i )
return ()
Results
100 .000/.001 sec - limits of the time precision used
1000 .005 sec
10000 .047 sec
100000 .46 sec
1000000 4.8 sec
10000000 49.49 sec
Array test using json:array
http://docs.marklogic.com/json:array?q=json:array
Note in this case I do pre-size the array to $max to avoid a resize. Its
a minor optimization but useful.
------ XQuery Code
declare variable $max := 10000000;
declare variable $array := json:to-array( () , $max );
for $i in 1 to $max return
$array[$i] = $i ,
for $i in 1 to $max
let $_ := $array[$i]
return ()
--------- Results
100 - under timer percision
1000 .002 sec
10000 .02 sec
100000 .23 sec
1000000 2.06 sec
10000000 20.65 sec
*Sent:* Wednesday, January 01, 2014 11:05 AM
*To:* David Lee
Sokolov
*Subject:* Re: [xquery-talk] Matrix Multiplication
@David, indeed, I am a little bit reluctant to use non standardized tools.
Anyhow, I am quite surprised by your sentence : there exists map containers
working in linear access ? Do you have any references to point out ?
@David pairs are also basically needed to write a linear algebra modulus,
the topic of this thread. And XQUERY don't provide any efficient pair. You
can't use Marklogic map, or any other vendor map to store vectors for
performance issues (a map is really slow).
http://x-query.com/mailman/listinfo/talk
*[DAL:] *
*Just an FYI but MarkLogic also have native arrays (in addition to maps)
which are extremely efficient (they are stored internally as C arrays).*
*But if you have to do a lot of iterating through the arrays - even
though the accessors are very efficient ,*
*the surrounding FLOWR code is still XQuery which slows things down a
bit. Maybe or maybe not enough to make them not useful for you.*
*Also the marklogic maps are not the same as XQuery 3 implemented maps,
they are a hash map under the hood and typically linear access.*
*But back to your original issue, these are vendor extensions and hence
not portable to other implementations (Until XQuery itself *
*standardizes on arrays and maps - then it is likely that vendor extension
implementations will be used to expose the standard interface).*
*If what you want is pure XQuery ... it doesnt matter how fast these are
if you cant use them because they dont exist in all implementations.*
*-David*
David Lee
2014-01-01 17:39:52 UTC
Permalink
A big performance difference you will see between XQuery and say C++ is that the interpretation of the FLOW statements can overwhelm the
performance of individual operations. Differences between XQuery implementations can be vast. This is for many reasons,
one being that it is often optimized not for calculations of this sort but for XPath expressions or database indexed retrieval.
Another issue is that different XQuery implementations have different infrastructure goals or requirements.
MarkLogic, for example, has a lot of requirements around transactional processing, performance measuring, lazy evaluations of huge data structures etc.
This adds overhead to every statement that may not be in other XQuery implementations which don't have those goals or requirements.
Some XQuery engines are designed to work 100% In memory ... that can lead to optimizations which are impossible for implementations designed to work on datasets largely out of memory ... There are just a vast amount of variables that affect things. http://www.meetup.com/baymug/events/150734492/m
In any case it would be very surprising to me if ANY XQuery implementation was as fast as hand coded C++ for things like looping over an array.
Its simply not the use case XQuery was designed for or typically optimized for. Although the same used to be true of Java.
Now java compilers have become so good that they can run at nearly "native" speeds for some things ...

Saxon XSLT EE (And I believe XQuery) has some similar technologies to generate JVM bytecode on demand which can speed things up significantly (or slow them down !).

In the end though ... if your looking for fast linear algebra libraries I suspect no matter what language you choose you will find that nothing
will beat a native implementation ... This is true even with C++ or Fortran where library vendors optimize things like matrix multiplication using
assembly or even GPUs ... Eventually someone puts the effort into wrapping those up into nice high level functions that you can call from a higher level language - but are rarely *written entirely* in that language.

Simple example from XQuery world. fn:sum() is not likely to be written in XQuery itself.





From: jean-marc Mercier [mailto:***@gmail.com]
Sent: Wednesday, January 01, 2014 12:22 PM
To: David Lee
Cc: Adam Retter; ***@x-query.com; Andrew Welch; ihe onwuka; Michael Sokolov
Subject: Re: [xquery-talk] Matrix Multiplication

@David,

Thx for the links. Indeed I thought that Marklogic DB was not free to use ! I will give it a try.
You are right: precising maps implementation is important, since it can change a lot performances issues.

Concerning your profile tests for maps, a quick answer. If your numbers are correct, the BaseX map seems to be 10x faster than MarkLogic one for 1000000 insertions, but behave as n log(n) (it is based over Phil Bagwell's hash tries AFAIK), I will try to profile MarkLogic map more precisely. This might be due to the hash function ? Note that it is interesting to have imperative language performances in mind. A standard C++ std::map takes about 0.5 sec to insert 2^22 (i.e. 4 000 000) on my PC, single thread. As a rule of thumb, a C++ std::map (that is a basic red/black tree AFAIK) should be around 25 x faster for insertion than BaseX one.




2014/1/1 David Lee <***@calldei.com<mailto:***@calldei.com>>

@jean-marc
"@David, indeed, I am a little bit reluctant to use non standardized tools. Anyhow, I am quite surprised by your sentence : there exists map containers working in linear access ? Do you have any references to point out ?
"

Just for reference, and to disclose my goal is not to push a particular vendor but to make the point that to make generalized statements about performance characteristics of generic data types without respect to the implementation is not wise.
Having done performance work professionally for decades I can attest this is a general rule - "It Depends".


But for reference here is one data point demonstrating linear access to both maps and arrays in XQuery (using a vendor extension).


Using the developer (free for development use) version of MarkLogic 7.0.1
Downloaded from here:
http://developer.marklogic.com/products

On my windows desktop (a fairly high end machine, but not unusual by any means).

map test:
using map:map object
http://docs.marklogic.com/map:map?q=map:map


I did not have the code return any results so that the timing was affected by the result size.
You are welcome to try for yourself to validate that the optimizer was not so incredible as to actually optimize away the code (or it would have taken constant time if it realized the result was always () ).

This was just quickly done in "Query Console" using the built in profiler.
Slightly better tests would be to invoke this as a stored XQuery module to avoid the interaction of the profiler and javascript console,
but you can see within reason the results are linear.


Note this also incurs a number to string conversion because map:map keys are strings.
also note in this case the map was not pre-sized, it grows on demand.

tested by incrementing $max from 100 to 10000000 in factors of 10

------- XQuery code
declare variable $map := map:map();
declare variable $max := 10000000;

for $i in 1 to $max return
map:put( $map , ""||$i , $i ),
for $i in 1 to $max
let $_ := map:get( $map , ""||$i )
return ()


Results

100 .000/.001 sec - limits of the time precision used
1000 .005 sec
10000 .047 sec
100000 .46 sec
1000000 4.8 sec
10000000 49.49 sec


Array test using json:array
http://docs.marklogic.com/json:array?q=json:array

Note in this case I do pre-size the array to $max to avoid a resize. Its a minor optimization but useful.

------ XQuery Code

declare variable $max := 10000000;
declare variable $array := json:to-array( () , $max );

for $i in 1 to $max return
$array[$i] = $i ,
for $i in 1 to $max
let $_ := $array[$i]
return ()

--------- Results
100 - under timer percision
1000 .002 sec
10000 .02 sec
100000 .23 sec
1000000 2.06 sec
10000000 20.65 sec









From: jean-marc Mercier [mailto:***@gmail.com<mailto:***@gmail.com>]
Sent: Wednesday, January 01, 2014 11:05 AM
To: David Lee
Cc: Adam Retter; ***@x-query.com<mailto:***@x-query.com>; Andrew Welch; ihe onwuka; Michael Sokolov

Subject: Re: [xquery-talk] Matrix Multiplication

@David, indeed, I am a little bit reluctant to use non standardized tools. Anyhow, I am quite surprised by your sentence : there exists map containers working in linear access ? Do you have any references to point out ?

2014/1/1 David Lee <***@calldei.com<mailto:***@calldei.com>>
On 31 Dec 2013 17:03, "jean-marc Mercier" <***@gmail.com<mailto:***@gmail.com>> wrote:
@David pairs are also basically needed to write a linear algebra modulus, the topic of this thread. And XQUERY don't provide any efficient pair. You can't use Marklogic map, or any other vendor map to store vectors for performance issues (a map is really slow).
http://x-query.com/mailman/listinfo/talk
[DAL:]
Just an FYI but MarkLogic also have native arrays (in addition to maps) which are extremely efficient (they are stored internally as C arrays).
But if you have to do a lot of iterating through the arrays - even though the accessors are very efficient ,
the surrounding FLOWR code is still XQuery which slows things down a bit. Maybe or maybe not enough to make them not useful for you.
Also the marklogic maps are not the same as XQuery 3 implemented maps, they are a hash map under the hood and typically linear access.

But back to your original issue, these are vendor extensions and hence not portable to other implementations (Until XQuery itself
standardizes on arrays and maps - then it is likely that vendor extension implementations will be used to expose the standard interface).
If what you want is pure XQuery ... it doesnt matter how fast these are if you cant use them because they dont exist in all implementations.


-David
David Lee
2014-01-01 17:43:24 UTC
Permalink
Ups sorry for pasting in that link to the MLUG ... that was an unintended mixup as I was writing 2 emails at once :(
But since the cat's out of the bag ... anyone in the area is welcome to come !


From: talk-***@x-query.com [mailto:talk-***@x-query.com] On Behalf Of David Lee
Sent: Wednesday, January 01, 2014 12:40 PM
To: jean-marc Mercier
Cc: ***@x-query.com; ihe onwuka; Michael Sokolov; Adam Retter; Andrew Welch
Subject: Re: [xquery-talk] Matrix Multiplication

A big performance difference you will see between XQuery and say C++ is that the interpretation of the FLOW statements can overwhelm the
performance of individual operations. Differences between XQuery implementations can be vast. This is for many reasons,
one being that it is often optimized not for calculations of this sort but for XPath expressions or database indexed retrieval.
Another issue is that different XQuery implementations have different infrastructure goals or requirements.
MarkLogic, for example, has a lot of requirements around transactional processing, performance measuring, lazy evaluations of huge data structures etc.
This adds overhead to every statement that may not be in other XQuery implementations which don't have those goals or requirements.
Some XQuery engines are designed to work 100% In memory ... that can lead to optimizations which are impossible for implementations designed to work on datasets largely out of memory ... There are just a vast amount of variables that affect things. http://www.meetup.com/baymug/events/150734492/m
In any case it would be very surprising to me if ANY XQuery implementation was as fast as hand coded C++ for things like looping over an array.
Its simply not the use case XQuery was designed for or typically optimized for. Although the same used to be true of Java.
Now java compilers have become so good that they can run at nearly "native" speeds for some things ...

Saxon XSLT EE (And I believe XQuery) has some similar technologies to generate JVM bytecode on demand which can speed things up significantly (or slow them down !).

In the end though ... if your looking for fast linear algebra libraries I suspect no matter what language you choose you will find that nothing
will beat a native implementation ... This is true even with C++ or Fortran where library vendors optimize things like matrix multiplication using
assembly or even GPUs ... Eventually someone puts the effort into wrapping those up into nice high level functions that you can call from a higher level language - but are rarely *written entirely* in that language.

Simple example from XQuery world. fn:sum() is not likely to be written in XQuery itself.





From: jean-marc Mercier [mailto:***@gmail.com]
Sent: Wednesday, January 01, 2014 12:22 PM
To: David Lee
Cc: Adam Retter; ***@x-query.com<mailto:***@x-query.com>; Andrew Welch; ihe onwuka; Michael Sokolov
Subject: Re: [xquery-talk] Matrix Multiplication

@David,

Thx for the links. Indeed I thought that Marklogic DB was not free to use ! I will give it a try.
You are right: precising maps implementation is important, since it can change a lot performances issues.

Concerning your profile tests for maps, a quick answer. If your numbers are correct, the BaseX map seems to be 10x faster than MarkLogic one for 1000000 insertions, but behave as n log(n) (it is based over Phil Bagwell's hash tries AFAIK), I will try to profile MarkLogic map more precisely. This might be due to the hash function ? Note that it is interesting to have imperative language performances in mind. A standard C++ std::map takes about 0.5 sec to insert 2^22 (i.e. 4 000 000) on my PC, single thread. As a rule of thumb, a C++ std::map (that is a basic red/black tree AFAIK) should be around 25 x faster for insertion than BaseX one.




2014/1/1 David Lee <***@calldei.com<mailto:***@calldei.com>>

@jean-marc
"@David, indeed, I am a little bit reluctant to use non standardized tools. Anyhow, I am quite surprised by your sentence : there exists map containers working in linear access ? Do you have any references to point out ?
"

Just for reference, and to disclose my goal is not to push a particular vendor but to make the point that to make generalized statements about performance characteristics of generic data types without respect to the implementation is not wise.
Having done performance work professionally for decades I can attest this is a general rule - "It Depends".


But for reference here is one data point demonstrating linear access to both maps and arrays in XQuery (using a vendor extension).


Using the developer (free for development use) version of MarkLogic 7.0.1
Downloaded from here:
http://developer.marklogic.com/products

On my windows desktop (a fairly high end machine, but not unusual by any means).

map test:
using map:map object
http://docs.marklogic.com/map:map?q=map:map


I did not have the code return any results so that the timing was affected by the result size.
You are welcome to try for yourself to validate that the optimizer was not so incredible as to actually optimize away the code (or it would have taken constant time if it realized the result was always () ).

This was just quickly done in "Query Console" using the built in profiler.
Slightly better tests would be to invoke this as a stored XQuery module to avoid the interaction of the profiler and javascript console,
but you can see within reason the results are linear.


Note this also incurs a number to string conversion because map:map keys are strings.
also note in this case the map was not pre-sized, it grows on demand.

tested by incrementing $max from 100 to 10000000 in factors of 10

------- XQuery code
declare variable $map := map:map();
declare variable $max := 10000000;

for $i in 1 to $max return
map:put( $map , ""||$i , $i ),
for $i in 1 to $max
let $_ := map:get( $map , ""||$i )
return ()


Results

100 .000/.001 sec - limits of the time precision used
1000 .005 sec
10000 .047 sec
100000 .46 sec
1000000 4.8 sec
10000000 49.49 sec


Array test using json:array
http://docs.marklogic.com/json:array?q=json:array

Note in this case I do pre-size the array to $max to avoid a resize. Its a minor optimization but useful.

------ XQuery Code

declare variable $max := 10000000;
declare variable $array := json:to-array( () , $max );

for $i in 1 to $max return
$array[$i] = $i ,
for $i in 1 to $max
let $_ := $array[$i]
return ()

--------- Results
100 - under timer percision
1000 .002 sec
10000 .02 sec
100000 .23 sec
1000000 2.06 sec
10000000 20.65 sec









From: jean-marc Mercier [mailto:***@gmail.com<mailto:***@gmail.com>]
Sent: Wednesday, January 01, 2014 11:05 AM
To: David Lee
Cc: Adam Retter; ***@x-query.com<mailto:***@x-query.com>; Andrew Welch; ihe onwuka; Michael Sokolov

Subject: Re: [xquery-talk] Matrix Multiplication

@David, indeed, I am a little bit reluctant to use non standardized tools. Anyhow, I am quite surprised by your sentence : there exists map containers working in linear access ? Do you have any references to point out ?

2014/1/1 David Lee <***@calldei.com<mailto:***@calldei.com>>
On 31 Dec 2013 17:03, "jean-marc Mercier" <***@gmail.com<mailto:***@gmail.com>> wrote:
@David pairs are also basically needed to write a linear algebra modulus, the topic of this thread. And XQUERY don't provide any efficient pair. You can't use Marklogic map, or any other vendor map to store vectors for performance issues (a map is really slow).
http://x-query.com/mailman/listinfo/talk
[DAL:]
Just an FYI but MarkLogic also have native arrays (in addition to maps) which are extremely efficient (they are stored internally as C arrays).
But if you have to do a lot of iterating through the arrays - even though the accessors are very efficient ,
the surrounding FLOWR code is still XQuery which slows things down a bit. Maybe or maybe not enough to make them not useful for you.
Also the marklogic maps are not the same as XQuery 3 implemented maps, they are a hash map under the hood and typically linear access.

But back to your original issue, these are vendor extensions and hence not portable to other implementations (Until XQuery itself
standardizes on arrays and maps - then it is likely that vendor extension implementations will be used to expose the standard interface).
If what you want is pure XQuery ... it doesnt matter how fast these are if you cant use them because they dont exist in all implementations.


-David
Adam Retter
2014-01-01 18:26:21 UTC
Permalink
Post by David Lee
Now java compilers have become so good that they can run at nearly
"native" speeds for some things ...
Whilst we are talking about avoiding generalisations. I would point out
that the above statement is quite outdated, with modern JVMs and well
optimised code, you can often achieve performance that is better than the
equivalent C or C++ code. Just do some googling...
David Lee
2014-01-01 18:51:07 UTC
Permalink
Touché' good call.

I was about to mention that java could be faster but hadn't kept up with the facts enough to make a generalized statement :)

I just stumbled on this

http://dev.saxonica.com/blog/oneil/2013/12/saxonc---saxon-for-the-cc-and-php-platforms.html

Which proves Java is 2x faster than the same code in C++ :)(***)
And doing XSLT at that !

(***) in case the ":)" is not obvious enough, this proves (or disproves) nothing of the sort ... but it is still a fascinating write-up and definitely in-context with the thread.
Good work Saxonica !!!!




From: Adam Retter [mailto:***@googlemail.com]
Sent: Wednesday, January 01, 2014 1:26 PM
To: David Lee
Cc: Andrew Welch; ***@x-query.com; ihe onwuka; jean-marc Mercier; Michael Sokolov
Subject: RE: [xquery-talk] Matrix Multiplication
Post by David Lee
Now java compilers have become so good that they can run at nearly "native" speeds for some things ...
Whilst we are talking about avoiding generalisations. I would point out that the above statement is quite outdated, with modern JVMs and well optimised code, you can often achieve performance that is better than the equivalent C or C++ code. Just do some googling...
Ihe Onwuka
2013-12-31 15:46:26 UTC
Permalink
I have have a linear algebra library in Python, but the incentive for
using a query language is that it's declarative and you get optimisation
for free. The only hassle with SQL is that it forces you to declare a
schema but even so it looks like the best option - XQuery requires too much
courage.


On Tue, Dec 31, 2013 at 2:27 PM, jean-marc Mercier <
Post by jean-marc Mercier
As far as I understand, you want to write a linear algebra module using
XQUERY ?
If so, I opened a thread some months ago about this idea. My opinion today
is that this is a false good idea at present time.
1) XQUERY would be really good for writing concise, efficient linear
algebra modulus.
2) However, I strongly recommend to wait a little bit for starting coding
: the current version of XQUERY (3.0) suffers from performance issues. A
linear algebra modulus written in XQUERY is expected to have performances
performances 1000 X slower than its corresponding C++ or JAVA (you can
measure it precisely). Any mathematician linear algebra modulus would
probably trashed your modulus after the first test.
Hope this helps
Post by Ihe Onwuka
Assuming a sparse representation it is about 4 lines of SQL. This is
known not least because you can read enough articles and papers that
discuss it and it optimises well. The obvious google search does not reveal
any corresponding XQuery discussion, neither does it appear to have
surfaced on this or the eXist mailing list (allowing for my deficient
search skills). For something so "trivial" I thought that was rather
strange. Hence I thought it would be prudent to ask before naively
embarking on a 600k X 40k matrix multiplication.
Post by Andrew Welch
It should be pretty trivial...
Post by Ihe Onwuka
Has anybody tried this in XQuery or if I am so foolish (not yet but
give me time) would I be the courageous <culturalReference>
http://youtu.be/ik8JT2S-kBE early
adopter.
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Continue reading on narkive:
Loading...