[xquery-talk] the sad state of query languages for semi-structured data in the NoSQL industry

Discussion:

daniela florescu

2015-05-29 05:33:50 UTC

Another message that I sent this morning, and it didn't make it through... until now.

Thanks, MarkLogic, for opening up the blockade.

I guess the MarkLogic lawyers needed a little bit of time to scratch their heads about what to do.....(and BTW,
silencing me isn't a solution... I lived in a communist country for 22 years... they've tried that ... didn't work)

But the following message is a serious discussion about the state of affairs in the query languages universe for NoSQL
databases.

> On May 28, 2015, at 2:20 PM, daniela florescu <***@me.com> wrote:
>
> The NoSQL industry is extremely successful, used everywhere, and considered by many the child prodigy of the database industry.
>
>
> They are proud of themselves because they satisfy user needs, aka: they store data:
> (a) which is not in 1st normal form (aka nested, pre-aggregated)
> (b) without schema
>
> ...to the practical benefit of:
> (a) the application getting the data out of the database exactly as the application needs it, and not
> altered through a normalization phase.
> (b) the lack of fixed schema helps with data flexibility... things change extremely quickly inside an application
> these days (fields being added, deleted, changed, etc.)
>
>
> So far so good, and I think until here they are all right.
>
> [[ One may think that this looks a little bit like ... XML, but hey, they don't like XML. Fine.]]
>
> The problems come when they try to QUERY this data.
>
>
> The NoSQL industry is re-inventing the wheel from scratch, and in a very chaotic and ad-hoc manner.
>
> Just look at the sad state of affairs in terms of query languages and their semantics.
>
> I am just looking at the ones that claim they can store nested and schema-less data (JSON-like or XML-like).
>
> (1) MongoDB
> http://docs.mongodb.org/manual/tutorial/query-documents/ <http://docs.mongodb.org/manual/tutorial/query-documents/>
>
> Note: pure JSON. Couldn't find a simple sort, for example. Etc. Etc.
>
> (2) Cassandra/DataStax
> http://www.datastax.com/wp-content/uploads/2013/03/cql_3_ref_card.pdf <http://www.datastax.com/wp-content/uploads/2013/03/cql_3_ref_card.pdf>
>
> Note: not even an OR, or a NOT. And what does it mean to sort on schema-less data?
>
> (3) Spark/DataBricks
> https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html <https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html>
>
> Note: sounds more like an import/export facility... but they call it a JSON query language
>
> (4) Elastic Search
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html <https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html>
>
> Note: very sophisticated full text, but no structured search of any serious kind. Just some simple aggregates (sum, etc.)
>
>
> (5) Mulesoft
> https://www.mulesoft.com/press-center/new-release-june-2015?utm_source=linkedin&sthash.axJqiSBn.mjjo <https://www.mulesoft.com/press-center/new-release-june-2015?utm_source=linkedin&sthash.axJqiSBn.mjjo>
>
> Note: not only do they seem to have their own JSON query language, but even their own XML query language, it seems. Couldn't find more details.
>
> (6) Hive
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF>
>
> Note: multiple languages (XPath, some JSON, some SQL, glued together somehow chaotically)
>
> I can fill tons of pages with YET-ANOTHER-LANGUAGE-LIKE-THIS.
>
> (7) MarkLogic
>
> https://docs.marklogic.com/8.0/guide/app-dev/json <https://docs.marklogic.com/8.0/guide/app-dev/json>
>
>
>
> ==============
>
> Now I can spot several mistakes here:
>
> 1. None of those query languages has a clearly designed, mathematical data model. In the absence of such a data model, one that describes the input, the output
> and the intermediate results of a query, how can we define a clean semantics?
>
> 2. All of them have hacky semantics, a "let's run it and we'll see what the result is" kind of thing. The semantics in most corner cases (and by definition
> semi-structured data is ONLY corner cases) is not defined.
>
> 3. Some try to piggyback on the SQL semantics, ignoring the fact that SQL was designed to work on relations, and JSON (or in general, nested data)
> has nothing to do with relations. SQL semantics cannot be "ported"... just because we reuse the same keywords.
>
> 4. None attempted to define a type system (even a basic one for atomic types like dates, and arithmetic on them) and a schema language.
>
> ==============
>
>
> Now maybe it's clear why I am so sad that the XQuery community, instead of trying to help the younger and naive NoSQL community, which still believes that
> SQL is "good enough" and that using the SELECT-FROM-WHERE keywords is the magic bullet to define the semantics of any kind of query language,
> is still looking at its own navel, and marveling, like the well-known CEO: "we can handle flexible data" !!!
>
> Just compare those languages I listed above with the work that has been done in the past 16 years in XQuery, and the correctness and the complexity of the result
> vs. the hacky solutions above.
>
> P.S. And yes, that work from XQuery was used 100% in the design of JSONiq, which was designed with the dual goal in mind:
> (a) reuse 100% of the experience of design and implementation of XQuery and
> (b) provide a query language that is syntactically and semantically acceptable for the JSON community.
>
> Whether we succeeded or not, that's another story, but I am not aware of any other solution that even comes CLOSE to that goal.
>
>
> Best regards
> Dana

Misztur, Chris

2015-05-29 12:08:07 UTC


thanks


Ihe Onwuka

2015-05-29 13:12:27 UTC


On Thu, May 28, 2015 at 5:20 PM, daniela florescu <***@me.com> wrote:

> The NoSQL industry is extremely successful, used everywhere, and
> considered by many the child prodigy of the database industry.
>
>
I could have sworn it is the unacknowledged hip but bastard grandchild of
the network and hierarchical databases of the 60's and 70's.... so correct
me where I am wrong in what follows.

>
> They are proud of themselves because they satisfy user needs, aka: they
> store data:
> (a) which is not in 1st normal form (aka nested, pre-aggregated)
> (b) without schema
>
> ...to the practical benefit of:
> (a) the application getting the data out of the database exactly as the
> application needs it, and not
> altered through a normalization phase.
>

Which can give you blazing fast performance IFF. But to take an example
from my movie project. We have stored movie reviews by critic. You pull up
a page for the critic and get all the movie reviews he has ever written.
Then the client suddenly turns around (as mine did) and says I want to pull
up the movie and get all the reviews the different critics have written.
That query isn't going to be fast, and if you are not working with a proper
query language it might not be straightforward to write. So not only do you
not get a free lunch on the performance, you might end up with a double
whammy.
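
To make the shape of the problem concrete, here is a minimal sketch in Python. The document layout, the names and the ratings are all made up for illustration and are not taken from the actual movie project; the point is only that the lookup matching the nesting touches one document, while the inverted question has to scan and unnest every document.

```python
# Hypothetical documents, nested by critic (the access path the store was
# designed around). Names and ratings are invented for illustration.
critics = [
    {"critic": "A. Reviewer", "reviews": [
        {"movie": "Alien", "stars": 5},
        {"movie": "Brazil", "stars": 4},
    ]},
    {"critic": "B. Critic", "reviews": [
        {"movie": "Alien", "stars": 3},
    ]},
]


def reviews_by_critic(docs, name):
    """Matches the nesting: find one document and return its reviews."""
    for doc in docs:
        if doc["critic"] == name:
            return doc["reviews"]
    return []


def reviews_by_movie(docs, title):
    """Goes against the nesting: scan every critic and unnest every review."""
    return [
        {"critic": doc["critic"], "stars": review["stars"]}
        for doc in docs
        for review in doc["reviews"]
        if review["movie"] == title
    ]


alien_reviews = reviews_by_movie(critics, "Alien")
```

Without a query language that can express the unnesting, the second function is something the application has to write and maintain by hand for every new question of this kind.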

In that sense nothing has changed from db's of 60's and 70's .

Enter relational and you had (after normalisation) a database design that
was neutral to the queries that were to be run as there was no nesting. In
addition you got a proper relational query language and something very
important - query optimisation (in theory at least) for free.
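
As a rough illustration of that neutrality, here is a sketch using SQLite through Python's standard sqlite3 module. The table layout, the helper name reviews_where and the sample rows are assumptions made up for this example. With one flat review table, asking by critic and asking by movie are the same query with a different column; neither is privileged by the storage layout.

```python
import sqlite3

# In-memory database with one normalised table; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE review (critic TEXT, movie TEXT, stars INTEGER);
    INSERT INTO review VALUES
        ('A. Reviewer', 'Alien', 5),
        ('A. Reviewer', 'Brazil', 4),
        ('B. Critic', 'Alien', 3);
""")


def reviews_where(column, value):
    """Hypothetical helper: filter reviews on either column."""
    # The column name is checked against a fixed list; only the value is
    # passed as a bound parameter.
    if column not in ("critic", "movie"):
        raise ValueError(f"unsupported column: {column}")
    sql = (
        "SELECT critic, movie, stars FROM review "
        f"WHERE {column} = ? ORDER BY critic, movie"
    )
    return conn.execute(sql, (value,)).fetchall()


by_critic = reviews_where("critic", "A. Reviewer")
by_movie = reviews_where("movie", "Alien")
```

The sketch says nothing about speed; it only shows that both questions are equally easy to state, which is what leaves the optimiser free to pick an access path for either one.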

> (b) the lack of fixed schema helps with data flexibility... things change
> extremely quickly inside an application
> these days (fields being added, deleted, changed, etc.)
>
>
How much data independence does that afford you?

>
> So far so good, and I think until here they are all right.
>
> [[ One may think that this looks a little bit like ... XML, but hey, they
> don't like XML. Fine.]]
>
> The problems come when they try to QUERY this data.
>
>
> The NoSQL industry is re-inventing the wheel from scratch, and in a very
> chaotic and ad-hoc manner.
>
> Just look at the sad state of affairs in terms of query languages and
> their semantics.
>
> <snipped/>
>
> ==============
>
> Now I can spot several mistakes here:
>
> 1. None of those query languages has a clearly designed, mathematical data
> model. In the absence of such a data model, one that describes the input, the
> output
> and the intermediate results of a query, how can we define a clean
> semantics?
>
> 2. All of them have hacky semantics, a "let's run it and we'll see what
> the result is" kind of thing. The semantics in most corner cases (and
> by definition
> semi-structured data is ONLY corner cases) is not defined.
>
> 3. Some try to piggyback on the SQL semantics, ignoring the fact that
> SQL was designed to work on relations, and JSON (or in general, nested
> data)
> has nothing to do with relations. SQL semantics cannot be "ported"... just
> because we reuse the same keywords.
>
>
A big reason why people in Analytics who know what they are talking about
are keen to use SQL is because you get query optimisation for free.

> 4. None attempted to define a type system (even a basic one for atomic
> types like dates, and arithmetic on them) and a schema language.
>
> Now maybe it's clear why I am so sad that the XQuery community, instead of
> trying to help the younger and naive NoSQL community, which still believes
> that
> SQL is "good enough" and that using the SELECT-FROM-WHERE keywords is the
> magic bullet to define the semantics of any kind of query language,
> is still looking at its own navel, and marveling, like the well-known
> CEO: "we can handle flexible data" !!!
>
> Just compare those languages I listed above with the work that has been
> done in the past 16 years in XQuery, and the correctness and the complexity
> of the result
> vs. the hacky solutions above.
>
> P.S. And yes, that work from XQuery was used 100% in the design of JSONiq,
> which was designed with the dual goal in mind:
> (a) reuse 100% of the experience of design and implementation of XQuery and
> (b) provide a query language that is syntactically and semantically
> acceptable for the JSON community.
>
>
It's called Javascript. Also known as Python.

> Whether we succeeded or not, that's another story, but I am not aware of any
> other solution that even comes CLOSE to that goal.
>
>
They don't share that goal.

daniela florescu

2015-05-29 14:47:09 UTC


>
> P.S. And yes, that work from XQuery was used 100% in the design of JSONiq, which was designed with the dual goal in mind:
> (a) reuse 100% of the experience of design and implementation of XQuery and
> (b) provide a query language that is syntactically and semantically acceptable for the JSON community.
>
>
> It's called Javascript. Also known as Python.
>

A query language like ... hum ... JavaScript or Python!???

In this email I am only talking about query languages, Ihe. See the definition here:
http://en.wikipedia.org/wiki/Query_language <http://en.wikipedia.org/wiki/Query_language>

Best regards
Dana

Ihe Onwuka

2015-05-29 14:53:24 UTC


Daniela, you forgot to switch your irony detecting gene on.


daniela florescu

2015-05-29 15:02:46 UTC


The humor gene is here. But recent events made my sense of humor become a little dry.

So, if you want a technical discussion, let's have a technical discussion.

Best regards
Dana


Ihe Onwuka

2015-05-29 16:14:38 UTC


On Fri, May 29, 2015 at 11:02 AM, daniela florescu <***@me.com> wrote:

> The humor gene is here. But recent events made my sense of humor become a
> little dry.
>
> So, if you want a technical discussion, let's have a technical discussion.
>
>
For that to be really interesting you need to find somebody who disagrees
with you.