daniela florescu
2015-05-29 05:33:50 UTC
Another message that I sent this morning that didn't make it through... until now.
Thanks, MarkLogic, for opening up the blockade.
I guess the MarkLogic lawyers needed a little bit of time to scratch their heads about what to do... (and BTW,
silencing me isn't a solution... I lived in a communist country for 22 years... they tried that... it didn't work.)
But the following message is a serious discussion about the state of affairs in the query-language universe for NoSQL
databases.
> On May 28, 2015, at 2:20 PM, daniela florescu <***@me.com> wrote:
>
> The NoSQL industry is extremely successful, used everywhere, and considered by many the child prodigy of the database industry.
>
>
> They are proud of themselves because they satisfy user needs, namely, they store data:
> (a) which is not in first normal form (i.e., nested, pre-aggregated)
> (b) without schema
>
> ...to the practical benefit of:
> (a) the application getting the data out of the database exactly as it needs it, not
> altered through a normalization phase;
> (b) the lack of a fixed schema, which helps with data flexibility... things change extremely quickly inside an application
> these days (fields being added, deleted, changed, etc.)
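To make "not in first normal form" concrete, here is a minimal Python sketch. The order document and the `normalize`/`reassemble` helpers are hypothetical, made up for illustration: the nested document is what the application wants, while normalizing it into flat relations forces an extra join step to rebuild it.

```python
# Hypothetical order document, stored nested (NOT in first normal form),
# exactly the shape the application wants back from the database.
order = {
    "id": 1,
    "customer": "Ann",
    "lines": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}


def normalize(doc):
    """Hypothetical helper: split one nested order into two flat (1NF) relations."""
    orders = [(doc["id"], doc["customer"])]
    order_lines = [(doc["id"], line["sku"], line["qty"]) for line in doc["lines"]]
    return orders, order_lines


def reassemble(orders, order_lines):
    """Hypothetical helper: join the flat relations back into nested documents."""
    result = []
    for order_id, customer in orders:
        lines = [
            {"sku": sku, "qty": qty}
            for (oid, sku, qty) in order_lines
            if oid == order_id
        ]
        result.append({"id": order_id, "customer": customer, "lines": lines})
    return result


orders, order_lines = normalize(order)
print(orders)       # [(1, 'Ann')]
print(order_lines)  # [(1, 'A1', 2), (1, 'B7', 1)]
print(reassemble(orders, order_lines) == [order])  # True
```

The round trip works, but only because the application pays for the join; a document store returns `order` as-is.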
>
>
> So far, so good; up to this point I think they are right.
>
> [[ One may think that this looks a little bit like... XML, but hey, they don't like XML. Fine. ]]
>
> The problem comes when they try to QUERY this data.
>
>
> The NoSQL industry is re-inventing the wheel from scratch, and in a very chaotic and ad-hoc manner.
>
> Just look at the sad state of affairs in terms of query languages and their semantics.
>
> I am only looking at the ones that claim they can store nested and schema-less data (JSON-like or XML-like):
>
> (1) MongoDB
> http://docs.mongodb.org/manual/tutorial/query-documents/
>
> Note: pure JSON. Couldn't find a simple sort, for example. Etc. Etc.
>
> (2) Cassandra/DataStax
> http://www.datastax.com/wp-content/uploads/2013/03/cql_3_ref_card.pdf
>
> Note: not even an OR or a NOT. And what does it mean to sort schema-less data?
>
> (3) Spark/DataBricks
> https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html
>
> Note: sounds more like an import/export facility... but they call it a JSON query language.
>
> (4) Elasticsearch
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
>
> Note: very sophisticated full-text search, but no structured search of any serious kind. Just some simple aggregates (sum, etc.)
>
>
> (5) Mulesoft
> https://www.mulesoft.com/press-center/new-release-june-2015?utm_source=linkedin&sthash.axJqiSBn.mjjo
>
> Note: not only do they seem to have their own JSON query language, but apparently their own XML query language too. Couldn't find more details.
>
> (6) Hive
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
>
> Note: multiple languages (XPath, some JSON, some SQL), glued together somewhat chaotically.
>
> I could fill tons of pages with YET-ANOTHER-LANGUAGE-LIKE-THIS.
>
> (7) MarkLogic
>
> https://docs.marklogic.com/8.0/guide/app-dev/json
>
>
>
> ==============
>
> Now, I can spot several mistakes here:
>
> 1. None of these query languages has a clearly designed, mathematical data model. In the absence of such a data model, one that describes the input, the output,
> and the intermediate results of a query, how can we define clean semantics?
>
> 2. All of them have hacky semantics -- a "let's run it and we'll see what the result is" kind of thing. The semantics in most corner cases -- and by definition
> semi-structured data is ONLY corner cases -- are not defined.
>
> 3. Some try to piggyback on SQL semantics, ignoring the fact that SQL was designed to work on relations, and JSON (or nested data in general)
> has nothing to do with relations. SQL semantics cannot be "ported"... just because we reuse the same keywords.
>
> 4. None attempted to define a type system (not even a basic one for atomic types like dates, and arithmetic on them) or a schema language.
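Points 2 and 3 can be made concrete with a minimal Python sketch. The documents and field names are hypothetical, and the cross-type ordering in `order_key` is an arbitrary assumption, not the rule of any particular query language. It shows two corner cases that any schema-less query language must define: sorting a field whose type varies across documents, and what `WHERE tags = 'db'` means when `tags` is sometimes an array.

```python
# Hypothetical documents: "price" and "tags" vary in type and presence,
# which is the normal situation for schema-less data.
docs = [
    {"id": 1, "price": 10, "tags": "db"},
    {"id": 2, "price": "9.50", "tags": ["db", "xml"]},
    {"id": 3, "tags": []},
]

prices = [d.get("price") for d in docs]  # [10, '9.50', None]

# Corner case 1: "sort by price". With no defined order across types,
# a naive sort simply fails in Python 3.
try:
    sorted(prices)
    naive_sort_failed = False
except TypeError as err:
    naive_sort_failed = True
    print("naive sort fails:", err)


def order_key(value):
    """One ASSUMED total order: missing/null < numbers < strings < anything else."""
    if value is None:
        return (0, 0)
    if isinstance(value, (int, float)):
        return (1, value)
    if isinstance(value, str):
        return (2, value)
    return (3, repr(value))  # arrays/objects: arbitrary but deterministic


print(sorted(prices, key=order_key))  # [None, 10, '9.50']


# Corner case 2: WHERE tags = 'db'. Two readings that both "reuse the keyword".
def equals_structural(field, target):
    """Whole-value equality, as if tags were a flat relational column."""
    return field == target


def equals_existential(field, target):
    """True if ANY item equals target (in the spirit of XQuery's general '=')."""
    items = field if isinstance(field, list) else [field]
    return any(item == target for item in items)


structural = [d["id"] for d in docs if equals_structural(d.get("tags"), "db")]
existential = [d["id"] for d in docs if equals_existential(d.get("tags"), "db")]
print(structural)   # [1]
print(existential)  # [1, 2]
```

Both readings are defensible, and they return different answers for the same query text; that is exactly the kind of choice a language spec has to make explicit instead of leaving it to "let's run it and see".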
>
> ==============
>
>
> Now maybe it's clear why I am so sad that the XQuery community, instead of trying to help the younger and naive NoSQL community, which still believes that
> SQL is "good enough" and that using the SELECT-FROM-WHERE keywords is the magic bullet for defining the semantics of any kind of query language,
> is still looking at its own navel and marveling, like the well-known CEO: "we can handle flexible data"!!!
>
> Just compare the languages I listed above with the work that has been done in XQuery over the past 16 years, and compare the correctness and the complexity of the result
> vs. the hacky solutions above.
>
> P.S. And yes, that work from XQuery was used 100% in the design of JSONiq, which was designed with a dual goal in mind:
> (a) reuse 100% of the experience of designing and implementing XQuery, and
> (b) provide a query language that is syntactically and semantically acceptable to the JSON community.
>
> Whether we succeeded or not is another story, but I am not aware of any other solution that even comes CLOSE to that goal.
>
>
> Best regards
> Dana