Discussion:
[xquery-talk] the stupidity of using SQL as a query language for JSON
daniela florescu
2015-06-01 16:57:23 UTC
Permalink
Every day we can see one more of those in the industry:

http://drill.apache.org

They are usually received with extreme enthusiasm by the cohorts of "database developers without any
engineering background", as Ihe calls them.

It makes me cry every day.

People are unbelievable gullible and can eat any kid of marketing stuff, without using their brains. It’s amazing.

I mean… using Javascript to process server side JSON data is just performance and productivity-wise stupid.

But at least it is doable, and … correct. Gives some correct answers (20 hours or so later..).

Using SQL to process JSON is simply NOT POSSIBLE.

It’s like saying that we use C++, but we removed the objects and we replaced them with COBOL structures.

Hey, but we are using C++ !!!!!!

How can we stop this madness !???

One should REALLY start for JSON the SAME discussions we had in the XML world all over again, 20 years later !???

Distressed at the lack of thinking in the IT population,
Dana







_______________________________________________
***@x-query.com
http://x-query.com/mail
Michael Kay
2015-06-01 17:51:24 UTC
Permalink
Post by daniela florescu
Using SQL to process JSON is simply NOT POSSIBLE.
Proof?

My Ph.D. thesis in 1975 showed that you can use SQL to query hierarchic data. With the emphasis on “query”. The problems start with update. So long as you restrict yourself to query, you can map almost any data model onto any other.

Though the fact that you can do it doesn’t mean that it’s a good idea, of course. In those days the reason for doing it was that SQL was the only declarative query language in town, and that’s no longer the case.

Michael Kay
Saxonica


_______________________________________________
***@x-query.com
http://x-query
Ihe Onwuka
2015-06-01 18:08:37 UTC
Permalink
Then I have some questions.

So what happens to the closure property.

Should I be allowed to join a JSON array with an object. Why not, or if so
what type of thing will I get back and what will happen when if I try to
query it (I will be allowed to query it won't I).

If I ask to order by a field and there is no schema to tell me it's type
what is SQL going to do.

Talking of ordering how do I get the 5th array item or the 5th object when
SQL only deals in unordered sets.

What would be the result of union of two objects and how can I determine
whether such a union should be allowed at all when there is no schema?
Post by Michael Kay
Post by daniela florescu
Using SQL to process JSON is simply NOT POSSIBLE.
Proof?
My Ph.D. thesis in 1975 showed that you can use SQL to query hierarchic
data. With the emphasis on “query”. The problems start with update. So long
as you restrict yourself to query, you can map almost any data model onto
any other.
Though the fact that you can do it doesn’t mean that it’s a good idea, of
course. In those days the reason for doing it was that SQL was the only
declarative query language in town, and that’s no longer the case.
Michael Kay
Saxonica
_______________________________________________
http://x-query.com/mailman/listinfo/talk
Michael Kay
2015-06-01 19:22:27 UTC
Permalink
Post by Ihe Onwuka
Then I have some questions.
So what happens to the closure property.
Well, if you’re going to apply SQL to JSON (say), then the first thing you have to do is define a mapping from JSON to tables. That’s not difficult to do.

If by the “closure property” you want the result of any SQL query to be the representation of some JSON structure, then you’re not going to achieve that. That’s essentially the same as the update problem.
Post by Ihe Onwuka
Should I be allowed to join a JSON array with an object.
No, you don’t join arrays with objects (maps). You join the table representation of an array with the table representation of a map. And what you get back is a table, which of course you can query.
Post by Ihe Onwuka
Why not, or if so what type of thing will I get back and what will happen when if I try to query it (I will be allowed to query it won't I).
If I ask to order by a field and there is no schema to tell me it's type what is SQL going to do.
JSON types are based on the syntax of the instance, not on any schema. If it looks like a number then it is a number. 4=4.0 is true, “4”=“4.0” is false.
Post by Ihe Onwuka
Talking of ordering how do I get the 5th array item or the 5th object when SQL only deals in unordered sets.
SQL doesn’t deal only in unordered sets. It deals in tables. It has an “order by” operator. You’re probably thinking of the relational model, which is not the same as SQL.
Post by Ihe Onwuka
What would be the result of union of two objects and how can I determine whether such a union should be allowed at all when there is no schema?
Again, it’s the union of two tables which are the representations of objects/maps, and the details depend on how you do the mapping.

Michael Kay
Saxonica
Post by Ihe Onwuka
_______________________________________________
http://x-query.com/mailman/listinfo/talk <http://x-query.com/mailman/listinfo/talk>
Ihe Onwuka
2015-06-01 19:55:06 UTC
Permalink
Post by Ihe Onwuka
Then I have some questions.
So what happens to the closure property.
Well, if you’re going to apply SQL to JSON (say), then the first thing you
have to do is define a mapping from JSON to tables. That’s not difficult to
do.
If by the “closure property” you want the result of any SQL query to be
the representation of some JSON structure, then you’re not going to achieve
that. That’s essentially the same as the update problem.
So pretty much no subqueries (ok somebody is going to say that any query
entailing a subquery can be rewritten without one) but AFAIC the language
you are talking about isn't really SQL then.

Should I be allowed to join a JSON array with an object.
Post by Ihe Onwuka
No, you don’t join arrays with objects (maps). You join the table
representation of an array with the table representation of a map. And what
you get back is a table, which of course you can query.
Isn't there an impedance mismatch that has been hand-waved away somewhere
in there.
Post by Ihe Onwuka
Why not, or if so what type of thing will I get back and what will happen
when if I try to query it (I will be allowed to query it won't I).
If I ask to order by a field and there is no schema to tell me it's type
what is SQL going to do.
JSON types are based on the syntax of the instance, not on any schema. If
it looks like a number then it is a number. 4=4.0 is true, “4”=“4.0” is
false.
So I'm SOL if it's a date then.
Talking of ordering how do I get the 5th array item or the 5th object when
SQL only deals in unordered sets.
SQL doesn’t deal only in unordered sets. It deals in tables. It has an
“order by” operator. You’re probably thinking of the relational model,
which is not the same as SQL.
Forgive me if I'm wrong. But I still don't believe I can write a statement
that gets me the 5th row of the table.
Post by Ihe Onwuka
What would be the result of union of two objects and how can I determine
whether such a union should be allowed at all when there is no schema?
Again, it’s the union of two tables which are the representations of
objects/maps, and the details depend on how you do the mapping.
I'm not doing the mapping. You're the one who saying it can be done.

How do you map a potentially infinitely recursive data structure into a
flat 2 dimensional table.

How would you predict a priori what your table and column names would be
and how many of them. Say you had two semantically different (because they
are at different levels of the hierarchy) but identically named tags what
schema is that going to generate for your tables.

If you don't can't predict the schema your semi-structured is going to
generate then how could you write your query. Generate the tables first
and then write it when you've seen what they've produced???
daniela florescu
2015-06-01 20:07:57 UTC
Permalink
So pretty much no subqueries (ok somebody is going to say that any query entailing a subquery can be rewritten without one) but AFAIC the language you are talking about isn't really SQL then.
That someone can come to me and I’ll show him/her gazzilions of nested queries that cannot be unnested (SQL and/or XQuery).

In fact, the ones you CAN unnest are pretty rare…..

(I think I listed ALL of unnestable cases of XQuery here if I remember correctly…there are not that many….
http://dl.acm.org/citation.cfm?id=1315537
)

Not even the simplest form:

for in for in …

cannot always be unnested in most cases if I remember correctly ….

Best
Dana
_______________________________________________
***@x-query.com
http://x
Michael Kay
2015-06-01 21:28:44 UTC
Permalink
Post by Michael Kay
Well, if you’re going to apply SQL to JSON (say), then the first thing you have to do is define a mapping from JSON to tables. That’s not difficult to do.
If by the “closure property” you want the result of any SQL query to be the representation of some JSON structure, then you’re not going to achieve that. That’s essentially the same as the update problem.
So pretty much no subqueries (ok somebody is going to say that any query entailing a subquery can be rewritten without one) but AFAIC the language you are talking about isn't really SQL then.
I don’t follow. You can write any query you like; it’s just that the result is a table that might not be mappable back to JSON.
Post by Michael Kay
So I'm SOL if it's a date then.
You won’t get any dates in the table representation of JSON, but you can get them in tables returned by a query.
Post by Michael Kay
Forgive me if I'm wrong. But I still don't believe I can write a statement that gets me the 5th row of the table.
Well, a typical representation of a JSON array [“a”, “b”, “c”] might be the table

ARRAYS
ID INDEX TYPE VALUE
001 0 String “a”
001 1 String “b”
001 2 String “c”

and the query to get item 2 of array 001 would be

SELECT TYPE, VALUE FROM ARRAYS WHERE ID=001 AND INDEX=2

As far as I understand it this is similar to the mapping that SQL Server uses for XML.
Post by Michael Kay
How would you predict a priori what your table and column names would be and how many of them.
If you’ve got no schema then you have to use a generic mapping in which the table and column names are generic concepts such as ARRAY, MAP, etc.
Post by Michael Kay
If you don't can't predict the schema your semi-structured is going to generate then how could you write your query.
With difficulty. I can’t see anyone wanting to write it by hand.

Michael Kay
Saxonica
daniela florescu
2015-06-01 21:44:28 UTC
Permalink
Post by Michael Kay
If you don't can't predict the schema your semi-structured is going to generate then how could you write your query.
With difficulty. I can’t see anyone wanting to write it by hand.
Look at a simple path expression like

$x//a//b

(bread and butter for any semi-structured query language)

This would translate into the equivalent SQL with:
1. nested queries
2. nested RECURSIVE queries in fact
3. two outer joins.

Good luck with trying to:
1. write that by hand and
2. find a SQL implementation that optimizes and executes that properly….


And that’s “just” a simple $x//a//b…….

Now imagine your typical XQuery (soon to be JSOniq:-) program.

Best regards
Dana




_______________________________________________
***@x-query.com
http://x-query.com/mailman/lis
Ihe Onwuka
2015-06-02 03:21:41 UTC
Permalink
So SQL over JSON = Turing Tarpit.

Fair summarisation?
Post by Ihe Onwuka
Post by Michael Kay
Well, if you’re going to apply SQL to JSON (say), then the first thing
you have to do is define a mapping from JSON to tables. That’s not
difficult to do.
If by the “closure property” you want the result of any SQL query to be
the representation of some JSON structure, then you’re not going to achieve
that. That’s essentially the same as the update problem.
So pretty much no subqueries (ok somebody is going to say that any query
entailing a subquery can be rewritten without one) but AFAIC the language
you are talking about isn't really SQL then.
I don’t follow. You can write any query you like; it’s just that the
result is a table that might not be mappable back to JSON.
Post by Michael Kay
So I'm SOL if it's a date then.
You won’t get any dates in the table representation of JSON, but you can
get them in tables returned by a query.
Post by Michael Kay
Forgive me if I'm wrong. But I still don't believe I can write a
statement that gets me the 5th row of the table.
Well, a typical representation of a JSON array [“a”, “b”, “c”] might be
the table
ARRAYS
ID INDEX TYPE VALUE
001 0 String “a”
001 1 String “b”
001 2 String “c”
and the query to get item 2 of array 001 would be
SELECT TYPE, VALUE FROM ARRAYS WHERE ID=001 AND INDEX=2
As far as I understand it this is similar to the mapping that SQL Server uses for XML.
How would you predict a priori what your table and column names would be
and how many of them.
If you’ve got no schema then you have to use a generic mapping in which
the table and column names are generic concepts such as ARRAY, MAP, etc.
If you don't can't predict the schema your semi-structured is going to
generate then how could you write your query.
With difficulty. I can’t see anyone wanting to write it by hand.
Michael Kay
Saxonica
daniela florescu
2015-06-02 03:25:25 UTC
Permalink
Post by Ihe Onwuka
So SQL over JSON = Turing Tarpit.
Yep. Fair summary.

Dana


P.S. BTW, in the complexity of the the simple query below, I forgot that every nested query has also a
sortby.

And the nested queries are not un-nestable :-)))

I even wonder if such queries can be written in SQL-99
:-)
Post by Ihe Onwuka
Fair summarisation?
Post by Michael Kay
Well, if you’re going to apply SQL to JSON (say), then the first thing you have to do is define a mapping from JSON to tables. That’s not difficult to do.
If by the “closure property” you want the result of any SQL query to be the representation of some JSON structure, then you’re not going to achieve that. That’s essentially the same as the update problem.
So pretty much no subqueries (ok somebody is going to say that any query entailing a subquery can be rewritten without one) but AFAIC the language you are talking about isn't really SQL then.
I don’t follow. You can write any query you like; it’s just that the result is a table that might not be mappable back to JSON.
Post by Michael Kay
So I'm SOL if it's a date then.
You won’t get any dates in the table representation of JSON, but you can get them in tables returned by a query.
Post by Michael Kay
Forgive me if I'm wrong. But I still don't believe I can write a statement that gets me the 5th row of the table.
Well, a typical representation of a JSON array [“a”, “b”, “c”] might be the table
ARRAYS
ID INDEX TYPE VALUE
001 0 String “a”
001 1 String “b”
001 2 String “c”
and the query to get item 2 of array 001 would be
SELECT TYPE, VALUE FROM ARRAYS WHERE ID=001 AND INDEX=2
As far as I understand it this is similar to the mapping that SQL Server uses for XML.
Post by Michael Kay
How would you predict a priori what your table and column names would be and how many of them.
If you’ve got no schema then you have to use a generic mapping in which the table and column names are generic concepts such as ARRAY, MAP, etc.
Post by Michael Kay
If you don't can't predict the schema your semi-structured is going to generate then how could you write your query.
With difficulty. I can’t see anyone wanting to write it by hand.
Michael Kay
Saxonica
Michael Kay
2015-06-02 07:38:44 UTC
Permalink
Post by Ihe Onwuka
So SQL over JSON = Turing Tarpit.
Fair summarisation?
Yes, I’d concur with that. Same applies to SQL over XML. It’s something that’s theoretically possible but not fit for human consumption; only of interest to the extent that it might provide a lower-level interface for a language like XQuery to target. And even then, Microsoft’s attempts to do XQuery this way should probably encourage other people to stay well clear.

Michael Kay
Saxonica


_______________________________________________
***@x-query.com
http://x-query.com/mailm

daniela florescu
2015-06-01 20:03:00 UTC
Permalink
SQL doesn’t deal only in unordered sets. It deals in tables. It has an “order by” operator. You’re probably thinking of the relational model, which is not the same as SQL.
Michael,

yes, SQL deals only with unordered stuff (bags actually).

Yes, you can order-by something, but that something has to be an explicit set of columns in the relations.

Hence, in order to encode order (of an array for example) in a SQL table, you need to store somewhere the position of items in the array, in a way you can sort by it.
(which will give you nightmares at update time)

“ve been there, done that….not working properly.

That’s NOT a path that leads ANYWHERE interesting.

We’ve been there 20 years ago with SQL…..

</SQL-for-JSON>

Dana
_______________________________________________
***@x-query.com
http://x-query.com/mailman/listinf
daniela florescu
2015-06-01 18:40:20 UTC
Permalink
Post by Michael Kay
Post by daniela florescu
Using SQL to process JSON is simply NOT POSSIBLE.
Proof?
My Ph.D. thesis in 1975 showed that you can use SQL to query hierarchic data. With the emphasis on “query”. The problems start with update. So long as you restrict yourself to query, you can map almost any data model onto any other.
Michael,

you did not read my email properly.

I did NOT say that if you map JSON to relations (one way or another), you cannot use SQL on the result of the mapping.

I said: you cannot apply SQL DIRECTLY on the JSON structures, because it’s semantics is not designed for that.

Com’on Michael. You worked with me for 16 years now. Do I look like I’m so naive and I don’t get things ?

So, please. I understand that EVERYTHING (C++ objects, XML elements, COBOL structures,
assembly processing instructions, anything) can be mapped with enough brute force into relations, and that, without
loss of any information.

So thank you for this notice, but it does not relate to my email.

(BTW, I proved the same thing as you 2O years later…and this was 20 years ago…didn’t know about your PhD sorry….
http://homes.cs.washington.edu/~suciu/PAPERS/florescu-kossman.pdf
Apparently this is something that CS has to get it out of their system every 20 years …the time is up again, I guess…...)


The problem is that Drill (like everybody else in that bunch) apply SQL DIRECTLY on JSON structures, in the absence of any intermediary mapping
into relations, which is again, I repeat: A NON-SENSE.


And BTW Michael, if SQL would be good for querying hierarchical data, why in the world did we waste 16 years building XQuery !???


Just curious
Dana
Post by Michael Kay
Though the fact that you can do it doesn’t mean that it’s a good idea, of course. In those days the reason for doing it was that SQL was the only declarative query language in town, and that’s no longer the case.
Michael Kay
Saxonica
_______________________________________________
***@x-query.com
http://x-query.com/mailman/lis
Michael Kay
2015-06-01 19:29:59 UTC
Permalink
Post by daniela florescu
Post by Michael Kay
Post by daniela florescu
Using SQL to process JSON is simply NOT POSSIBLE.
Proof?
My Ph.D. thesis in 1975 showed that you can use SQL to query hierarchic data. With the emphasis on “query”. The problems start with update. So long as you restrict yourself to query, you can map almost any data model onto any other.
Michael,
you did not read my email properly.
I did NOT say that if you map JSON to relations (one way or another), you cannot use SQL on the result of the mapping.
I said: you cannot apply SQL DIRECTLY on the JSON structures, because it’s semantics is not designed for that.
You didn’t say “DIRECTLY”. You said it was “NOT POSSIBLE”.
Post by daniela florescu
Com’on Michael. You worked with me for 16 years now. Do I look like I’m so naive and I don’t get things ?
Of course not. You’ve got one of the best brains in the business. But that never stopped me debating with anyone.
Post by daniela florescu
And BTW Michael, if SQL would be good for querying hierarchical data, why in the world did we waste 16 years building XQuery !???
It didn’t say it was good for the purpose. I said it was possible. The only reason I can think of for actually doing it is that there’s a fair bit of software out there that likes to access its data using SQL.
Post by daniela florescu
Just curious
Dana
Michael Kay
Saxonica
_______________________________________________
talk
daniela florescu
2015-06-01 20:50:59 UTC
Permalink
Post by daniela florescu
And BTW Michael, if SQL would be good for querying hierarchical data, why in the world did we waste 16 years building XQuery !???
It didn’t say it was good for the purpose. I said it was possible. The only reason I can think of for actually doing it is that there’s a fair bit of software out there that likes to access its data using SQL.
The paper I wrote with Donald Kossmann in 1997 about mapping XML into relations ALSO included the
SQL queries equivalent to the simplest XPath expressions.

Those SQL queries are HORRENDOUSLY complicated.

I don’t care how many tools use SQL.

Two things come to my mind:

1. Nobody will write such SQL queries (that result from the mapping JSON-> tables or XML -> tables) and

2. Even if you manage to write them somehow, the queries are SO complex, the relational database engines are not capable
of dealing with them — not even Oracle.
(Jay here https://www.linkedin.com/profile/view?id=955015 wrote his PhD about such SQL queries, and I think he wakes
up with nightmares even now
)


Tools will need to adapt to a new query language for JSON. Or we need new tools. Or we abandon JSON.

That’s as simple as that.

Best regards
Dana
daniela florescu
2015-06-01 19:51:42 UTC
Permalink
If I need more proof of the lack of understanding of the problem from the general user population, here is a quote form
today’s Linkedin:

"The problem is that without structure (schema) one needs to program in a procedural language.”

Author will remain unknown
.

Dana
Post by daniela florescu
Post by Michael Kay
Post by daniela florescu
Using SQL to process JSON is simply NOT POSSIBLE.
Proof?
My Ph.D. thesis in 1975 showed that you can use SQL to query hierarchic data. With the emphasis on “query”. The problems start with update. So long as you restrict yourself to query, you can map almost any data model onto any other.
Michael,
you did not read my email properly.
I did NOT say that if you map JSON to relations (one way or another), you cannot use SQL on the result of the mapping.
I said: you cannot apply SQL DIRECTLY on the JSON structures, because it’s semantics is not designed for that.
Com’on Michael. You worked with me for 16 years now. Do I look like I’m so naive and I don’t get things ?
So, please. I understand that EVERYTHING (C++ objects, XML elements, COBOL structures,
assembly processing instructions, anything) can be mapped with enough brute force into relations, and that, without
loss of any information.
So thank you for this notice, but it does not relate to my email.
(BTW, I proved the same thing as you 2O years later
and this was 20 years ago
didn’t know about your PhD sorry
.
http://homes.cs.washington.edu/~suciu/PAPERS/florescu-kossman.pdf
Apparently this is something that CS has to get it out of their system every 20 years 
the time is up again, I guess
...)
The problem is that Drill (like everybody else in that bunch) apply SQL DIRECTLY on JSON structures, in the absence of any intermediary mapping
into relations, which is again, I repeat: A NON-SENSE.
And BTW Michael, if SQL would be good for querying hierarchical data, why in the world did we waste 16 years building XQuery !???
Just curious
Dana
Post by Michael Kay
Though the fact that you can do it doesn’t mean that it’s a good idea, of course. In those days the reason for doing it was that SQL was the only declarative query language in town, and that’s no longer the case.
Michael Kay
Saxonica
Loading...