[xquery-talk] Text Markup vs Data Serialization

Discussion:

[xquery-talk] Text Markup vs Data Serialization - Was RE: min max and mix

David Lee

2014-02-10 12:43:03 UTC

Possibly better discussed on XML mailing lists ... but ...
This thread has me thinking ... that XML, while originally a form of text markup (you start with text and add markup), is of dual use as data serialization.
*Even in the same document* ...
This can be confusing, but it's also powerful.

My opinion is that the compromises made to allow this "Dual Use", while not perfect and not quite equal in each use case, are really amazing.
I cannot think of any other markup or serialization format which does better at accommodating both use cases as equal citizens reasonably well.
So much so that with XML you can come from a data background and rarely run into anything awful (sometimes something unexpected, like the min/max thing),
or you can come from a Text/Document background and never even imagine that your documents could be considered "data" (you're not going to run sum() on a paragraph ... )
AND you can wear both hats at once and intermix and overlay the concepts ... if you're clever enough :)
(e.g. you might run count() on the *words* in a text document or add data tables to a text document or add rich text to a data document).
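As a rough illustration of wearing both hats on one document, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The document and the helper names `word_count` and `total_amount` are made up for this example: the first treats `<p>` content as text and counts its words, the second treats `<amount>` elements as data and sums them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing narrative text with a small data table.
DOC = """<report>
  <p>Sales rose <em>sharply</em> in the spring quarter.</p>
  <table>
    <row><region>North</region><amount>120</amount></row>
    <row><region>South</region><amount>80</amount></row>
  </table>
  <p>Autumn was flat.</p>
</report>"""

root = ET.fromstring(DOC)

def word_count(root: ET.Element) -> int:
    """Text view: count words in every <p>, including text inside inline
    markup such as <em> (itertext() yields mixed content in document order)."""
    return sum(len("".join(p.itertext()).split()) for p in root.iter("p"))

def total_amount(root: ET.Element) -> int:
    """Data view: element content is untyped text, so convert it explicitly."""
    return sum(int(a.text) for a in root.iter("amount"))

print(word_count(root))    # 10
print(total_amount(root))  # 200
```

Note that the data view has to convert text to numbers by hand; that untyped-by-default behaviour is part of what makes surprises like the min/max one possible.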

Not trying to start a markup war, just reflecting on the philosophy that is embedded in XML and its tools.

_______________________________________________
***@x-query.com
http://x-query.com/mailman/listinfo/talk

Ihe Onwuka

2014-02-10 12:58:27 UTC


Post by David Lee
Not trying to start a markup war, just reflecting on the philosophy that is embedded in XML and its tools.

Look. By all means start a markup war.

Just make sure that the OP doesn't get blamed for it.

Michael Kay

2014-02-10 19:11:56 UTC


Post by David Lee
My opinion is that the compromises made to allow this "Dual Use", while not perfect and not quite equal in each use case, are really amazing.
I cannot think of any other markup or serialization format which does better at accommodating both use cases as equal citizens reasonably well.

XML does a good job at this but it leaves some well-known problems. I tried to do better in FtanML (Balisage 2013). For example FtanML:

* allows element and attribute values that are typed as integers, booleans, or sequences of anything without resorting to a schema

for example married=true, height=1.86, children=["John", "Mary"]

* distinguishes whitespace that is present for readability purposes from whitespace that's part of the content

* eliminates the artificial distinction between elements and attributes, allowing the same values to be held in both
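To make the first point concrete: in plain XML with no schema, a parser hands back attribute values as strings, and the application has to apply its own conventions to recover booleans, numbers, or lists. The Python sketch below uses `xml.etree.ElementTree`; the decoding rules are hypothetical and only illustrate the kind of work that typed values in the markup itself (as in FtanML's married=true, height=1.86) are meant to remove. It does not implement FtanML.

```python
import xml.etree.ElementTree as ET

# Plain XML with no schema: every attribute value is an untyped string.
elem = ET.fromstring('<person married="false" height="1.86" children="John Mary"/>')
attrs = dict(elem.attrib)

# Pitfall: "false" is a non-empty string, so a naive truth test gives True.
naive_married = bool(attrs["married"])

# The application must recover types by its own (here hypothetical) conventions.
decoded = {
    "married": attrs["married"] == "true",   # boolean via string comparison
    "height": float(attrs["height"]),        # number via parsing
    "children": attrs["children"].split(),   # list via whitespace tokens
}

print(naive_married)  # True
print(decoded)        # {'married': False, 'height': 1.86, 'children': ['John', 'Mary']}
```

The whitespace-splitting convention for `children` breaks as soon as a name contains a space, which is one reason a native sequence syntax is attractive.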

Michael Kay
Saxonica

David Lee

2014-02-10 19:14:55 UTC


I do like FtanML ... In fact, I look forward to a possible presentation at some upcoming conference with a few key concepts borrowed/stolen ...

So how far have you gotten with FtanML/XSLT/XPath/XQuery?
:)

(That's the thing about XML ... even with its flaws, it has an unsurpassed, widely adopted toolchain.)

-----Original Message-----
From: Michael Kay [mailto:***@saxonica.com]
Sent: Monday, February 10, 2014 2:12 PM
To: David Lee
Cc: ***@gmail.com; ***@x-query.com
Subject: Re: [xquery-talk] Text Markup vs Data Serialization - Was RE: min max and mix

Michael Kay

2014-02-10 19:31:47 UTC


Post by David Lee
I do like FtanML ... In fact look forward to a possible presentation at some upcoming conference with a few key concepts borrowed/stolen ...

FtanML was very deliberately an exercise in answering the question "what would we like markup to look like if there were no compatibility / transition / adoption issues influencing the design?".

It will have succeeded if it influences whatever comes next.

As with other widely-adopted standards like SQL, POSIX, and C, XML will be very hard to displace; unlike those standards it also seems to be very resistant to incremental improvement. We're currently in a position where the world has discovered better ways of serializing structured data, but hasn't yet discovered a better way of serializing narrative text or of information that mixes narrative text with structured data (which is the domain that I find most interesting).

I've got a very bad track record at predicting the future, so I really shouldn't attempt it. Perhaps some standards group needing a new specification in some area like insurance will decide that it wants something better than XML and better than JSON and invent its own syntax, which will do the job sufficiently well that people in other areas start adopting it too. Who knows.

Michael Kay
Saxonica

Per Bothner

2014-02-11 00:54:32 UTC


Post by Michael Kay
We're currently in a position where the world has discovered better ways of serializing structured data, but hasn't yet discovered a better way of serializing narrative text or of information that mixes narrative text with structured data (which is the domain that I find most interesting).

You might find SRFI-108 interesting:
http://srfi.schemers.org/srfi-108/srfi-108.html
This defines a Scheme language extension for "Named quasi-literal
constructors"
which I intended to be useful for both structured data and rich text.
XQuery's
<p>Hello <em>{$name}</em>!</p>
would be represented as:
&p{Hello &em[name]!}

The difference is that SRFI-108 defines a *framework*, in that &p is
syntactic sugar for a call to a function or macro $construct$:p; the meaning
of the latter depends on whatever is in scope according to context.

There is a related SRFI-109 http://srfi.schemers.org/srfi-109/srfi-109.html
for extended multi-line string literals. Both use '&' for escapes, as
in XML.
I.e. character and entity references use the XML syntax, while an embedded
expression uses &[...]. The following is equivalent to Java's
("Hello " + name + "!"):
&{Hello &[name]!}

The interesting thing is that a simple string:
&{Hello &[name]!}
can be easily converted to rich text. For example:
&p{Hello &[name]!}
or:
&p{Hello &em[name]!}
assuming you have an HTML "vocabulary" in scope.
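The framework idea can be approximated in Python, purely as an analogy; this is not Kawa's or SRFI-108's actual API. Here the hypothetical `construct(vocab, name, *parts)` stands in for the `$construct$:name` lookup, and a "vocabulary" is just a dict of constructor functions, so the same quasi-literal shape builds an HTML-like element or plain text depending on which vocabulary is supplied. `Element`, `html_vocab`, and `text_vocab` are likewise made up for illustration.

```python
from html import escape

class Element:
    """Hypothetical rich-text node built by an HTML-like vocabulary."""
    def __init__(self, tag, children):
        self.tag = tag
        self.children = children

    def serialize(self):
        # Nested elements serialize recursively; text is escaped as in XML.
        inner = "".join(c.serialize() if isinstance(c, Element) else escape(str(c))
                        for c in self.children)
        return f"<{self.tag}>{inner}</{self.tag}>"

def construct(vocab, name, *parts):
    """Stand-in for SRFI-108's $construct$:name: what the named constructor
    does depends on the vocabulary in scope (here, passed explicitly)."""
    return vocab[name](*parts)

# An HTML-like vocabulary: p and em build element nodes.
html_vocab = {tag: (lambda *parts, tag=tag: Element(tag, list(parts)))
              for tag in ("p", "em")}

# A plain-text vocabulary: the same names just build strings.
text_vocab = {
    "p": lambda *parts: "".join(parts),
    "em": lambda *parts: "*" + "".join(parts) + "*",
}

name = "World"

# &{Hello &[name]!}  ~  "Hello " + name + "!"
print("Hello " + name + "!")  # Hello World!

# &p{Hello &em[name]!} under each vocabulary
doc = construct(html_vocab, "p", "Hello ", construct(html_vocab, "em", name), "!")
print(doc.serialize())  # <p>Hello <em>World</em>!</p>
print(construct(text_vocab, "p", "Hello ", construct(text_vocab, "em", name), "!"))  # Hello *World*!
```

In SRFI-108 the vocabulary is resolved from lexical scope rather than passed as an argument; the explicit `vocab` parameter here just makes that dependency visible.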

There is also a related embedded-XML syntax SRFI-107
http://srfi.schemers.org/srfi-107/srfi-107.html
This is a superset of XML, but uses the same syntax as SRFI-108/-109
for references and embedded expressions.

Kawa 1.14 implements SRFI-107, SRFI-108, and SRFI-109:
http://per.bothner.com/blog/2013/Kawa-1.14-released/

SRFI-108 defines a language embedding/extension (and specifically for
Lisp-family languages), rather than a serialization/interchange format,
but just like JSON one could define a subset or variant as a
possible data format.

--
--Per Bothner
***@bothner.com http://per.bothner.com/