Discussion:
[xquery-talk] XQuery Update Facility and unwanted whitespace
Joe Wicentowski
2012-07-09 18:45:23 UTC
Hi all,

I'm having a problem with a query I wrote that makes use of the XQuery
Update Facility. The problem is that unwanted whitespace is inserted
into the results of my query. Here is my source XML (a TEI-like
list), the query in question, and the output showing the unwanted
whitespace:

source.xml:
-----------------
<list>
<item>See <ref target="#MIDDLEEAST">Middle East</ref></item>
<item xml:id="MIDDLEEAST">Middle East <ref target="#d68">68</ref></item>
</list>

fix-ids.xq:
--------------
let $doc := doc('source.xml')
for $item-id at $count in $doc//item/@xml:id
let $new-id := concat('in', $count)
let $new-target := concat('#', $new-id)
let $targets := $doc//ref[@target = concat('#', $item-id)]/@target
return
(
(: fix @xml:ids :)
replace value of node $item-id with $new-id
,
(: fix @targets :)
for $target in $targets
return
replace value of node $target with $new-target
)

output:
----------
<list>
<item>See <ref target="#in1">Middle East</ref>
</item>
<item xml:id="in1">Middle East <ref target="#d68">68</ref>
</item>
</list>

Note that while the query only modifies attribute values, the results
of the query are somehow re-indented. (Specifically, in the source,
there was no whitespace between </ref> and </item>, but in the
results, </item> is on a new line.)

Is this a serialization issue? Is there a way for me to declare some
options that will prevent the unwanted whitespace from being inserted?

I'm not sure whether this is a general XQuery issue or an
implementation-specific issue, so let me know if this isn't the right
forum for this question. I'm using oXygen 13 in XQuery Debugger mode
with Saxon EE-XQuery 9.3.0.5.

(On a related note, I see that XQuery 3.0 has new support for
serialization options --
http://www.w3.org/TR/xquery-30/#id-serialization -- but oXygen doesn't
seem to allow combining XQuery 3.0 with XQuery Update Facility and
Saxon EE. This forum post instructs users to disable XQuery 1.1/3.0
support in order to use XQUF:
http://www.oxygenxml.com/forum/topic6615.html.)

Thanks,
Joe
Andrew Welch
2012-07-09 19:02:22 UTC
Post by Joe Wicentowski
Is this a serialization issue? Is there a way for me to declare some
options that will prevent the unwanted whitespace from being inserted?
I'm not sure whether this is a general XQuery issue or an
implementation-specific issue, so let me know if this isn't the right
forum for this question. I'm using oXygen 13 in XQuery Debugger mode
with Saxon EE-XQuery 9.3.0.5.
In the XQuery options (in oXygen), untick 'format transformer output'.
--
Andrew Welch
http://andrewjwelch.com
Joe Wicentowski
2012-07-09 19:35:47 UTC
Hi Andrew,
Post by Andrew Welch
In the XQuery options (in oXygen) untick 'format transformer output'
Thanks for that suggestion. That's definitely preferable, in that it
doesn't introduce new whitespace, though it does strip out all
whitespace between nodes -- which is fine with me:

<?xml version="1.0" encoding="UTF-8"?><list><item>See <ref
target="#in1">Middle East</ref></item><item xml:id="in1">Middle East
<ref target="#d68">68</ref></item></list>

To sum up, I guess this was an oXygen/Saxon-specific issue, but it
applies to all XQuery results in oXygen, not just those queries that
use the XQuery Update Facility.
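
The same effect can be reproduced outside oXygen, which is consistent with it being a matter of how the result is written out and not of the update itself. The sketch below is only an illustration: it uses Python's standard xml.etree.ElementTree module (an assumption made for the example, not the Saxon or oXygen serializer), and ET.indent (Python 3.9+) stands in for an option like 'format transformer output'.

```python
import xml.etree.ElementTree as ET

# The updated list from this thread, laid out as in source.xml.
SOURCE = (
    '<list>\n'
    '<item>See <ref target="#in1">Middle East</ref></item>\n'
    '<item xml:id="in1">Middle East <ref target="#d68">68</ref></item>\n'
    '</list>'
)


def serialize(xml_text: str, pretty: bool) -> str:
    """Parse xml_text and write it back, optionally re-indenting it first."""
    root = ET.fromstring(xml_text)
    if pretty:
        # Stand-in for a "format output" switch; requires Python 3.9+.
        ET.indent(root, space="")
    return ET.tostring(root, encoding="unicode")


plain = serialize(SOURCE, pretty=False)
formatted = serialize(SOURCE, pretty=True)

# Without formatting, </ref></item> stays together as in the source.
print("</ref></item>" in plain)
# With formatting, a line break is inserted before </item>.
print("</ref>\n</item>" in formatted)
```

No node is modified in either call; the only difference is whether the tree is re-indented before it is written.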

Ah, whitespace. Life would be so boring and straightforward without you.

Thanks again,
Joe
Andrew Welch
2012-07-09 19:53:49 UTC
Post by Joe Wicentowski
Hi Andrew,
Post by Andrew Welch
In the XQuery options (in oXygen) untick 'format transformer output'
Thanks for that suggestion. That's definitely preferable, in that it
doesn't introduce new whitespace, though it does strip out all
What is your "strip whitespace" set to? It sounds like it's set to "all"...
--
Andrew Welch
http://andrewjwelch.com
Joe Wicentowski
2012-07-09 21:12:04 UTC
Hi Andrew,
Post by Andrew Welch
What is your "strip whitespace" set to? It sounds like it's set to "all"...
Great point - I had actually been fiddling with that setting earlier,
before I emailed the list. Setting it back to "none" or
"ignorable" is much better:

<?xml version="1.0" encoding="UTF-8"?><list>
<item>See <ref target="#in1">Middle East</ref></item>
<item xml:id="in1">Middle East <ref target="#d68">68</ref></item>
</list>

I couldn't get it to put <list> on a new line, but that's no biggie.

So, in sum, the combination of unticking 'format transformer output'
and setting 'strip whitespace' to 'none' or 'ignorable' yields quite
nice results.
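
For readers who want to see the two behaviours side by side, here is a rough analogue, assuming that a "strip whitespace: all" setting drops every whitespace-only text node before the query sees the document. It is written with Python's standard xml.etree.ElementTree purely for illustration; the helper name strip_whitespace_only is made up for this sketch and is not an oXygen or Saxon API.

```python
import xml.etree.ElementTree as ET

SOURCE = (
    '<list>\n'
    '<item>See <ref target="#in1">Middle East</ref></item>\n'
    '<item xml:id="in1">Middle East <ref target="#d68">68</ref></item>\n'
    '</list>'
)


def strip_whitespace_only(root: ET.Element) -> None:
    """Drop text that consists only of whitespace (hypothetical 'all' mode)."""
    for elem in root.iter():
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None


kept = ET.fromstring(SOURCE)          # analogue of "none": nothing removed
stripped = ET.fromstring(SOURCE)
strip_whitespace_only(stripped)       # analogue of "all"

print(ET.tostring(kept, encoding="unicode"))
print(ET.tostring(stripped, encoding="unicode"))
```

The second result is a single line, but the space inside "See " and "Middle East " survives, because that text is not whitespace-only.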

Thanks very much, Andrew.

Joe
Michael Kay
2012-07-10 08:00:23 UTC
Andrew has answered the whitespace questions.

It's a Saxon (not an Oxygen) restriction that XQuery 3.0 and XQuery
Update can't currently be used together in the same query. It happened
that way because both are implemented as extensions to the "core" XQuery
1.0 parser, built using subclassing. (Done that way partly because of
the code separation between different Saxon editions). We need to fix
this mechanism, which is becoming pretty unmanageable with the number of
different language dialects supported. Ideally, I suppose, we should
make a complete break and move to a bottom-up table driven parser; but
XQuery parsing is so fragile with the number of context-dependent
decisions that need to be made, it's a risky change to contemplate.

Michael Kay
Saxonica
Joe Wicentowski
2017-03-24 19:06:48 UTC
Hi all,

Have there been any further developments in the area of unwanted
reformatting of entire documents after applying XQuery Update operations to
just a portion of a document? I'm using oXygen 18.1 with Saxon-EE XQuery
9.6.0.7, with "Strip whitespaces" set to "None ("none")", XQuery 3.0
support enabled, and XQuery Update enabled.

For example, lines 1-2 of the source document began as this:

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:frus="http://history.state.gov/frus/ns/1.0" xml:id="frus1969-76v36">

But after running the XQuery Update, these two lines are now merged onto a
single line:

<?xml version="1.0" encoding="UTF-8"?><TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:frus="http://history.state.gov/frus/ns/1.0" xml:id="frus1969-76v36">

As you can imagine, this wreaks havoc with diff tools, so I would like to
find a way, if possible, to limit the scope of whitespace changes to just
where the query applies updates.

Apologies if this turns out to be a product-specific question, but I'm not
quite sure how to distinguish in this question between XQuery Update,
Saxon, and oXygen.

Thank you,
Joe
Michael Kay
2017-03-25 12:14:13 UTC
Whitespace in certain places isn't reported by the XML parser to the XQuery processor, so there is no way the XQuery processor can preserve it. Examples are whitespace between the XML declaration and the first element node, and whitespace within a start or end tag.
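
A small experiment can make this concrete. The sketch below uses Python's standard xml.etree.ElementTree as a stand-in parser and serializer (an assumption for illustration; Saxon's own pipeline is different software, though the principle described above is the same). Two inputs differ only in whitespace after the XML declaration and inside the tags, and both come back identical.

```python
import xml.etree.ElementTree as ET

# Line break between the XML declaration and the root element.
WITH_NEWLINE = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<TEI xml:id="frus1969-76v36">text</TEI>'
)
# No line break after the declaration, extra whitespace inside the tags.
WITH_TAG_SPACE = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<TEI   xml:id="frus1969-76v36"\n>text</TEI  >'
)


def reserialize(data: bytes) -> str:
    """Parse the bytes and write the resulting tree back out as text."""
    return ET.tostring(ET.fromstring(data), encoding="unicode")


print(reserialize(WITH_NEWLINE))
print(reserialize(WITH_TAG_SPACE))
```

Once parsed, nothing in the tree records where that whitespace was, so the serializer has to pick a layout of its own.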

Other things that aren't reported by the parser (and therefore can't be retained) include the choice of single-vs-double quotes around attribute values, entity references, CDATA section boundaries, redundant namespace declarations, and the order of attributes within a start tag.
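
Some of these can be checked the same way. In the sketch below (again Python's xml.etree.ElementTree, chosen only for illustration), the two variants differ in quote style, attribute order, and in whether the text is written with references or as a CDATA section. The helper parsed_view is a made-up name for this example; it compares attributes as an unordered mapping, since whether a given API happens to expose attribute order varies.

```python
import xml.etree.ElementTree as ET

# Mixed quote styles, predefined entity and character references.
VARIANT_A = """<item id="x1" n='2'>Tom &amp; Jerry &#60;3</item>"""
# Attributes in the other order, text wrapped in a CDATA section.
VARIANT_B = """<item n="2" id='x1'><![CDATA[Tom & Jerry <3]]></item>"""


def parsed_view(xml_text: str):
    """Return what the application sees: tag, attributes (unordered), text."""
    elem = ET.fromstring(xml_text)
    return elem.tag, dict(elem.attrib), elem.text


print(parsed_view(VARIANT_A))
print(parsed_view(VARIANT_B))
print(parsed_view(VARIANT_A) == parsed_view(VARIANT_B))
```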

Using textual diff tools on XML documents isn't really a viable strategy - you need to do the diff in a way that is XML-aware. One way is to canonicalize the two documents and compare their canonical forms. Canonicalizing takes a very similar view to XDM - though not 100% identical - as to what's significant in an XML document and what isn't.
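
As a minimal sketch of the canonicalize-and-compare approach, assuming Python 3.8+ and its standard xml.etree.ElementTree.canonicalize function (which implements C14N 2.0; other toolchains offer their own canonicalizers), the comparison could look like this. The function name same_xml and the sample documents are invented for the example.

```python
from xml.etree.ElementTree import canonicalize


def same_xml(doc_a: str, doc_b: str) -> bool:
    """Return True when both documents have the same canonical form."""
    return canonicalize(xml_data=doc_a) == canonicalize(xml_data=doc_b)


# Line break after the declaration, single quotes on one attribute.
BEFORE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="frus1969-76v36">'
    "<item n='1'>Middle East</item></TEI>"
)
# Declaration and root on one line, double quotes throughout.
AFTER = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="frus1969-76v36">'
    '<item n="1">Middle East</item></TEI>'
)
# A real content change.
CHANGED = AFTER.replace("Middle East", "Near East")

print(same_xml(BEFORE, AFTER))    # layout-only difference
print(same_xml(AFTER, CHANGED))   # text content differs
```

Whitespace inside element content is still significant in the canonical form, so re-indentation of the kind discussed earlier in this thread would still show up as a difference; only the details the parser never reports are normalized away.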

Michael Kay
Saxonica