How to keep the escape codes in Xaml
6 posts
• Page 1 of 1
How to keep the escape codes in Xaml
While using ⎕XML I realised that the following Xaml was not the same after using ⎕xml twice on it:
<RichText:RichTextBoxAdv PageLayout="Pages">
<RichText:DocumentAdv>
<RichText:SectionAdv>
<RichText:ParagraphAdv>
<RichText:SpanAdv Text="This is first line&#10;This is second line"></RichText:SpanAdv>
</RichText:ParagraphAdv>
</RichText:SectionAdv>
</RichText:DocumentAdv>
</RichText:RichTextBoxAdv>
The escape code '&#10;' is not making the round trip. Is there an option in ⎕XML that will specify to keep the '&#10;'?
Thanks in advance.
-
PGilbert - Posts: 440
- Joined: Sun Dec 13, 2009 8:46 pm
- Location: Montréal, Québec, Canada
Re: How to keep the escape codes in Xaml
Did you try
⎕XML ⍠ 'Whitespace' 'Preserve' ?
Note that some control characters below 32 cannot be represented in XML (CR and NL should be OK); this is what the standard says.
- DanB|Dyalog
Re: How to keep the escape codes in Xaml
Hello Daniel, it did not work. When you execute ⎕XML the first time you get the APL matrix, and the '&#10;' is correctly changed to a carriage return. The problem is that when you apply ⎕XML to the APL matrix, the carriage return should be changed back to '&#10;' (as the Dyalog documentation says), but it is not; it stays as a carriage return character in the XAML.
So my question is: is there a way to prevent ⎕XML from changing the escape codes to their equivalent APL representation?
-
PGilbert - Posts: 440
- Joined: Sun Dec 13, 2009 8:46 pm
- Location: Montréal, Québec, Canada
Re: How to keep the escape codes in Xaml
At present there is no option to control what happens when character references (e.g. '&#10;') or recognised entity references (e.g. '&amp;') are encountered in XML text. There is an option to preserve unrecognised entity references and when it is used they round-trip. We could possibly extend that so that all character and entity references are preserved - it would give you the ability to round-trip as you want but would likely make processing the APL array more difficult because it would still contain XML (the character/entity references).
The full explanation for that answer is quite long - I apologise in advance!
When you round-trip XML using ⎕XML you don't necessarily get something which is character-for-character identical, because the APL array format does not carry the original formatting. Indeed, you can use this property as an XML pretty-printer. It should, however, be as far as possible semantically equivalent. I think the current behaviour of ⎕XML matches the Dyalog documentation and meets the requirements of the XML standard (http://www.w3.org/TR/REC-xml/) so let me explain that reasoning and discuss your options.
The documentation for ⎕XML states:
⎕XML converts entity references and all character references which the APL character set is able to represent into their character equivalent when generating APL array data
It is for this reason that your character reference is converted to the actual character 10 (LF) when parsing the XML.
It goes on to say:
When generating XML it converts any or all characters to entity references as needed.
It is for this reason that your character is not regenerated back into a character reference. The character can be legitimately included in the generated XML and by this point there is no longer any indication that the character happened to originate from a character reference so it is emitted like any other character. In general I think this is the behaviour most would consider reasonable - for example, if you were to process
<a att="&#65;A"/>
the attribute value would be presented in the APL array as 'AA' and this would round trip to
<a att="AA"></a>
not
<a att="&#65;A"></a>
The alternative would be to preserve the character reference in the text in APL array in some way (which does happen if it cannot be substituted and the UnknownEntity variant option is set to 'Preserve') but there are two reasons why this is not currently done:
- Removing the XML character reference in this way is a convenience to the APL programmer - she has used ⎕XML to convert away from XML and would likely prefer not to have any remaining XML in the converted data.
- It is a requirement of the XML standard - see section 3.3.3 Attribute-Value Normalization:
For a character reference, append the referenced character to the normalized value.
We could perhaps introduce an option to change this behaviour so that character references are preserved even when they could be removed. If that was done then your XML would round-trip as you expect as it does in this current example:
xml←⎕XML ⍠'UnknownEntity' 'Preserve'
xml xml'<a x="&dyalog;"/>'
<a x="&dyalog;"/>
We could also perhaps introduce an option to force line-ending characters to be regenerated in XML as character references. It could be argued that if they are present in the APL array presented to ⎕XML then this would be the correct behaviour, because the only way they could be there is if they originated from character references in the first place. Section 3.3.3 of the XML standard also says:
For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value
that is, an actual line ending character would have been replaced by a space. There are two reasons why we currently do not:
- ⎕XML is not just used to round-trip in this way. When generating XML, line endings are preserved as given so that the XML may be formatted as desired and because to change them to character references would change the semantics of the code.
- As things currently stand, the line-ending character would not necessarily round-trip to the same character reference.
The second point above needs some explanation! Yet another rule of the W3C standard (2.11 End-of-Line Handling) states
XML processor must behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.
This rule means that all the different line ending character representations become indistinguishable on output. Furthermore, this rule means that all such sequences map to the xA (10, Line Feed) character which can be problematic because it is not universally recognised as the line ending character. Rightly or wrongly, but as a convenience, ⎕XML maps all of the Line Feed characters in the output to the character specified as the APL line ending character - that is ⎕AVU[3+⎕IO].
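As a rough illustration of the normalisation rule quoted above, the following APL sketch applies the same translation to a character vector. It is not how ⎕XML is implemented; the name NormEOL is hypothetical and is used only for this example.

```apl
⍝ Hypothetical helper (not part of ⎕XML): normalise line endings in a
⍝ character vector as section 2.11 describes: CR LF → LF, lone CR → LF.
NormEOL←{
    cr lf←⎕UCS 13 10
    drop←(⍵=lf)∧¯1↓0,⍵=cr     ⍝ mark each LF that directly follows a CR
    t←(~drop)/⍵                ⍝ remove those LFs, leaving the CR in place
    t[(t=cr)/⍳⍴t]←lf           ⍝ turn every remaining CR into an LF
    t
}

      ⎕UCS NormEOL ⎕UCS 97 13 10 98 13 99
97 10 98 10 99
```

After this step the CR LF pair and the lone CR are both a single LF, which is why the original line-ending representation cannot be recovered later.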
So to get the behaviour you want there are perhaps two options:
- Request an enhancement to the interpreter so that character references are left in situ. This would solve your round-trip requirement but it may also require that you introduce additional code to handle the text in the APL array format, i.e. before it completes the round-trip.
- Convert the line-ending character back to a character reference yourself before completing the round-trip. This will be complicated by the fact that you will not necessarily know which line-ending character was specified in the original XML.
Note also that the '&' character is always mapped to '&amp;' in generated XML so that arbitrary text may contain this character. To generate a character reference such as '&#10;' it is necessary to present the '&' as the escape character (⎕UCS 27) and use the 'UnknownEntity' variant option, e.g.
xml←⎕XML ⍠'UnknownEntity' 'Preserve'
xml 0 'a'((⎕UCS 27),'#10;')(0 0⍴'')5
<a>&#10;</a>
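Combining this escape-character technique with the second option above, a do-it-yourself round trip might look something like the sketch below. It is untested and rests on several assumptions: ⎕IO←1 and ⎕ML←1, the parsed matrix has five columns with attributes in column 4 as two-column name/value matrices, every LF in an attribute value should be emitted as a character reference, and the escape sequence is honoured inside attribute values in the same way as in the element content shown above. The names Ref, Fix and RoundTrip are hypothetical.

```apl
⍝ Sketch only, under the assumptions stated above (⎕IO←1, ⎕ML←1).
xml←⎕XML⍠'UnknownEntity' 'Preserve'
esc←⎕UCS 27                                  ⍝ stands in for '&', as in the example above
lf←⎕UCS 10
Ref←{∊{⍵=lf:esc,'#10;' ⋄ ⍵}¨⍵}               ⍝ hypothetical: each LF → ESC,'#10;'
Fix←{0∊⍴⍵:⍵ ⋄ a←⍵ ⋄ a[;2]←Ref¨a[;2] ⋄ a}     ⍝ apply Ref to every attribute value
RoundTrip←{m←xml ⍵ ⋄ m[;4]←Fix¨m[;4] ⋄ xml m} ⍝ parse, patch attributes, regenerate
```

This always emits the decimal form of the reference, so it cannot reproduce a document that originally used a different spelling of the same character, and it would also convert any LF that did not come from a character reference.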
I hope this explanation is helpful. If you do want an enhancement to the behaviour of ⎕XML then please do drop an email to support requesting it.
-
Richard|Dyalog - Posts: 44
- Joined: Thu Oct 02, 2008 11:11 am
Re: How to keep the escape codes in Xaml
Many Thanks Richard, this is answering my question and was very educational.
-
PGilbert - Posts: 440
- Joined: Sun Dec 13, 2009 8:46 pm
- Location: Montréal, Québec, Canada
Re: How to keep the escape codes in Xaml
Wow
This should become an article in Vector.
-
kai - Posts: 141
- Joined: Thu Jun 18, 2009 5:10 pm
- Location: Hillesheim / Germany