Question posted by Paul K on Aug 13, 2008 17:17
Invalid Character in XML

How should we handle characters that are not part of the range from 0x00 to 0x7f? These characters cause a parse error when loading the serialized data into a grid.

I already set the asCData option to true using setSerializationLevel for the grid before I serialize the grid's data.

Is there an option that makes the serialize code generate &# sequences (such as …) for these characters? Or do I have to do that myself whenever I detect a change to a cell's value?
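If you do end up escaping the characters yourself, a minimal sketch could look like the following. `escapeNonAscii` is a hypothetical helper, not part of the dhtmlxGrid API. It replaces every character above 0x7F with a decimal numeric character reference, so the result is pure ASCII and reads the same whether a parser assumes UTF-8 or iso-8859-1. One caveat: character references are not expanded inside CDATA sections, so with asCData enabled a value escaped this way would show up literally as `&#233;`. The approach only fits values serialized as ordinary element text, where `&`, `<` and `>` must also be escaped (this helper assumes that is already done).

```javascript
// Hypothetical helper (not part of dhtmlxGrid): replace every character
// outside 0x00-0x7F with an XML numeric character reference like &#233;.
// Only meaningful outside CDATA, where references are actually expanded.
function escapeNonAscii(text) {
  let out = "";
  // for...of iterates by code point, so characters outside the BMP
  // (e.g. emoji) become one reference instead of two surrogate halves.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    out += cp > 0x7f ? "&#" + cp + ";" : ch;
  }
  return out;
}

console.log(escapeNonAscii("Café ∑ 😀")); // prints Caf&#233; &#8721; &#128512;
```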

Paul
Answer posted by Support on Aug 14, 2008 01:24
The grid serializes data based on the page encoding. If the page is loaded with iso-8859-1 (Western European) encoding, the data may be serialized with such high-order bytes, which is correct for iso-8859-1 but not for UTF-8. If the page encoding is UTF-8, the data will be serialized as UTF-8 as well.

To resolve the issue, either use UTF-8 encoding for the problematic page, or update the XML parsing code on the server side so that it treats incoming XML as data encoded in iso-8859-1.
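To make the mismatch concrete, here is an illustrative sketch (the helper names are hypothetical, and `TextDecoder` is a modern API rather than anything the 2008 grid used). In iso-8859-1, "é" is the single byte 0xE9; a parser that assumes UTF-8 rejects that byte, while decoding the same bytes as iso-8859-1 recovers the original text.

```javascript
// Encode a string as iso-8859-1 bytes. iso-8859-1 maps U+0000..U+00FF
// one-to-one onto bytes 0x00..0xFF; anything above U+00FF is not representable.
function encodeLatin1(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 0xff) throw new RangeError("Not representable in iso-8859-1: " + text[i]);
    bytes[i] = code;
  }
  return bytes;
}

// Decode bytes as iso-8859-1: each byte becomes the code point with the same value.
function decodeLatin1(bytes) {
  let out = "";
  for (const b of bytes) out += String.fromCharCode(b);
  return out;
}

// True if the bytes are valid UTF-8, which is what an XML parser assumes
// when there is no BOM and no encoding declaration.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

const latin1 = encodeLatin1("Café");                        // bytes 43 61 66 E9
console.log(isValidUtf8(latin1));                           // prints false: a lone 0xE9 is not valid UTF-8
console.log(decodeLatin1(latin1));                          // prints Café: reading it as iso-8859-1 works
console.log(isValidUtf8(new TextEncoder().encode("Café"))); // prints true
```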
Answer posted by Paul K on Aug 14, 2008 10:34
I'll look into changing the page (HTML) encoding to UTF-8. But unless that changes how the characters are encoded in the resulting XML document, I don't think it will help.

There is no server side processing of the XML other than to save it to a file.  The error occurs when the file is loaded using the load method on the grid control on the client side.

After a little research, I found that the encoding has to be set in the XML document itself if you want it to be treated as iso-8859-1:

<?xml version="1.0" encoding="iso-8859-1" ?>
Various kinds of UTF encoding can be detected using a BOM (byte order mark) at the beginning of the file, but if one is not present, the default is UTF-8, not iso-8859-1.
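The detection order described above can be sketched roughly as follows. This is a simplified illustration loosely following Appendix F of the XML 1.0 specification, not the logic of any particular browser's parser: a BOM takes precedence, then an encoding declaration read as ASCII, and otherwise UTF-8. It deliberately ignores the BOM-less UTF-16/UTF-32 cases the full algorithm handles; `detectXmlEncoding` is a hypothetical name.

```javascript
// Simplified XML encoding detection: BOM first, then the encoding
// declaration, then the UTF-8 default. BOM-less UTF-16/32 is not handled.
function detectXmlEncoding(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "UTF-8";
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) return "UTF-16BE";
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) return "UTF-16LE";

  // The declaration is ASCII-compatible, so read the first bytes as ASCII.
  let head = "";
  for (let i = 0; i < Math.min(bytes.length, 200); i++) head += String.fromCharCode(bytes[i]);

  // e.g. <?xml version="1.0" encoding="iso-8859-1" ?>
  const decl = /^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']/.exec(head);
  return decl ? decl[1] : "UTF-8";
}

const ascii = (s) => new TextEncoder().encode(s);
console.log(detectXmlEncoding(ascii('<?xml version="1.0" encoding="iso-8859-1" ?><rows/>'))); // prints iso-8859-1
console.log(detectXmlEncoding(ascii('<?xml version="1.0"?><rows>')));                          // prints UTF-8
```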

Is there a way to tell the serialization code to put an explicit encoding attribute value?

Paul




Answer posted by Paul K on Aug 14, 2008 15:04
Our HTML pages already specify the charset as utf-8:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

I'm not sure whether the page's character set is supposed to influence how characters are encoded in the XML produced by the grid control's serialize code.

In the dhtmlxgrid.js file there is no encoding specified in the serialize function:

    this.serialize=function(){
        var out = '<?xml version="1.0"?><rows>';

Paul
Answer posted by Support on Aug 15, 2008 04:12

>>In the dhtmlxgrid.js file there is no encoding specified in the serialize function:
The component can't detect the current encoding, so it doesn't specify one; the XML will be treated as UTF-8 encoded.

Basically, JavaScript can't work with encodings at all: it handles text through the browser and depends entirely on browser settings, so if the page uses UTF encoding, all string operations will be done in UTF as well.
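A small illustration of the underlying point (an editorial sketch, using the modern `TextEncoder` API that 2008-era browsers did not have): a JavaScript string holds characters as UTF-16 code units rather than bytes, and a byte encoding only comes into play when text is converted, for example when the browser decodes the page or sends data.

```javascript
// A JavaScript string is a sequence of UTF-16 code units, not bytes.
const s = "Café";
console.log(s.length); // prints 4

// Bytes appear only when the text is encoded for storage or transport.
// TextEncoder always produces UTF-8, where "é" takes two bytes.
const utf8Bytes = new TextEncoder().encode(s);
console.log(utf8Bytes.length); // prints 5
```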

>>Is there a way to tell the serialization code to put an explicit encoding attribute value?
Only by modifying the code: you can change the line quoted above. This will not affect any other aspect of serialization; it just adds the necessary declaration to the exported string.
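As an alternative to editing dhtmlxgrid.js, the declaration could be patched after calling `serialize()`. The sketch below assumes only what the quoted source shows, namely that the output starts with `<?xml version="1.0"?>`; `withEncodingDeclaration` is a hypothetical helper. Note that declaring iso-8859-1 only helps if the bytes actually written to the file are iso-8859-1; if the file is saved as UTF-8, the declaration would cause the data to be misread.

```javascript
// Hypothetical helper: add an explicit encoding to the XML declaration
// that serialize() puts at the start of its output, without touching
// dhtmlxgrid.js. Throws if the expected declaration is not found.
function withEncodingDeclaration(xml, encoding) {
  const decl = /^<\?xml\s+version\s*=\s*(["'])1\.0\1\s*\?>/;
  if (!decl.test(xml)) throw new Error("Expected an XML declaration at the start");
  return xml.replace(decl, '<?xml version="1.0" encoding="' + encoding + '"?>');
}

// Usage, assuming `grid` is an initialized grid instance:
//   const xml = withEncodingDeclaration(grid.serialize(), "iso-8859-1");
console.log(withEncodingDeclaration('<?xml version="1.0"?><rows></rows>', "iso-8859-1"));
// prints <?xml version="1.0" encoding="iso-8859-1"?><rows></rows>
```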