Illegal Characters in XML feed

         

iamvela

9:16 am on Jul 15, 2008 (gmt 0)

10+ Year Member



I have an XML feed that is built from text submitted by users. Every so often, users submit characters that are illegal in XML, which causes the entire feed to choke :(

I need some help filtering out these illegal characters (a brute-force replace is OK).

TIA,

eelixduppy

9:22 am on Jul 15, 2008 (gmt 0)



Whatever script you are using to parse the feed is where you need to replace the characters. For example, with PHP you can use str_replace(). I, on the other hand, would run the XML through the W3C's validator service to check that it is valid XML before using it, and show an alert of some kind if it is not.
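As a rough illustration of the "replace, then check" idea, here is a minimal Python sketch (Python is my choice for illustration, not something from the thread). It strips anything outside the XML 1.0 character range and then does a local well-formedness check with the standard library parser instead of an online validator. The function names `strip_illegal_xml_chars` and `is_well_formed` are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# XML 1.0 allows only: #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF.
# The negated class below matches every character outside that set.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text, replacement=""):
    """Brute-force replace of characters that XML 1.0 does not allow at all."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


def is_well_formed(xml_text):
    """Return True if the parser accepts the document, False otherwise.

    This only checks well-formedness, not validity against a schema or DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    dirty = "user text with a stray\x0b control char"
    clean = strip_illegal_xml_chars(dirty)
    print(is_well_formed("<item>" + dirty + "</item>"))
    print(is_well_formed("<item>" + clean + "</item>"))
```

Note that this only removes characters that are never legal in XML. Markup characters such as & and < are legal but still need escaping or a CDATA wrapper.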

httpwebwitch

2:36 pm on Jul 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can either pasteurize the text to remove or replace those characters, or you can wrap it in a CDATA section.

So, for instance:

<book>
<title>The Big Book of &lt;XML&gt; &amp; &#224;cc&#233;nted char&#224;ct&#232;rs</title>
</book>

OR:

<book>
<title><![CDATA[The Big Book of <XML> & àccénted charàctèrs]]></title>
</book>

CDATA is the far better solution.
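The two options above could be sketched in Python like this (again, Python and the helper names `escape_text` and `wrap_cdata` are just assumptions for illustration). One caveat worth hedging on: a CDATA section ends at the first "]]>", so user text containing that sequence has to be split across two sections. Also, as far as the XML 1.0 spec goes, CDATA does not make control characters legal, so those still need to be stripped first.

```python
from xml.sax.saxutils import escape


def escape_text(text):
    """Option 1: entity-escape the markup characters &, < and >."""
    return escape(text)


def wrap_cdata(text):
    """Option 2: wrap the text in a CDATA section.

    A literal "]]>" inside the text would close the section early, so it is
    split into "]]" (end of one section) and ">" (start of the next).
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


if __name__ == "__main__":
    title = "The Big Book of <XML> & more"
    print("<title>" + escape_text(title) + "</title>")
    print("<title>" + wrap_cdata(title) + "</title>")
```

Either output should parse back to the same title text, so the choice is mostly about readability of the feed source.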

httpwebwitch

2:39 pm on Jul 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



FYI regarding CDATA:
[w3schools.com...]

Unless you know that the user-entered data is safe (for example, it is only EVER going to be an integer or an alphanumeric string), treat it as CDATA and encapsulate it accordingly.
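That rule of thumb might look something like the following in Python. This is only a sketch: the `text_node` helper and the "ASCII letters and digits only" definition of safe are my own assumptions, so adjust the pattern to whatever you actually consider safe.

```python
import re

# Assumed definition of "safe": nothing but ASCII letters and digits.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9]*")


def text_node(value):
    """Return value as-is when it is known to be safe, else wrap it in CDATA."""
    value = str(value)
    if _SAFE_VALUE.fullmatch(value):
        return value
    # Split any literal "]]>" so it cannot terminate the CDATA section early.
    return "<![CDATA[" + value.replace("]]>", "]]]]><![CDATA[>") + "]]>"


if __name__ == "__main__":
    print("<id>" + text_node(42) + "</id>")
    print("<comment>" + text_node("Fish & <chips>") + "</comment>")
```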