
Forum Moderators: httpwebwitch


Illegal Characters in XML feed

   
9:16 am on Jul 15, 2008 (gmt 0)

5+ Year Member



I have an XML feed built from text submitted by users. However, every so often users submit characters that are illegal in XML, which makes the entire feed choke :(

I need some help filtering out these illegal characters (a brute-force replace is OK).
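A minimal sketch of the brute-force filter, written in Python rather than PHP (the language choice is an assumption, not something from the thread). It assumes "illegal" means any character outside the XML 1.0 `Char` production, i.e. everything except tab, newline, carriage return, U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF. `strip_illegal_xml_chars` is a hypothetical helper name. It does not escape `&` or `<`; that is a separate step.

```python
import re

# Matches any character NOT permitted by the XML 1.0 Char production:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF are allowed.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text: str, replacement: str = "") -> str:
    """Replace every character XML 1.0 forbids with `replacement` (default: drop it)."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


print(strip_illegal_xml_chars("bad\x00 \x0bchars\x1f"))  # prints: bad chars
```

Passing a replacement such as `"?"` instead of the empty string makes it visible where user text was altered.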

TIA,

9:22 am on Jul 15, 2008 (gmt 0)

WebmasterWorld Senior Member eelixduppy is a WebmasterWorld Top Contributor of All Time 5+ Year Member



The characters need to be replaced in whatever script you use to process the feed. For example, in PHP you can use str_replace(). I, on the other hand, would run the XML through W3C's validator service to check that it is valid XML before using it; if it isn't, show an alert of some kind.

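A rough local stand-in for the validator step, assuming Python's standard-library ElementTree parser (a swapped-in technique, not the W3C service the post mentions). It only checks well-formedness (unescaped `&`, mismatched tags, forbidden control characters), not validity against a DTD or schema. `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for unescaped '&', unclosed tags, illegal control characters, etc.
        return False
    return True
```

In a feed script you might call `is_well_formed(feed_xml)` before publishing and raise the alert whenever it returns False.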
2:36 pm on Jul 15, 2008 (gmt 0)

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member



you can either sanitize the text to remove/replace those characters, or you can wrap it in a CDATA section.

So, for instance:

<book>
<title>The Big Book of &lt;XML&gt; &amp; &#224;cc&#233;nted char&#224;ct&#232;rs</title>
</book>

OR:

<book>
<title><![CDATA[The Big Book of <XML> & àccénted charàctèrs]]></title>
</book>

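the CDATA is a far better solution for markup characters like <, > and &. Note, though, that it does not make control characters legal: XML 1.0 forbids anything below U+0020 other than tab, newline and carriage return, even inside CDATA, so those still have to be stripped.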
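If you do go the escaping route instead, the entities don't have to be written by hand. A minimal Python sketch, assuming the standard library's `xml.sax.saxutils.escape` (which escapes `&`, `<` and `>`). The extra ASCII step writes accented characters as numeric character references (à becomes `&#224;`), because named entities like `&agrave;` are not predefined in XML. `escaped_element` is a hypothetical helper name.

```python
from xml.sax.saxutils import escape


def escaped_element(tag: str, text: str) -> str:
    """Build <tag>text</tag> with the text entity-escaped instead of CDATA-wrapped."""
    # escape() turns & < > into &amp; &lt; &gt; (it handles & first).
    body = escape(text)
    # Optional: keep the output pure ASCII by writing non-ASCII characters
    # as decimal numeric character references.
    body = body.encode("ascii", "xmlcharrefreplace").decode("ascii")
    return f"<{tag}>{body}</{tag}>"


print(escaped_element("title", "The Big Book of <XML> & àccénted charàctèrs"))
# prints: <title>The Big Book of &lt;XML&gt; &amp; &#224;cc&#233;nted char&#224;ct&#232;rs</title>
```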

2:39 pm on Jul 15, 2008 (gmt 0)

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member



FYI regarding CDATA:
[w3schools.com...]

Unless you know that the user-entered data is safe (for example, it will only EVER be an integer or an alphanumeric string), treat it as CDATA and wrap it accordingly.
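A sketch of that rule of thumb in Python. The definition of "safe" used here (an optionally signed integer or a plain ASCII alphanumeric string) and the helper name `element` are assumptions for illustration, and the tag name is assumed to be trusted. Because a CDATA section ends at the first `]]>`, the sketch splits that sequence across two adjacent sections so user text cannot close the section early.

```python
import re

# Hypothetical definition of "known safe": a signed integer or ASCII alphanumerics.
_SAFE_VALUE = re.compile(r"-?[0-9]+|[A-Za-z0-9]+")


def element(tag: str, value: str) -> str:
    """Emit <tag>value</tag>, wrapping anything not known to be safe in CDATA."""
    if _SAFE_VALUE.fullmatch(value):
        return f"<{tag}>{value}</{tag}>"
    # "]]>" would end the CDATA section early, so split it across two sections.
    body = value.replace("]]>", "]]]]><![CDATA[>")
    return f"<{tag}><![CDATA[{body}]]></{tag}>"
```

For example, `element("id", "42")` returns `<id>42</id>`, while `element("title", "a & b")` returns `<title><![CDATA[a & b]]></title>`.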