Forum Moderators: httpwebwitch

Illegal Characters in XML feed

     
9:16 am on Jul 15, 2008 (gmt 0)

Junior Member

5+ Year Member

joined:July 11, 2008
posts: 88
votes: 0


I have an XML feed that is based on text submitted by users. Every so often, however, users submit characters that are illegal in XML, causing the entire feed to choke :(

I need some help filtering out these illegal characters (a brute-force replace is OK).

TIA,

9:22 am on July 15, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member eelixduppy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 12, 2005
posts:5966
votes: 0


Whatever script you use to build or parse the feed is where the characters need to be replaced. For example, with PHP you can use str_replace(). I, on the other hand, would run the XML through W3C's validator service to see if it is valid XML before using it. If it is not, show an alert of some kind.
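As a rough illustration of the "replace the characters, then check the result" idea, here is a minimal Python sketch (the thread mentions PHP; Python is used here only for illustration). The function names strip_illegal_xml_chars and is_well_formed are made up for this example. It assumes the "illegal characters" are the ones XML 1.0 forbids outright (most control characters, surrogates, U+FFFE and U+FFFF), and it checks well-formedness locally with the standard library parser rather than calling the W3C service.

```python
import re
import xml.etree.ElementTree as ET

# Characters XML 1.0 does not allow: control characters other than
# tab, LF and CR, plus surrogates and U+FFFE / U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def strip_illegal_xml_chars(text, replacement=""):
    """Brute-force replace every character that XML 1.0 forbids."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


def is_well_formed(xml_text):
    """Return True if the standard library parser accepts the document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical user submission containing a stray control character.
user_text = "Nice post!\x01"
feed = "<item>" + strip_illegal_xml_chars(user_text) + "</item>"
if not is_well_formed(feed):
    print("feed is not well-formed XML")
```

Note that this only removes forbidden characters. Markup characters such as < and & are legal in XML but still need escaping or a CDATA section.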
2:36 pm on July 15, 2008 (gmt 0)

Moderator This Forum from CA 

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 29, 2003
posts:4059
votes: 0


You can either pasteurize the text to remove/replace those characters, or you can wrap the text in a special CDATA placenta.

So, for instance:

<book>
<title>The Big Book of &lt;XML&gt; &amp; &#224;cc&#233;nted char&#224;ct&#232;rs</title>
</book>

OR:

<book>
<title><![CDATA[The Big Book of <XML> & àccénted charàctèrs]]></title>
</book>

CDATA is the far better solution, as long as the text itself never contains the sequence ]]> (which would close the section early).
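The two options above could be sketched in Python roughly as follows. The helper names as_escaped and as_cdata are invented for this example. escape() comes from the standard library and handles &, < and >. The CDATA helper splits any ]]> in the input across two sections, which is a common workaround and not something the posts above describe. Neither helper removes control characters that XML forbids, so that filtering still has to happen separately.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def as_escaped(text):
    """Option 1: replace &, < and > with entity references."""
    return escape(text)


def as_cdata(text):
    """Option 2: wrap the text in a CDATA section.

    A literal "]]>" would end the section early, so it is split
    across two adjacent CDATA sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


title = "The Big Book of <XML> & àccénted charàctèrs"

escaped_doc = "<book><title>" + as_escaped(title) + "</title></book>"
cdata_doc = "<book><title>" + as_cdata(title) + "</title></book>"

# Both documents should parse back to the same title text.
for doc in (escaped_doc, cdata_doc):
    parsed = ET.fromstring(doc)
    print(parsed.find("title").text == title)
```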

2:39 pm on July 15, 2008 (gmt 0)

Moderator This Forum from CA 

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 29, 2003
posts:4059
votes: 0


FYI regarding CDATA:
[w3schools.com...]

Unless you know that the user-entered data is safe, like it's only EVER going to be an integer or alphanumeric string, then treat it as CDATA and encapsulate it accordingly.
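One way to read that rule in code, as a sketch only: pass a value through untouched when it is provably plain ASCII alphanumeric, and wrap everything else in CDATA. The function name xml_text and the exact definition of "safe" are assumptions made for this example, not something specified above.

```python
import re

# Assumption for this sketch: "safe" means ASCII letters and digits only.
_SAFE = re.compile(r"[A-Za-z0-9]*")


def xml_text(value):
    """Return value as XML element content, using CDATA unless it is known safe."""
    text = str(value)
    if _SAFE.fullmatch(text):
        return text
    # Split any "]]>" so it cannot terminate the CDATA section early.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


print("<id>" + xml_text(42) + "</id>")
print("<comment>" + xml_text("1 < 2 & 3 > 2") + "</comment>")
```

Characters that XML forbids entirely, such as most control characters, are not legal inside CDATA either, so they still need to be stripped before this step.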