Deleting MS Word 'fancy' characters from xml feed

MS Word characters are destroying my feeds

6:34 pm on Jan 14, 2009 (gmt 0)

Junior Member
joined: Aug 7, 2008


People are writing their info in MS Word, copying and pasting it into a textarea field, and uploading it to the database. I generate XML feeds from this data. The problem is that fancy (curly) quotes, ellipses, dashes and a few other characters are breaking the feeds.

I'm using PHP. Is there a regular expression that will match all of these characters? I don't need to translate them; I'm satisfied with deleting them.

7:02 pm on Jan 14, 2009 (gmt 0)

httpwebwitch: Moderator of this forum, from CA
WebmasterWorld Administrator
joined: Aug 29, 2003


Use htmlentities($string) [ca3.php.net]

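A caveat on htmlentities() for XML feeds, sketched in Python because the point is about XML rather than PHP: htmlentities() may turn a curly quote into a named HTML entity such as &rsquo;, but XML only predefines five named entities (&amp;, &lt;, &gt;, &quot; and &apos;), so a strict feed parser can reject any other named entity unless the feed declares it. A safer pattern, assumed here rather than taken from this thread, is to escape those five characters and write every other non-ASCII character as a numeric character reference such as &#8217;, which any XML parser accepts. The helper name xml_safe() is hypothetical.

```python
from xml.sax.saxutils import escape


def xml_safe(text: str) -> str:
    """Escape the five XML special characters, then turn every non-ASCII
    character into a numeric character reference (&#NNNN;)."""
    # escape() always handles &, < and >; the dict adds both quote marks.
    escaped = escape(text, {'"': "&quot;", "'": "&apos;"})
    # 'xmlcharrefreplace' writes characters ASCII cannot hold as &#NNNN;,
    # which parses the same whatever encoding the feed declares.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(xml_safe("Tom\u2019s \u201cR&D\u201d notes\u2026"))
```

For the sample above this prints Tom&#8217;s &#8220;R&amp;D&#8221; notes&#8230;, keeping the original characters instead of deleting them.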
8:54 pm on Jan 14, 2009 (gmt 0)

Junior Member
joined: Aug 7, 2008


Thanks, I'll give it a try.

9:22 pm on Jan 14, 2009 (gmt 0)

Junior Member
joined: Aug 7, 2008


That doesn't seem to have worked :(

9:26 pm on Jan 14, 2009 (gmt 0)

httpwebwitch: Moderator of this forum, from CA
WebmasterWorld Administrator
joined: Aug 29, 2003


So... you're escaping the CDATA contents of your nodes with htmlentities(), and your XML is still invalid?

Can you supply a *small* sample, before and after?
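A before-and-after comparison is easier when you can check mechanically whether each version of the feed parses at all. A minimal sketch using Python's standard-library ElementTree parser (any XML parser would serve; the function name well_formed() is made up for illustration):

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def well_formed(xml_bytes: bytes) -> Tuple[bool, str]:
    """Return (True, "") if the bytes parse as XML, else (False, reason)."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return False, f"line {line}, column {column}: {err}"
    return True, ""


if __name__ == "__main__":
    samples = [
        b"<item>Tom&#8217;s notes</item>",  # numeric character reference
        b"<item>Tom&rsquo;s notes</item>",  # HTML-only named entity
        b'<?xml version="1.0" encoding="UTF-8"?><item>Tom\x92s</item>',  # raw Windows-1252 byte
    ]
    for sample in samples:
        print(well_formed(sample))
```

Only the first sample passes: an undeclared named entity such as &rsquo; and a raw Windows-1252 byte in a document declared as UTF-8 both make the parser stop.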

9:33 pm on Jan 14, 2009 (gmt 0)

Junior Member
joined: Aug 7, 2008


You had it right! I had to change the character set from UTF-8 to 8859.
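One plausible reason the declared character set changes the outcome is that the same character has different bytes in each encoding. The Python sketch below (standard codecs only; the helper names are hypothetical) shows Word's right single quote, U+2019, in UTF-8 and in Windows-1252, the legacy Western code page where these punctuation marks occupy single bytes, and what happens when those bytes are read with the wrong codec.

```python
def show_bytes(ch: str) -> None:
    """Print how one character is encoded in UTF-8 and in Windows-1252."""
    print(f"U+{ord(ch):04X}")
    print("  UTF-8       :", ch.encode("utf-8"))
    print("  Windows-1252:", ch.encode("cp1252"))


def misread(raw: bytes, wrong_codec: str) -> str:
    """Decode bytes with the wrong codec, replacing undecodable bytes."""
    return raw.decode(wrong_codec, errors="replace")


if __name__ == "__main__":
    quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK, as pasted from Word
    show_bytes(quote)
    # UTF-8 bytes read as ISO-8859-1: three characters instead of one.
    print(ascii(misread(quote.encode("utf-8"), "latin-1")))
    # Windows-1252 byte 0x92 read as UTF-8: not a valid sequence at all,
    # so it becomes the replacement character U+FFFD.
    print(ascii(misread(quote.encode("cp1252"), "utf-8")))
```

ISO-8859-1 itself assigns bytes 0x80 to 0x9F to invisible control characters; the curly quotes, ellipsis and dashes live in that range only in Windows-1252, which is why the two are easy to confuse.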

britlinks posted a great function here:

[us3.php.net...]

Thanks for your help dude!

9:49 pm on Jan 14, 2009 (gmt 0)

httpwebwitch: Moderator of this forum, from CA
WebmasterWorld Administrator
joined: Aug 29, 2003


Hey, no problem, zulubanshee. Cheers.

3:40 am on Jan 26, 2009 (gmt 0)

vincevincevince: Senior Member, from MY
joined: Apr 1, 2003


The real problem is that browsers are POSTing those characters even when the form is served as UTF-8. If a form is served as UTF-8, the browser should only submit UTF-8 content.
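Serving the form page with Content-Type: text/html; charset=UTF-8, optionally adding accept-charset="UTF-8" to the form, is the usual way to ask the browser for UTF-8. If the submitted bytes still cannot be trusted, one common defensive heuristic (an assumption here, not something proposed in this thread) is to decode each field as UTF-8 and, only if that fails, fall back to Windows-1252, where Word's punctuation arrives as single bytes such as 0x92 and 0x85. Sketched in Python; the function name decode_form_value() is hypothetical:

```python
def decode_form_value(raw: bytes) -> str:
    """Decode one submitted form field, preferring UTF-8.

    Falls back to Windows-1252 only when the bytes are not valid UTF-8,
    the usual symptom of Word punctuation arriving as single bytes such
    as 0x92 (right single quote) or 0x85 (ellipsis).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
        # undefined; replace those with U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace")


if __name__ == "__main__":
    print(ascii(decode_form_value("It\u2019s".encode("utf-8"))))  # already UTF-8
    print(ascii(decode_form_value(b"It\x92s \x85")))              # Windows-1252 bytes
```

The fallback is a guess: Windows-1252 text that happens to form valid UTF-8 sequences will be decoded as UTF-8, though for ordinary prose with isolated punctuation bytes that is unlikely.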