

XML Development Forum

    
Deleting MS Word 'fancy' characters from xml feed
MS Word characters are destroying my feeds
zulubanshee

5+ Year Member



 
Msg#: 3826340 posted 6:34 pm on Jan 14, 2009 (gmt 0)

People are writing their info in MS Word, then copying and pasting it into the textarea field and uploading it to the database. I process feeds from this data. The problem is that fancy quotes, ellipses, hyphens and a few other things are breaking the feeds.

I'm using PHP. Is there a regular expression that will test for all these characters? I don't need to translate them. I'm satisfied with deleting them.
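For reference, a minimal sketch of the kind of regular expression being asked about. The helper names are hypothetical and the character list is illustrative rather than exhaustive. It also assumes you know which encoding the stored text uses: one helper is for UTF-8 text, the other for single-byte Windows-1252 text.

```php
<?php
// Hypothetical helper: deletes common MS Word typographic characters.
// Assumes $text is valid UTF-8.
function strip_word_chars_utf8(string $text): string
{
    // U+2018-U+201F: curly single and double quotes
    // U+2013, U+2014: en dash, em dash
    // U+2026: ellipsis, U+2022: bullet
    $pattern = '/[\x{2018}-\x{201F}\x{2013}\x{2014}\x{2026}\x{2022}]/u';
    $result = preg_replace($pattern, '', $text);
    // With the /u modifier preg_replace returns null on invalid UTF-8;
    // in that case hand the input back unchanged.
    return $result === null ? $text : $result;
}

// Hypothetical helper for single-byte input: assumes $text is Windows-1252,
// where Word's "fancy" characters occupy the byte range 0x80-0x9F.
// Note that this range also holds the euro sign and a few accented letters.
function strip_word_chars_cp1252(string $text): string
{
    $result = preg_replace('/[\x80-\x9F]/', '', $text);
    return $result === null ? $text : $result;
}

echo strip_word_chars_utf8("\u{201C}Hello\u{201D}\u{2026}"), "\n";
echo strip_word_chars_cp1252("\x93Hello\x94\x85"), "\n";
```

Both calls print Hello. Do not run the byte-range version on UTF-8 text, since it would delete continuation bytes of legitimate multi-byte characters.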

 

httpwebwitch

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3826340 posted 7:02 pm on Jan 14, 2009 (gmt 0)

Use htmlentities($string) [ca3.php.net]
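A short sketch of what that call might look like with the flags and character set spelled out. The wrapper name escape_for_feed is hypothetical, and the UTF-8 default is an assumption about the stored data, not something stated in this thread.

```php
<?php
// Hypothetical wrapper around htmlentities().
// Assumption: $charset names the encoding the text is actually stored in.
function escape_for_feed(string $text, string $charset = 'UTF-8'): string
{
    // ENT_QUOTES also converts single quotes; the third argument tells
    // htmlentities() how to interpret the bytes in $text.
    return htmlentities($text, ENT_QUOTES, $charset);
}

echo escape_for_feed("It\u{2019}s \u{201C}quoted\u{201D}"), "\n";
// prints: It&rsquo;s &ldquo;quoted&rdquo;
```

One caveat: XML predefines only the entities amp, lt, gt, quot and apos, so named entities such as &rsquo; are only valid in a feed if the text sits inside CDATA or the feed declares them.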

zulubanshee

5+ Year Member



 
Msg#: 3826340 posted 8:54 pm on Jan 14, 2009 (gmt 0)

Thanks, I'll give it a try.

zulubanshee

5+ Year Member



 
Msg#: 3826340 posted 9:22 pm on Jan 14, 2009 (gmt 0)

That doesn't seem to have worked :(

httpwebwitch

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3826340 posted 9:26 pm on Jan 14, 2009 (gmt 0)

So... you're escaping the CDATA contents of your nodes with htmlentities(), and your XML is still invalid?

Can you supply a *small* sample, before and after?

zulubanshee

5+ Year Member



 
Msg#: 3826340 posted 9:33 pm on Jan 14, 2009 (gmt 0)

You had it right! I had to change the character set from UTF-8 to 8859.

britlinks created a great function here

[us3.php.net...]
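The character set point can be illustrated with a small sketch. This is not the function linked above (its contents are not shown in this thread). It assumes the pasted text arrives as Windows-1252 bytes and that the iconv extension is available; the helper name cp1252_to_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: re-encodes Windows-1252 bytes as UTF-8 so that later
// escaping sees the characters Word actually produced.
// Assumption: $raw really is Windows-1252; other input gives wrong results.
function cp1252_to_utf8(string $raw): string
{
    $utf8 = iconv('Windows-1252', 'UTF-8', $raw);
    // iconv() returns false on failure; fall back to an empty string here.
    return $utf8 === false ? '' : $utf8;
}

// 0x93 and 0x94 are the curly double quotes in Windows-1252, 0x85 the ellipsis.
$converted = cp1252_to_utf8("\x93Smart\x94 quotes\x85");
echo htmlentities($converted, ENT_QUOTES, 'UTF-8'), "\n";
```

The same bytes passed to htmlentities() with a character set that does not match the data are the kind of mismatch that can leave a feed broken even after escaping.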

Thanks for your help dude!

httpwebwitch

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3826340 posted 9:49 pm on Jan 14, 2009 (gmt 0)

Hey, no problem zulubanshee. Cheers.

vincevincevince

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3826340 posted 3:40 am on Jan 26, 2009 (gmt 0)

The real problem is that browsers are POSTing those characters even when a form is served as UTF-8. If a form is served as UTF-8, then the browser should only be allowing UTF-8 content within it.
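Since the server cannot rely on what the browser sends, one option is to check submitted text before storing it. A minimal sketch, with a hypothetical helper name; it uses the common idiom of matching an empty pattern with the /u modifier, which fails on invalid UTF-8.

```php
<?php
// Hypothetical helper: reports whether $value is well-formed UTF-8.
function is_valid_utf8(string $value): bool
{
    // With /u, preg_match() returns false (not 0) when the subject is not
    // valid UTF-8, and 1 when the empty pattern matches a valid string.
    return preg_match('//u', $value) === 1;
}

var_dump(is_valid_utf8("caf\u{00E9}"));     // valid UTF-8 text
var_dump(is_valid_utf8("\x93quoted\x94"));  // raw Windows-1252 quote bytes
```

The first call reports true and the second false, so input that fails the check could be converted or rejected before it reaches the database.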


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved