Cleaning Up Text with PHP

         

username

5:32 am on Jun 9, 2009 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi, I am parsing some RSS info from a feed and cleaning up the character data. I use the following method, but for some reason some of the single quote characters are not recognized and are left unconverted. My goal is to convert all non-alphanumeric characters to their HTML character entities:

$title = htmlentities($title, ENT_QUOTES);

If anyone has any ideas on making this more robust, it would be appreciated. I have already tried using str_replace, but the characters are still ignored.

Thanks.

enigma1

6:08 pm on Jun 9, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What character set does the RSS feed use?

Does it work if you do
$title = htmlentities($title, ENT_QUOTES, 'UTF-8');

Also check the details on php.net:
[php.net...]
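A minimal sketch of why the third argument matters, assuming the "ignored" quotes are typographic apostrophes (U+2019) arriving as UTF-8 bytes. The sample title is made up for illustration. When the charset matches the data, htmlentities sees one character and emits its named entity; when it is told ISO-8859-1, it sees three unrelated bytes and the quote is never recognized.

```php
<?php
// Hypothetical sample: "It’s here", with the apostrophe written as its
// three UTF-8 bytes (E2 80 99) so the example does not depend on file encoding.
$title = "It\xE2\x80\x99s here";

// Charset matches the data: the apostrophe is converted to a named entity.
$asUtf8 = htmlentities($title, ENT_QUOTES, 'UTF-8');

// Charset does not match: the three bytes are treated as three separate
// single-byte characters, so no quote entity is produced.
$asLatin1 = htmlentities($title, ENT_QUOTES, 'ISO-8859-1');

echo $asUtf8, "\n";   // It&rsquo;s here
echo ($asUtf8 === $asLatin1 ? "same" : "different"), "\n";
```

Older PHP versions defaulted the charset to ISO-8859-1 when the argument was omitted, which may be why the original call missed these characters.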

username

9:19 pm on Jun 9, 2009 (gmt 0)

10+ Year Member Top Contributors Of The Month



The trouble is that the function loops through a few feeds from different URL sources, so the character set may vary. I guess I could run htmlentities 12 times, once for each character set, which should theoretically wipe out all potential errors.

Is there a better solution?
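A short sketch of one likely problem with running htmlentities repeatedly over the same string: by default each pass re-encodes the ampersands produced by the previous pass. The sample title is hypothetical, and the same charset is used on both passes just to keep the effect easy to see.

```php
<?php
// Hypothetical sample title with a straight single quote and double quotes.
$title = "Tom's \"quoted\" title";

// First pass converts the quotes to entities.
$once = htmlentities($title, ENT_QUOTES, 'UTF-8');

// Second pass sees the "&" of each entity and encodes it again.
$twice = htmlentities($once, ENT_QUOTES, 'UTF-8');

// Passing false as the fourth argument (double_encode) leaves
// existing entities alone.
$safe = htmlentities($once, ENT_QUOTES, 'UTF-8', false);

echo $once, "\n";   // Tom&#039;s &quot;quoted&quot; title
echo $twice, "\n";  // Tom&amp;#039;s &amp;quot;quoted&amp;quot; title
echo $safe, "\n";   // Tom&#039;s &quot;quoted&quot; title
```

Even with double_encode turned off, a pass that uses the wrong charset can still mangle multi-byte characters, so detecting the encoding once and encoding once is the safer route.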

username

9:52 pm on Jun 9, 2009 (gmt 0)

10+ Year Member Top Contributors Of The Month



OK, so this solution works when you choose the right character set for htmlentities to use. However, when you run the text through all of the character sets at once, it causes errors. What is the best way to detect the character encoding within RSS feeds?

username

5:29 am on Jun 10, 2009 (gmt 0)

10+ Year Member Top Contributors Of The Month



Just an update: I detected the character encoding using preg_match and passed the correct encoding to htmlentities. Job done.
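The exact pattern was not posted, so the following is only a sketch of what such a solution might look like, assuming the encoding is read from the feed's XML declaration. The function names detect_feed_encoding and clean_title, the UTF-8 fallback, and the sample feed are all hypothetical.

```php
<?php
// Hypothetical helper: pull the encoding out of the XML declaration,
// e.g. <?xml version="1.0" encoding="ISO-8859-1"?>.
// Falls back to $default when no declaration is present.
function detect_feed_encoding($xml, $default = 'UTF-8')
{
    $pattern = '/<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']/i';
    if (preg_match($pattern, $xml, $m)) {
        return strtoupper($m[1]);
    }
    return $default;
}

// Hypothetical helper: encode a raw title using the feed's declared charset.
function clean_title($title, $feedXml)
{
    $charset = detect_feed_encoding($feedXml);
    return htmlentities($title, ENT_QUOTES, $charset);
}

// Made-up feed in ISO-8859-1; \xE9 is "é" in that encoding.
$feed = '<?xml version="1.0" encoding="ISO-8859-1"?>'
      . "<rss><channel><title>Caf\xE9</title></channel></rss>";

echo detect_feed_encoding($feed), "\n";   // ISO-8859-1
echo clean_title("Caf\xE9", $feed), "\n"; // Caf&eacute;
```

Two caveats: htmlentities only accepts a fixed list of charsets, so a feed declaring anything else would need converting first (for example with iconv or mb_convert_encoding). Also, this applies to titles extracted from the raw feed text; strings returned by SimpleXML or DOM are already UTF-8 regardless of what the feed declares.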