Forum Moderators: coopster


Problem getting an en-dash to &ndash; and other char conversions

         

neophyte

4:02 am on Oct 2, 2010 (gmt 0)

10+ Year Member



Hello All -

I'm trying to write a script that, after text is pulled from a MySQL DB, checks the text for characters such as "&", "..." and "-". If any of these three are found, they're converted into the corresponding HTML entities; in the case of the "-", it's converted to &ndash;.

Here's the weird part: When I test this script as shown here...

++++++++++++++++++++

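<?php
// Convert selected characters to HTML entities.
// '&' must be handled first; otherwise the '&' inside '&ndash;' and
// '&hellip;' would be turned into '&amp;ndash;' and '&amp;hellip;'.
function char_to_entity($text)
{
    $text = str_replace('&', '&amp;', $text);

    // Only a hyphen with a space on each side becomes an en dash;
    // the surrounding spaces are replaced as well.
    $text = str_replace(' - ', '&ndash;', $text);

    // Three full stops become a single ellipsis entity.
    $text = str_replace('...', '&hellip;', $text);

    return $text;
}

$str = 'I have a dog... and a cat - but they only speak french & Latin';

echo char_to_entity($str);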

++++++++++++++++++++

... in a regular static document, it works perfectly.

However, when I run text from the database through this script, it will sometimes convert some of the characters, but never all of them as I need.

This should be so straightforward, but I'm pulling my hair out over it. My database text fields are UTF-8... could that be it?

Any guidance is greatly appreciated.

Neophyte

Anyango

9:22 am on Oct 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Could it have something to do with the spaces in ' - '? Shouldn't it be '-'?

Just a guess: as written, it won't replace every - sign, only those with a single space on each side.

neophyte

10:23 am on Oct 2, 2010 (gmt 0)

10+ Year Member



Hello Anyango -

No, the ' - ' is deliberate: the client wants every dash with a space on each side converted to an en dash, while hyphens between two words stay as regular hyphens.

I've even tried it by removing the bordering spaces and the outcome is still unpredictable.
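A minimal sketch of that rule, assuming PHP's bundled PCRE functions: the hypothetical helper `spaced_dash_to_ndash()` uses lookarounds so that only a hyphen with whitespace on both sides matches, and hyphens inside words such as "well-known" are left alone. Unlike the `str_replace(' - ', ...)` version, it keeps the surrounding spaces, which is an assumption about the desired output.

```php
<?php
// Hypothetical helper (not from the thread): turn a hyphen that has whitespace
// on both sides into &ndash;, leaving word-internal hyphens untouched.
function spaced_dash_to_ndash(string $text): string
{
    // (?<=\s) and (?=\s) only check for the spaces; they are not consumed,
    // so the spaces stay in the output.
    return preg_replace('/(?<=\s)-(?=\s)/', '&ndash;', $text);
}

echo spaced_dash_to_ndash('a well-known cat - and a dog'), "\n";
```

Because the lookarounds accept any whitespace, a dash followed by a line break would also be converted; use a literal space instead of `\s` if that is not wanted.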

penders

10:43 am on Oct 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Are you saying that only some occurrences of a character are converted, i.e. some '&'s are converted but not others? Or is it just the hyphens (-)?

If it's just the '-', is it possible that not all of these are hyphens in your data? Some could already be en dash or even em dash characters, which can look very similar.
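One way to check, sketched under the assumption that the text is valid UTF-8: look at the bytes of each dash-like character. A hyphen-minus is the single byte `2d`, while in UTF-8 an en dash (U+2013) is `e2 80 93` and an em dash (U+2014) is `e2 80 94`. The function name `describe_dashes()` is hypothetical.

```php
<?php
// Hypothetical diagnostic: list the dash-like characters in a UTF-8 string,
// identified by their byte sequences, in the order they appear.
function describe_dashes(string $text): array
{
    $names = [
        '2d'     => 'hyphen-minus (U+002D)',
        'e28093' => 'en dash (U+2013)',
        'e28094' => 'em dash (U+2014)',
    ];
    // With the /u modifier "." matches one whole UTF-8 character;
    // preg_match_all() returns false if the input is not valid UTF-8.
    if (preg_match_all('/./su', $text, $matches) === false) {
        return [];
    }
    $found = [];
    foreach ($matches[0] as $char) {
        $hex = bin2hex($char);
        if (isset($names[$hex])) {
            $found[] = $names[$hex];
        }
    }
    return $found;
}

print_r(describe_dashes("cat - dog \u{2013} bird \u{2014} fish"));
```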

neophyte

11:06 am on Oct 2, 2010 (gmt 0)

10+ Year Member



Hi penders -

You hit the nail on the head about the ' - '. I just went back into the DB and found that those were really en dashes (or perhaps em dashes... how they got into the database is a complete mystery to me!)
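If the data already contains the real characters, the conversion can map those too. A minimal sketch, assuming the text reaches PHP as valid UTF-8: the hypothetical `char_to_entity_utf8()` uses `strtr()` with an array, which tries longer keys first and never re-scans text it has already replaced, so '&ndash;' is not double-encoded regardless of entry order. Keeping the spaces around ' - ' and mapping an em dash to &mdash; (rather than &ndash;) are assumptions about what the client wants.

```php
<?php
// Hypothetical variant of char_to_entity(): handles both the ASCII sequences
// and the literal UTF-8 characters that may already be stored in the DB.
function char_to_entity_utf8(string $text): string
{
    // strtr() replaces each match once and does not search its own output.
    return strtr($text, [
        '&'        => '&amp;',
        ' - '      => ' &ndash; ', // spaced hyphen; spaces kept (assumption)
        '...'      => '&hellip;',
        "\u{2013}" => '&ndash;',   // en dash already in the data
        "\u{2014}" => '&mdash;',   // em dash already in the data
        "\u{2026}" => '&hellip;',  // single-character ellipsis
    ]);
}

echo char_to_entity_utf8("a & b - c... d \u{2013} e"), "\n";
```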

However, interestingly (irritatingly), NONE of the '&'s convert to &amp;. They do convert if I run the text through htmlentities() or htmlspecialchars(), but my str_replace() doesn't recognize them. I tested this just now by typing a '&' into the DB field, and it remains '&' rather than &amp;.

Does anyone have any idea when an ampersand isn't seen as an ampersand?
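Two things worth ruling out, sketched under assumptions. First, a browser renders `&amp;` as `&`, so checking the page visually can make a successful conversion look like a failure; escaping the output once more (or viewing the page source) shows what was really produced. Second, a look-alike such as the fullwidth ampersand (U+FF06) will not match `str_replace('&', ...)`. The helper `ampersand_report()` is hypothetical.

```php
<?php
// Hypothetical helper: count plain ASCII ampersands versus the fullwidth
// look-alike U+FF06, which str_replace('&', ...) would not touch.
function ampersand_report(string $text): array
{
    return [
        'ascii'     => substr_count($text, '&'),
        'fullwidth' => substr_count($text, "\u{FF06}"),
    ];
}

$converted = str_replace('&', '&amp;', 'French & Latin');
// Escape again for display, so a browser shows the literal "&amp;"
// instead of rendering it as "&".
echo htmlspecialchars($converted), "\n";
print_r(ampersand_report("French & Latin \u{FF06} Greek"));
```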

penders

12:25 pm on Oct 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



...how they got into the database is a complete mystery to me!


Could the text have been pasted in from another editor (e.g. MS Word) initially? You can get all sorts of funny characters that way.

However, interestingly (irritatingly) NONE of the '&' convert to &amp;


My initial thought was that there could be a difference in the UTF-8 representation of this character. However, I think they should be the same, since '&' is a plain ASCII character. What DB encoding are you using?
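A small sketch to back up that hunch by comparing byte sequences: '&' is ASCII, so it is the single byte `26` in UTF-8 just as in Latin-1, whereas the en dash and the ellipsis take three bytes each in UTF-8. (MySQL's `latin1` is actually Windows-1252, where the en dash is the single byte `96`.) `byte_view()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: show the bytes of each sample character as hex.
// "\u{...}" escapes always produce UTF-8 bytes, whatever the file encoding.
function byte_view(array $chars): array
{
    $out = [];
    foreach ($chars as $label => $char) {
        $out[$label] = bin2hex($char);
    }
    return $out;
}

print_r(byte_view([
    'ampersand' => '&',
    'en dash'   => "\u{2013}",
    'ellipsis'  => "\u{2026}",
]));
```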

neophyte

3:02 am on Oct 3, 2010 (gmt 0)

10+ Year Member



Hi Penders -

Database encoding is: utf8 -- UTF-8 Unicode / utf8_general_ci.

However, I also see that the table collation is latin1_swedish_ci. Maybe that's the trouble with the ampersand?
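The collation alone may not settle it: the column character set and the connection character set also decide which bytes reach PHP. A quick sketch for checking what actually arrives: the hypothetical `is_valid_utf8()` relies on `preg_match()` with the `/u` modifier returning false for malformed UTF-8. If text read through a latin1 connection contains a lone Windows-1252 en dash byte (`96`), the check fails, and a `str_replace()` looking for the UTF-8 en dash would not match it.

```php
<?php
// Hypothetical helper: true only if $text is well-formed UTF-8.
// preg_match() with /u returns false (not 0) on malformed UTF-8 input.
function is_valid_utf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

var_dump(is_valid_utf8("cat \u{2013} dog")); // UTF-8 en dash
var_dump(is_valid_utf8("cat \x96 dog"));     // lone Windows-1252 en dash byte
```

If the connection turns out to be the mismatch, a common remedy is calling `$mysqli->set_charset('utf8')` (or `mysql_set_charset('utf8')` with the older mysql extension) right after connecting; whether that applies here depends on how the columns were actually declared.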