

We’ve: ’ turning into weird symbol or ? after htmlentities


brandon0401

12:19 am on Jan 10, 2008 (gmt 0)

10+ Year Member



I am having a weird problem with a symbol and I'm not sure what's up with it: ’
For example:
$data = "We’ve...";

I read in some data, then perform htmlentities($data).

When there is a ’, the data reads as ? after reading it back from the DB. After investigating, if I look at it before saving it in the DB,

it shows as

We�ve

I am stumped: why is this symbol doing this? I thought it was only certain places causing it, since

We’ve -- gets the symbol

Crennel's -- shows up just fine

Hope I was clear enough, thanks!
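A minimal PHP sketch of one plausible cause (an assumption, not confirmed in this thread): the curly apostrophe ’ is U+2019, which UTF-8 stores as three bytes, while the plain ' in "Crennel's" is a single ASCII byte. If htmlentities() is told (or, as in PHP 5 at the time, defaults) to treat the text as ISO-8859-1, it handles each of those three bytes separately. The variable names are illustrative only.

```php
<?php
// Illustrative demo: how a UTF-8 right single quote (U+2019) fares in
// htmlentities() depending on the charset argument.
$data = "We\u{2019}ve";

// In UTF-8 the curly apostrophe is three bytes.
echo bin2hex("\u{2019}"), "\n"; // e28099

// Treated as ISO-8859-1, each byte is converted on its own: 0xE2 becomes
// &acirc;, while 0x80 and 0x99 have no entity and pass through as raw bytes
// that a UTF-8 page cannot decode.
$asLatin1 = htmlentities($data, ENT_QUOTES, 'ISO-8859-1');

// Treated as UTF-8, the whole character maps to one named entity.
$asUtf8 = htmlentities($data, ENT_QUOTES, 'UTF-8');
echo $asUtf8, "\n"; // We&rsquo;ve

// A plain ASCII apostrophe is one byte in both charsets.
echo htmlentities("Crennel's", ENT_QUOTES, 'ISO-8859-1'), "\n"; // Crennel&#039;s
```

This is why passing the real charset of the data as the third argument matters more than which characters happen to appear.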

brandon0401

12:20 am on Jan 10, 2008 (gmt 0)

10+ Year Member



Not sure how the #s showed up, but hopefully this will display correctly with [codes] disabled:

We’ve

brandon0401

12:47 am on Jan 10, 2008 (gmt 0)

10+ Year Member



Another FYI: I've tried

htmlentities($title, ENT_QUOTES, "UTF-8")

This makes them show fine on the page and leaves the symbols in place... but obviously it causes an insert error when inserting into the DB while they are left in.

One option is $title = str_replace("’", "'", $title);

which works,

but I also see other symbols, like a long dash, that do this, so I wonder what the best way to handle them is.
I also seem to have problems with “
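One way to extend the single str_replace() fix to the long dash and the curly double quotes is one strtr() call with a lookup table. This is a sketch that assumes the input is UTF-8; the function name asciiPunctuation() and the exact set of characters are hypothetical choices, and the conversion is deliberately lossy.

```php
<?php
// Hypothetical helper: replace common "smart" punctuation with ASCII
// equivalents. Assumes $text is UTF-8; anything not in the table is untouched.
function asciiPunctuation(string $text): string
{
    return strtr($text, [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote / apostrophe
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2026}" => '...', // horizontal ellipsis
    ]);
}

echo asciiPunctuation("We\u{2019}ve \u{201C}quoted\u{201D} \u{2014} ok"), "\n";
// We've "quoted" - ok
```

Other non-ASCII characters (accented letters, for example) still need a proper charset setup; this only papers over punctuation.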

thanks

PHP_Chimp

9:53 am on Jan 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I guess a long-term solution would be to change the setting to allow UTF-8 characters in your database. If that is what your customers want, then you should allow them to do that.

However, in the short term you could just stop people from entering those characters in their input.


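// Accept only word characters, whitespace and the listed punctuation.
// The pattern must be a quoted string, and ^...$ make the whole input match,
// not just one character somewhere in it. \s is added so normal sentences pass.
$ok = preg_match('%^[\w\s.,:;\'?!-]+$%', $input);
if (!$ok) {
    echo "Go back and do it again\n";
} else {
    // put information in database
}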

You may want to add or remove punctuation from my list of allowed characters.
This is not really the best solution, as people who want to enter valid punctuation will not be able to; however, it takes about a minute to install, so it is a dirty but quick fix.

brandon0401

4:37 pm on Jan 10, 2008 (gmt 0)

10+ Year Member



Well, it does leave ' but not the other symbol above; if you look closely, they are two different symbols...

Also, users are not entering this; it all comes in automatically from an RSS feed.

PHP_Chimp

4:58 pm on Jan 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oops, I thought it was user input.

As they appear to be UTF-8 characters, why don't you set the character set for the page to UTF-8? Then it should display fine.


header('Content-Type: text/html; charset=utf-8');
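A sketch of the single-charset approach, assuming the titles are valid UTF-8 from the feed all the way to the page: send the UTF-8 header and escape with htmlspecialchars(), which only touches &, <, >, " and ', so characters such as ’ are output unchanged instead of being turned into entities. renderTitle() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: escape a UTF-8 title for HTML without converting
// non-ASCII characters to entities.
function renderTitle(string $title): string
{
    return '<h1>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '</h1>';
}

// header() only works before any output has been sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo renderTitle("We\u{2019}ve & more"), "\n";
// <h1>We’ve &amp; more</h1>
```

This only helps if the stored text is still intact UTF-8; it cannot restore a ? that was written to the database.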

brandon0401

10:28 pm on Jan 10, 2008 (gmt 0)

10+ Year Member



Well, they are stored as ? and weird symbols in the database, so I don't think it's the page; I don't think changing the page charset would change the ?.

The column collation is latin1_general_ci.
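A possible explanation for the ? (an assumption to check against the actual connection and column settings): when UTF-8 text is converted to a single-byte charset that has no slot for ’, the converter must substitute something. This sketch uses mb_convert_encoding() from the mbstring extension to compare strict ISO-8859-1 with Windows-1252. Note that MySQL's "latin1" is documented as cp1252, which does contain ’, so if a ? ends up stored, the conversion may be happening elsewhere, for example in the connection charset.

```php
<?php
// Requires the mbstring extension.
$utf8 = "We\u{2019}ve";

// ISO-8859-1 has no U+2019, so mbstring substitutes its default '?'.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
echo $latin1, "\n"; // We?ve

// Windows-1252 does have it, as the single byte 0x92.
$cp1252 = mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
echo bin2hex($cp1252), "\n"; // 5765927665

// A plain ASCII apostrophe survives either conversion unchanged.
echo mb_convert_encoding("Crennel's", 'ISO-8859-1', 'UTF-8'), "\n"; // Crennel's
```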

PHP_Chimp

6:27 pm on Jan 11, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm...
Have you tried changing the encoding of the database? Although you may find that just changing the encoding on the original database does not help, and that you need to copy everything into a new database with UTF-8 encoding.
You could always test it with a couple of articles to see if this works.

Is there no way for you to request a latin-1 encoding for the feed? That would involve a lot less work.

brandon0401

12:33 am on Jan 12, 2008 (gmt 0)

10+ Year Member



Well, my question is: why does it do this in the title but not in another column, like the description? They both have the same encoding, so what would cause this? Thanks.

PHP_Chimp

7:16 pm on Jan 13, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm guessing, but it may well be that the content is written in UTF-8 and the only place they use characters outside the latin-1 encoding is the title... I don't know why they would do that, but I'm guessing.

Have you looked at the raw feed, not just the data in the database, to see if they match?

There is a possibility that the feed passes through some code somewhere that alters it, so you get UTF-8 characters where there were none before.

If the feed is UTF-8 encoded, then I think you will just have to put up with them, as they are valid characters in that encoding. However, if the feed is latin-1, then they shouldn't be there, and you will need to look at what may be causing the problem.
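To compare the raw feed with what ends up in the database, a rough first check is to read the encoding the feed declares in its XML prolog and test whether its bytes are actually valid UTF-8. The helpers declaredEncoding() and isValidUtf8() are hypothetical; the regex only handles a simple prolog, and when no encoding is declared, XML defaults to UTF-8.

```php
<?php
// Hypothetical helper: return the encoding named in the XML declaration,
// upper-cased, or UTF-8 (the XML default) when none is declared.
function declaredEncoding(string $xml): string
{
    if (preg_match('/^\s*<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']/', $xml, $m)) {
        return strtoupper($m[1]);
    }
    return 'UTF-8';
}

// preg_match() with the /u modifier returns false on invalid UTF-8 input.
function isValidUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$feed = '<?xml version="1.0" encoding="iso-8859-1"?><rss version="2.0"></rss>';
echo declaredEncoding($feed), "\n"; // ISO-8859-1
var_dump(isValidUtf8("We\u{2019}ve")); // bool(true)
var_dump(isValidUtf8("We\x92ve"));     // bool(false)
```

If the feed declares ISO-8859-1 but the titles pass the UTF-8 check with multi-byte characters, something between the feed and the database is likely mixing encodings.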