non sgml characters coming out of my db

preventing some pages from validating

         

HelenDev

2:36 pm on May 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Does anyone know what non SGML character number 148 is?

I think it is a little boxy thing. How do I weed it out and stop it causing validation errors? I think it got there from me endlessly copying and pasting into my db.

In case it is also important, I edit the db in Access and upload to MySQL.

HelenDev

4:01 pm on May 24, 2004 (gmt 0)

Another thought... this is the charset tag that I am using in the head of my pages - will this have any effect and should I change it?


<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

jatar_k

6:26 pm on May 24, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



could it be ö?

I use an accented character replace function before I input to my db.

from user comments here
[ca.php.net...]

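// Note: the three-argument strtr() maps byte for byte, so this only works when
// both this file and $string use a single-byte encoding such as Windows-1252.
function removeaccents($string) {
    return strtr(
        // First pass: one-to-one replacements (both strings are 60 characters).
        strtr($string,
            'ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ',
            'SZszYAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy'),
        // Second pass: characters that expand to two letters.
        array('Þ' => 'TH', 'þ' => 'th', 'Ð' => 'DH', 'ð' => 'dh', 'ß' => 'ss',
              'Œ' => 'OE', 'œ' => 'oe', 'Æ' => 'AE', 'æ' => 'ae', 'µ' => 'u'));
}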

that might help

HelenDev

8:05 am on May 25, 2004 (gmt 0)

Thanks for the reply, jatar_k.

could it be ö?

I don't think it can be that because I'm pretty sure I don't have any of these in my db. What I do have is lots of " and a few boxy things which are invisible on the actual page itself and in the db. Here is exactly what the validator said...


Line 318, column 462: non SGML character number 148

...urn to base warranty">Space-saving 21</strong>&#148; LCD Television with its own built-in

At first I thought it was the " marks, but I have them all over the place and they don't cause errors with every page. When I try to print the boxy thing here, you'll notice it has been replaced by a #148 - telling huh?

If I look at the source code of the page itself I can see that the boxy thing has replaced the " in some cases only, but not others.

If I could see the boxy thing on the page or in the db, I would just remove it by hand. But the " in the db all look the same to me, and obviously some of them are turning themselves into boxy things.

Confused.
H.
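
One way to track down characters that look identical in the db is to search for the raw bytes instead of relying on the font. The following is only a minimal sketch, assuming PHP 7.1 or later and text stored in a single-byte encoding (in UTF-8 data, legitimate multi-byte characters also contain bytes in this range). The function name find_c1_bytes is hypothetical and does not come from this thread.

```php
<?php
// Hypothetical helper: report the offset and value of every byte in the
// 128-159 range, which ISO-8859-1 treats as control codes.
function find_c1_bytes(string $text): array
{
    $hits = [];
    // Without the /u flag PCRE works on bytes; PREG_OFFSET_CAPTURE returns
    // each match together with its byte offset in $text.
    preg_match_all('/[\x80-\x9F]/', $text, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[0] as [$byte, $offset]) {
        $hits[$offset] = ord($byte);
    }
    return $hits;
}

$sample = "21\x94 LCD Television";   // \x94 is byte 148
print_r(find_c1_bytes($sample));     // lists offset 2 with value 148
```

Running something like this over each text field before upload would show which rows hold the problem characters, even when they are invisible in the editor.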

Timotheos

3:29 pm on May 25, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm thinking it's more to do with Unicode. Check out this chart:
[alanwood.net...]
If this is a newer version of Access then this is more than likely.

If that's the case then you might want to try
<META http-equiv="Content-type" content="text/html; charset=UTF-8">

HelenDev

3:44 pm on May 25, 2004 (gmt 0)

Thanks for the reply Timotheos. I think we might be getting somewhere here. When I replace my charset meta tag with

<META http-equiv="Content-type" content="text/html; charset=UTF-8">

as you suggest, the validator is now behaving slightly differently...

Sorry, I am unable to validate this document because on lines 318, 321, 324 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.

I am afraid I am not very clued up with this charset thing - are there any other versions I could try, or is that it? I'm hoping the content of the link you posted will make more sense to me tomorrow, when I haven't been sat in front of a computer for nine hours ;)

Also, if anyone has any other thoughts or ideas about this, please post them, even if it is just something to try. Other people must have come across this type of thing before, no?

Cheers,
H.

Timotheos

4:01 pm on May 25, 2004 (gmt 0)

Hmmmm, how about?

<META http-equiv="Content-type" content="text/html; charset=unicode">

You using Access XP?

HelenDev

4:13 pm on May 25, 2004 (gmt 0)

You using Access XP?

Yes.


<META http-equiv="Content-type" content="text/html; charset=unicode">

ooh err missus, this has made something happen anyway. Now I have...

Sorry! A fatal error occurred when attempting to transcode the character encoding of the document. Either we do not support this character encoding yet, or you have specified a non-existent character encoding (often a misspelling).

The detected character encoding was "unicode".

The error was "".

If you believe the character encoding to be valid you can submit a request for that character encoding (see the feedback page for details) and we will look into supporting it in the future.


...and the little boxy things are now actually appearing on the page itself (not good!).

Please keep those suggestions coming!

HelenDev

8:50 am on May 26, 2004 (gmt 0)

I think I have solved my problem! Kind of anyway.

If I put in a normal double quotation mark - " - &quot; &#34; into the db, this is fine.

The ones causing problems are the right double quote - ” - &rdquo;

So I have just done a find and replace for all of them in my db, uploaded it and everything is rosy :)

The only thing to worry about is that in a font like Arial, the double quote and right double quote both look the same :/

But I have now set Notepad to Times New Roman (that'll be fun when coding) so I can spot the evil quotes right away before I put them in the db :)
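
The find and replace described above could also be done in PHP before the text goes into the db. This is a minimal sketch, assuming the text is Windows-1252 and that plain ASCII quotes are an acceptable substitute; straighten_quotes is a made-up name for illustration.

```php
<?php
// Hypothetical helper: swap Windows-1252 "smart" quotes for ASCII ones.
function straighten_quotes(string $text): string
{
    // The array form of strtr() replaces whole keys, so each byte is
    // mapped independently and nothing else in the string is touched.
    return strtr($text, [
        "\x91" => "'",  // left single quote
        "\x92" => "'",  // right single quote
        "\x93" => '"',  // left double quote
        "\x94" => '"',  // right double quote (character 148)
    ]);
}

echo straighten_quotes("21\x94 LCD Television"), "\n";  // prints: 21" LCD Television
```

Because it works on bytes, this would not be safe to run on text that is already UTF-8.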

ergophobe

5:52 pm on May 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



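You are using curly quotes. Unfortunately, they were typed in Windows-1252 encoding in whatever app they come from (Word, WordPerfect), and you are serving the page as ISO-8859-1 or UTF-8. This won't work. Windows-1252 puts printable characters at bytes 128-159, where ISO-8859-* and Unicode have only control codes, which HTML does not allow. The right curly quote is byte 148 in Windows-1252, so it falls in this range, and that is your problem. Served as UTF-8, the same lone byte is not a valid sequence at all, which is why the validator refused to read the page.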

If you declare your charset as Windows-1252 or you convert your underlying data in the DB to Unicode and declare your charset as UTF-8 you should be fine.

Tom
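
A rough sketch of the second option, assuming the stored text really is Windows-1252, the mbstring extension is loaded, and PHP 7 or later is in use (for the \u{201D} escape). The helper name cp1252_to_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: re-encode Windows-1252 text as UTF-8.
function cp1252_to_utf8(string $text): string
{
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

$raw  = "21\x94 LCD Television";   // byte 148, as it comes out of the db
$utf8 = cp1252_to_utf8($raw);

var_dump(mb_check_encoding($raw, 'UTF-8'));         // false: a lone 0x94 is not valid UTF-8
var_dump(mb_check_encoding($utf8, 'UTF-8'));        // true after conversion
var_dump($utf8 === "21\u{201D} LCD Television");    // true: the byte became U+201D
```

The pages would then need to declare charset=UTF-8, as in the meta tag suggested earlier in the thread. Converting text that is already UTF-8 a second time would garble it, so this should only be run once on the original data.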