Unicode Byte-Order Mark in UTF-8 encoded files

         

surrealillusions

3:32 pm on Mar 4, 2008 (gmt 0)

10+ Year Member



I am trying to validate a webpage, but something I've never seen before has cropped up.

The W3C validator is saying the following:

"The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause problems for some text editors and older browsers. You may want to consider avoiding its use until it is better supported."

The top of the page is

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>

I have removed the keywords, stylesheet and description (as they're not important in this).

Anyway, I use the same charset=utf-8 on my site, and that validates fine, with no warnings about the BOM. I can't make head or tail of the wiki pages about the BOM, and I can't seem to find much info on why this warning occurs. Anyone got any ideas?

:)

bill

2:42 am on Mar 5, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We had an interesting thread on this topic a while back: Saving an .html as proper UTF-8 [webmasterworld.com]

If this is a concern or problem for your users then you may want to manually edit the BOM out of your files.

surrealillusions

4:01 pm on Mar 5, 2008 (gmt 0)

10+ Year Member



Thanks, but I still don't get it. Probably just me being an idiot or something.

How do you edit out the BOM? I couldn't see anything in the file that looked out of the ordinary when editing it in Dreamweaver. Do I need to alter a setting to view it, or something else?

:)

bill

3:00 am on Mar 6, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A hex editor is good for editing out the BOM. The text editor I used in the referenced thread had the option to save the file without the BOM, or to edit in hex directly. I doubt DW has that capability, so you would need another editor.
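If no hex editor is to hand, the same edit can be scripted. The following is only a minimal sketch in Python, assuming the file is UTF-8 and the BOM is the usual three bytes EF BB BF at the very start; the function names and the file name in the usage note are made up for illustration.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_from_file(path: str) -> bool:
    """Rewrite the file without its BOM. Returns True if a BOM was removed."""
    file_path = Path(path)
    original = file_path.read_bytes()
    cleaned = strip_utf8_bom(original)
    if cleaned != original:
        # Only touch the file when something actually changed.
        file_path.write_bytes(cleaned)
        return True
    return False
```

Calling strip_bom_from_file("index.html") (a hypothetical file name) would rewrite the page in place, so keep a backup copy until you have confirmed the result validates.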

penders

1:58 pm on Mar 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks, but I still don't get it.

As I understand it, the BOM (as it appears in UTF-8 files) is contained in the first 3 bytes of the file (EF BB BF in hex) and merely identifies the file as being UTF-8. It does nothing to the structure of the file, nor does it say anything more about the structure. If the file has been saved as UTF-8, it is UTF-8 with or without the BOM. Windows applications usually interpret the BOM OK; some Linux tools and older web browsers, however, do not.

If it's not interpreted OK then you get some funny characters at the start of the file when it's displayed. On the web you inform the browser that the file is UTF-8 by the Content-Type header, not by the BOM.
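To see where the funny characters come from, here is a small Python sketch. It assumes the software reading the page wrongly treats it as Windows-1252 (one common way this goes wrong, not the only one); the sample page content and the helper name has_utf8_bom are invented for the example.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# A made-up page saved as "UTF-8 with signature".
page = codecs.BOM_UTF8 + "<!DOCTYPE html>".encode("utf-8")

# Misread as Windows-1252: the three BOM bytes turn into three visible characters.
misread = page.decode("cp1252")

# Read as plain UTF-8: the BOM survives as the invisible character U+FEFF.
kept = page.decode("utf-8")

# Read with Python's "utf-8-sig" codec: the BOM is recognised and dropped.
clean = page.decode("utf-8-sig")

print(has_utf8_bom(page))  # True
print(misread[:3])         # ï»¿
print(repr(kept[0]))       # '\ufeff'
print(clean)               # <!DOCTYPE html>
```

The invisible U+FEFF case is the one that explains why an editor can open the file and show nothing unusual.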

How do you edit out the BOM? I couldn't see anything in the file that looked out of the ordinary when editing it in Dreamweaver. Do I need to alter a setting to view it, or something else?

If you don't see the BOM (if it is in fact there) when you edit the file in DW, then DW is probably interpreting the file OK: it sees there is a BOM and doesn't show it. I would have thought you could use Save As and pick an encoding option that does not include the BOM/signature, as bill suggests. (Simply saving the file will probably just save it again in the same format, i.e. with a BOM.)

Notepad2 (a great little Windows Notepad replacement) has the option of picking the encoding: "UTF-8" or "UTF-8 with Signature" (i.e. with BOM).

surrealillusions

6:44 pm on Mar 7, 2008 (gmt 0)

10+ Year Member



Thanks penders, that makes a bit more sense to me. I will have a look at it in more detail and see what I can find out.

:)

g1smd

7:59 pm on Mar 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I hit this problem a few years ago when I discovered that the "Save As Unicode" option in Windows WordPad actually saved the file in UTF-16LE format.
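A quick way to check what an editor really wrote is to look at the first few bytes of the file. This is a rough Python sketch, not a full encoding detector: it only recognises the standard BOMs, and the name sniff_bom is invented for the example. UTF-32 is checked before UTF-16 because the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).

```python
import codecs
from typing import Optional

# Longest BOMs first, so FF FE 00 00 is not mistaken for UTF-16LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# A made-up example of what a "Save As Unicode" file might start with.
sample = codecs.BOM_UTF16_LE + "<html>".encode("utf-16-le")
print(sniff_bom(sample))  # UTF-16LE
```

One known limitation of this ordering: a UTF-16LE file whose first character after the BOM is U+0000 would be reported as UTF-32LE, which should not happen with a normal web page.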

bill

2:11 am on Mar 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Take a look at Expression Web 2 (beta) [webmasterworld.com]. One of its new features is an option to add the BOM to your pages or remove it.