|BOM Removal - to pass W3C validation|
I am attempting to update/edit a local County Web site to pass the W3C Validator and am not experienced---especially with BOMs. I have read some info about BOMs in the HTML Forum but so far I cannot find an answer to my question on how to remove the BOM so I get no warning by the W3C Validator.
If I run the W3C HTML Validator on the pages on this site, there is a warning "Byte-Order Mark found in UTF-8 File. The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause problems for some text editors and older browsers. You may want to consider avoiding its use until it is better supported."
If I remove the first three bytes (six hex digits) with UltraEdit 16.20, which your Forum led me to believe are the BOM (EF BB BF), then save and run the validator on the file, I get the warning "No Character Encoding Found! Falling back to UTF-8." If I look at the beginning of a w3.org file, there are no EF BB BF hex values, yet there is no such warning!
Could someone explain what is going on and tell me how to remove the BOM so I do not get the W3C warning?
I had the same problem a few years back. Get yourself a different text editor if yours doesn't offer a "UTF-8 without byte-order mark (BOM)" option. I now use Notepad++; it works for me.
Ha, I just saw this too, have a look if you're about to change your editor:
Hi wcarp and welcome to WebmasterWorld ;)
|Could someone explain what is going on and tell me how to remove the BOM so I do not get the W3C warning? |
Best practise is to specify a character encoding. You've already done some reading, but these links may provide more background on why to declare one, which encoding to choose, and the best method for your site.
I've provided a range of links so at least one "works" for you.
Specifying character encoding [webstandards.org]
Handling character encodings [w3.org]
The definitive Guide to [articles.sitepoint.com]
HTML 4 recommendation [w3.org]
Character encodings [w3.org]
Sounds like you already know how to remove the BOM. I don't use UltraEdit, but I'd check to see if it has a "save as" setting that will allow you to save without the BOM. I'd also look for a search/replace or conversion feature that will allow you to convert the files in bulk rather than having to edit each one manually.
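If the editor turns out to have no batch conversion, a short script can do the same job. This is only a minimal Python sketch, assuming the pages are .htm/.html files under a single folder; the function names and the `*.htm*` pattern are illustrative choices, not anything UltraEdit provides:

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_tree(root: str, pattern: str = "*.htm*") -> list:
    """Rewrite every matching file under root that starts with a UTF-8 BOM.

    Returns the paths of the files that were changed.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        original = path.read_bytes()
        cleaned = strip_utf8_bom(original)
        if cleaned != original:
            # Only the three BOM bytes are dropped; the rest is untouched.
            path.write_bytes(cleaned)
            changed.append(str(path))
    return changed
```

Calling `strip_bom_in_tree("site")` (where "site" is a hypothetical folder name) rewrites files in place, so it is safer to run it on a copy of the site first.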
Thanks for your responses. I thought I would add a little more info as to how I was saving the HTML file using UltraEdit. I saved the file as UTF-8 - No BOM, and then the W3C Validator says "No Character Encoding Found! Falling back to UTF-8." When I look at the saved file, the EF BB BF bytes at the beginning are no longer there.
There are two separate issues, and you've apparently resolved one of them - the BOM.
Character encoding is the second, and your server should be sending that information in the HTTP Content-Type header. You can augment the server header information with a meta charset element in the <head> area of the HTML document.
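To see what a given Content-Type header value actually declares, something like the following Python sketch may help. It only parses a header string you already have; the function name is made up for illustration, and it leans on the standard library's `email.message.Message` to do the parsing:

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset()
```

For example, `charset_from_content_type("text/html; charset=UTF-8")` gives `"utf-8"`, while a bare `"text/html"` gives `None`, which is the situation where a <meta> element in the page becomes the only declaration.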
Hi wcarp, yes, I understand there are two issues.
As I said, you've fixed the first, and as Tedster says, you can use a <meta> element to fix the second.
An example is <meta http-equiv="content-type" content="text/html; charset=UTF-8">
The links I posted explain how to do that and give examples so you can choose which one is best for your site.
Ok, I think I see "gotcha" - wcarp, this is a step-by-step process:
1. Best practise is to send a character encoding.
2. Your original pages use the BOM to do that, but the validator warns you the BOM may cause problems.
3. You obligingly remove the BOM from each page. That also removes the character encoding.
4. The validator correctly tells you there is no encoding declared.
5. <--- this is where we are now ;)
The easy fix is to insert the <meta> element in the head of your document. You just need to decide which one. Once you insert it, continue to save your pages without BOM, and the validator should be happy because you have declared a character encoding in a way that should not cause problems.
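As a rough way to confirm where a page stands in the steps above, here is a small Python sketch that reports whether a file still starts with a UTF-8 BOM and whether a charset appears in a <meta> element near the top. The function name, the 1024-byte window, and the simplified regular expression are all assumptions for illustration; this is not how the W3C validator itself detects encodings:

```python
import codecs
import re

# Simplified pattern: finds charset=... inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def describe_encoding(data: bytes) -> dict:
    """Report the BOM and <meta> charset found at the start of a document."""
    match = META_CHARSET.search(data[:1024])
    return {
        "has_bom": data.startswith(codecs.BOM_UTF8),
        "meta_charset": match.group(1).decode("ascii").lower() if match else None,
    }
```

The target state described above is `has_bom` false with `meta_charset` set to `utf-8`.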
I found that when the BOM is present, the file is actually being saved as UTF-16LE or somesuch.
Use a different editor.
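Whether a file was really saved as UTF-16 or as UTF-8 with a BOM can be checked from its first few bytes, since each encoding has its own mark (EF BB BF for UTF-8, FF FE for UTF-16 little-endian, FE FF for UTF-16 big-endian). A minimal Python sketch, with a hypothetical function name, using the constants from the standard `codecs` module:

```python
import codecs

# UTF-32 marks are tested first because the UTF-32 LE mark (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

Reading the first four bytes of a saved page and passing them to `sniff_bom` is enough to tell which of these an editor wrote.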
Thanks. How to fix the problems was made clear. The interesting thing is that after I added the <meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> line and checked it, the page passed the validation with no warnings or errors. Before, there were 15 errors. That was nice.
But then another---rather big problem appeared! All of the navigation buttons on the left side of the page disappeared! They will not display in the Web page. And all I was trying to do was fix some errors on the page. Now I am going to have to find out why this is happening.
Congratulations - that's an achievement!
|Before, there were 15 errors. |
Ouch! Know what that's like ;) If you are using XHTML, I wonder if the docs are also rendering in quirks mode.
|All of the navigation buttons on the left side of the page disappeared! |
Double check UTF-8 is appropriate for your pages, and if you haven't already, check the doctype and the XML prolog. Serving HTML and XHTML [w3.org] provides a quick overview and "how to" without getting into too much technical detail, as does Changing (X)HTML page encoding to UTF-8 [w3.org].
If that doesn't work, feel free to provide some code snippets so we can take a look.
Thanks. I read over some of the info at the latest links provided above. Adding the <meta> element for UTF-8 after removing the BOM created the problem of disappearing navigation buttons. I see that there are two somewhat similar <meta> element lines now.
Here is a code snippet:
[code]<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<link rel="shortcut icon" href="favicon.ico" />
<title>Okanogan Noxious Weed Home</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta http-equiv="Content-Language" content="en-us" /><!-- GILS meta tag system -->[/code]
If I comment out: <meta http-equiv="Content-Language" content="en-us" />, and leave the UTF-8 meta tag in, then the navigation buttons don't appear either. Obviously, I've got a lot to learn about coding.
The Web page is at: [okanogancounty.org ].
|I've got a lot to learn about coding. |
As do we all - but I think you may have started at the tricky end ;)
If you look closely you can see that one <meta> sends information about language of the document (United States English), and the other is about the type of file (in this case text/html). I wouldn't expect language to be causing this.
The "serving html and xhtml" link above touches on the issue of serving XHTML as text/html rather than XML. That also opens the issue of using XHTML at all: if you are updating, then unless you need XHTML, I would suggest moving to HTML strict.
However, on the practical problem: Can you clarify what you mean by
|"navigation buttons ... disappeared" |
For example, are they not displaying at all, not being styled correctly (which suggests a style issue), or are there images that aren't showing (perhaps the path is wrong - and some browsers will "collapse" an element if the image is missing) ... etc. Can you also confirm that the buttons only appear/disappear when you insert/delete the content-type <meta>?
The reason for asking is that using the same styles, but changing the doctype from HTML to XHTML, or from strict to transitional, affects the way browsers apply the CSS rules, and that can lead to elements "disappearing".
A quick test is to remove the link to your style sheet and see if the unstyled elements display. If they do, the problem is most likely in the styling rather than the way the xhtml is being served.
And do keep at it - you really have started with an interesting challenge - once solved things will get much easier - and making changes to existing code will trigger a "chain reaction" if the code wasn't ideal to start with ;)
Thanks. I must agree, I am getting into more things than I was expecting to deal with at this time, but I guess that's one way to learn.
By the navigation buttons disappearing, I mean totally gone---nothing and no text that was on them.
While I was doing further testing, I discovered that some files had a BOM in them after I removed it. It seems to have something to do with SharePoint Designer 2007. Also what it shows in SharePoint Design mode is not at all how IE 8 renders the navigation buttons. Also, for some reason, W3C's Validator is very slow or isn't working at all now for me so I'm going to have to wait to do more validation and provide more info.
The link to the style sheet that alt131 suggests to remove is, I presume, <link rel="stylesheet" type="text/css" href="css/main.css" title="andreas09" media="screen,projection" />. Is that correct?
|The link to the style sheet that alt131 suggests to remove is, |
That's a stylesheet link, but I cannot confirm that is in your code as personal urls [webmasterworld.com] aren't allowed. The point is, remove all styling (including inline) and test to see if the buttons re/dis-appear.
|Also what it shows in SharePoint Design mode is not at all how IE 8 renders the navigation buttons. |
Previews are notorious for this. Always test in an actual browser - and test across several.
Are you also using the sharepoint server?
Also, can you confirm whether the buttons display after you remove the character encoding <meta> you just inserted?
The slowness of the validator shouldn't be an issue as you already know you have valid code. What you are trying to do is figure out why the buttons aren't displaying.
Just take one page and edit it in UltraEdit to make sure it has no BOM. Test in the browsers to see if the buttons do/don't display with/without the meta element. If they don't, remove all styles to see if the unstyled XHTML will display. Then let us know the outcome.