Frequently the advice given to them is to the effect of "go back and clean up your code", in addition to other things. I haven't been nailed yet, but rather than wait for the axe to fall, I went through my small, 100-page site using the W3C validator. Almost every page showed some error. Not horrible errors, but errors nonetheless. I cleaned everything up and now the pages all validate.
My questions are these:
1) Some of my links to external sites contained characters that the validator didn't like (the snippet below shows the kind of thing I mean). Would G have penalized me for that non-well-formedness because it, too, didn't like those characters?
2) if a site is TOO clean, might G penalize it for being over-optimized?
Thanks in advance.
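To be concrete (the URL here is made up): the usual culprit in a link is a raw ampersand in a query string. The first line below fails validation; escaping the ampersand as &amp; validates and points at exactly the same address.

<a href="http://www.example.com/page?id=1&lang=en">external link - fails validation</a>
<a href="http://www.example.com/page?id=1&amp;lang=en">external link - validates</a>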
Or they have capitalized tags: <A HREF, <A href, etc.
Case is not an issue in HTML markup. Google will index whatever you put there; as long as it is a valid attribute, it is going to be indexed no matter what the case is.
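A quick made-up example: as far as HTML is concerned these are all the same element pointing at the same place, and only an XHTML DOCTYPE would force the lowercase form.

<A HREF="http://www.example.com/">link</A>
<A href="http://www.example.com/">link</A>
<a href="http://www.example.com/">link</a>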
I don't think so. I just checked the Google main page on the W3C validator: 44 errors and this: "No DOCTYPE found! Attempting validation with HTML 4.01 Transitional."
This is a serious matter. I can see the point of MC raising the issue of uncrawlable sites. I assume it takes far longer to analyse a page with syntax errors: first you attempt a regular crawl with all those sophisticated filters for on-page evaluation of tags, until the parser realizes there is, let's say, an unfinished h2 tag; then it falls back to mere text. Maybe it's vice versa, but however this is arranged in detail, it will take far longer to crawl an ill-formed page. The difference could easily amount to a factor of ten or twenty, depending on how tricky the mistake is. The same holds true for browser page-loading time.
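To picture what an unfinished h2 tag does to a parser, here is a made-up fragment. A lenient parser has to decide on its own where the heading ends, and may well treat the following paragraph as part of it; a strict one gives up and falls back to plain text.

<h2>Products
<p>With the heading never closed, this paragraph may be swallowed into the h2, or the whole block may be downgraded to plain text.</p>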
I believe things would be much easier for MC and the Google parsers if the Google pages themselves followed the standards, so that in communication with webmasters Google could point to its own pages as a paradigm. I'm not sure, but I think the standards do not require a DOCTYPE declaration, so Google's main page may be correct. Still, the W3C validator is the most widely known and used one, and it doesn't cope with Google's pages.
Matt, I think you goofed, mate.
You could quite easily have said something like: "We know that 40% of the net has HTML coding errors, but Google does its best to crawl any page that it finds. As long as a text browser renders your site, Google should be able to understand it - but to be on the safe side, why not run a few pages through an HTML validator and fix up as much as possible to give your site the best chance of being properly indexed and understood." That could still be said without making validation a requirement or a high priority.
I think many people now interpret what you said as "don't bother to validate at all, it isn't worth it". From the many sites that I have looked at, I still think that getting rid of nesting errors and poor code is a high enough priority to be worth spending some time getting right.
A page with broken code in the links probably isn't properly crawlable, and five minutes spent with an HTML validator would find those types of problems very easily.
Examples (a before-and-after follows the list):
Title tags not in the head
Unclosed meta tags
Unclosed anchor (<a href>) tags
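As a made-up before-and-after for the list above - in the broken version the title sits in the body instead of the head and the link is never closed, so a parser cannot tell where the anchor text ends:

Broken:
<body>
<title>Widgets</title>
<a href="http://www.example.com/">our partner
<p>more body copy here</p>

Fixed:
<head>
<title>Widgets</title>
<meta name="description" content="Widgets and more">
</head>
<body>
<a href="http://www.example.com/">our partner</a>
<p>more body copy here</p>
</body>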
Keep in mind, code does matter to other search engines like Yahoo and MSN.
If you do not want to spend a few minutes finding and fixing simple errors, that is on you. But realize that it might be the difference between you ranking number one and your competitor ranking number one.
Why take the chance?
The other thing you need to think about is how your code displays in other browsers. Being W3C compliant, it should look the same in all of them.
What if your site errors out and does not display correctly in Firefox? Chances are all your Firefox visitors would not see your site correctly and might just click off and go to one that does display correctly. So do not forget your customers and visitors.
g1smd, as usual, you are right on target.
After reading everything here I have gleaned the following observations and would sure appreciate any feedback.
1) According to MC, 40% of web pages are improperly coded to some degree. This makes sense because of:
A) old pages that may have been more or less compliant in 1997 but fall afoul of current standards;
B) inept coding by rank amateurs;
C) flawed code generated by broken programs;
D) pages that have been worked on by a succession of different coders;
E) good old operator error.
2) I believe that Google CANNOT exclude broken pages, or else its index would cover only 60% of the web at best. If a page can be crawled (meaning the errors do not reach a certain level of severity), it is eligible to be indexed.
3) I am still not convinced that there is enough evidence to say that G will rank you higher or lower based on cleanliness of code, but having code that validates certainly cannot hurt.
4) My original question still stands: does G expect a certain (albeit minuscule) level of error as authentication of a page being "natural", as opposed to being made specifically for its consumption?
Please consider the following scenario:
A man is down on his luck and hungry. He is wandering the streets of a large city, wondering where and when his next meal will be. Walking through a parking lot, he spots a bag from a fast-food restaurant lying on the ground. Inside he finds a bag of fries and two cheeseburgers, still in their wrappers. It seems reasonable that it could have fallen out of a car entering or leaving the parking lot, so he sniffs the food and, smelling nothing offensive, consumes said "next meal".
Now let's change things a bit. The man traversing the parking lot encounters not a bag from a fast-food joint but a spotless white tablecloth, neatly spread across the ground in an empty parking space. Gleaming silverware is carefully arranged on both sides of a china plate. On that plate is a medium-rare, charbroiled Porterhouse steak with a baked potato and sauteed onions. A small plate to the side holds a crisp, chilled salad. Might not the man approach a meal presented so perfectly with suspicion, since he can come up with no plausible explanation for its appearance?
If a page not only validates perfectly but also has meta tags of exactly the length that G prefers and exactly the content it expects in the H1, H2 and H3 tags, might G not view it as "suspect"?
Thanks in advance.
(oh, and sorry about the lame analogy. It's the best I can do on just one cup of coffee.)
I love the analogy. Keep in mind that clean code will display the same in all browsers, so no matter what, all your surfers will see exactly the same thing.
Remember, it's not all about Google; MSN and Yahoo love clean code as well. I believe those search engines do not have as much "error room" as Google.