Frequently the advice given to them is to the effect of "go back and clean up your code", in addition to other things. I haven't been nailed yet, but rather than wait for the axe to fall, I went through my small, 100-page site using the W3C validator. Almost every page showed some error. Not horrible errors, but errors nonetheless. I cleaned everything up and now the pages all validate.
My questions are these:
1) Some of my links to external sites contained characters that the validator didn't like (the snippet below shows the kind of thing I mean). Would G have penalized me for that non-well-formedness because it, too, didn't like those characters?
2) if a site is TOO clean, might G penalize it for being over-optimized?
Thanks in advance.
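To be concrete (the URL here is made up): the usual culprit in a link is a raw ampersand in a query string. The first line below fails validation; escaping the ampersand as &amp; validates and points at exactly the same address.

<a href="http://www.example.com/page?id=1&lang=en">external link - fails validation</a>
<a href="http://www.example.com/page?id=1&amp;lang=en">external link - validates</a>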
Or they have capitalized tags: <A HREF, <A href, etc.
Case is not an issue in HTML markup. Google will index whatever you put there; as long as it is a valid attribute, it is going to be indexed no matter what the case is.
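A quick made-up example: as far as HTML is concerned these are all the same element pointing at the same place, and only an XHTML DOCTYPE would force the lowercase form.

<A HREF="http://www.example.com/">link</A>
<A href="http://www.example.com/">link</A>
<a href="http://www.example.com/">link</a>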
I don't think so. I just checked the Google main page on the W3C validator: 44 errors and this: "No DOCTYPE found! Attempting validation with HTML 4.01 Transitional."
This is a serious matter. I can see the point of MC raising the issue of uncrawlable sites. I assume it takes far longer to analyse a page with syntax errors: first you attempt a regular crawl with all those sophisticated filters for on-page evaluation of tags, until the parser realizes there is, let's say, an unfinished h2 tag; then it falls back to mere text. Maybe it's vice versa, but however this is arranged in detail, it will take far longer to crawl an ill-formed page. The difference could easily amount to a factor of ten or twenty, depending on how tricky the mistake is. The same holds true for browser page-loading time.
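To picture what an unfinished h2 tag does to a parser, here is a made-up fragment. A lenient parser has to decide on its own where the heading ends, and may well treat the following paragraph as part of it; a strict one gives up and falls back to plain text.

<h2>Products
<p>With the heading never closed, this paragraph may be swallowed into the h2, or the whole block may be downgraded to plain text.</p>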
I believe things would be much easier for MC and the Google parsers if the Google pages themselves followed the standards, so that in communication with webmasters Google could point to its own pages as a paradigm. I'm not sure, but I think the standards do not require a DOCTYPE declaration, so Google's main page may be correct. Still, the W3C validator is the most widely known and used one, and it doesn't cope with Google's pages.
Matt, I think you goofed, mate.
You could quite easily have said something like: "We know that 40% of the net has HTML coding errors, but Google does its best to crawl any page that it finds. As long as a text browser renders your site, Google should be able to understand it - but to be on the safe side, why not run a few pages through an HTML validator and fix up as much as possible to give your site the best chance of being properly indexed and understood." That could still be said without making validation a requirement or a high priority.
I think many people now interpret what you said as "don't bother to validate at all, it isn't worth it". From the many sites that I have looked at, I still think that getting rid of nesting errors and poor code is a high enough priority to be worth spending some time getting right.
A page with broken code in the links probably isn't properly crawlable, and five minutes spent with an HTML validator would find those types of problems very easily.
Examples (a before-and-after follows the list):
Title tags not in the head
Unclosed meta tags
Unclosed anchor (<a href>) tags
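As a made-up before-and-after for the list above - in the broken version the title sits in the body instead of the head and the link is never closed, so a parser cannot tell where the anchor text ends:

Broken:
<body>
<title>Widgets</title>
<a href="http://www.example.com/">our partner
<p>more body copy here</p>

Fixed:
<head>
<title>Widgets</title>
<meta name="description" content="Widgets and more">
</head>
<body>
<a href="http://www.example.com/">our partner</a>
<p>more body copy here</p>
</body>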
Keep in mind, code does matter to other search engines like Yahoo and MSN.
If you do not want to spend a few minutes finding and fixing simple errors, that is on you. But realize that it might be the difference between you ranking number one and your competitor ranking number one.
Why take the chance?
The other thing you need to think about is how your code displays in other browsers. Being W3C compliant, it should look the same in all of them.
What if your site errors out and does not display correctly in Firefox? Chances are all your Firefox visitors would not see your site correctly and might just click off and go to one that does display correctly. So do not forget your customers and visitors.
g1smd, as usual, you are right on target.
After reading everything here I have gleaned the following observations and would sure appreciate any feedback.
1) According to MC, 40% of web pages are improperly coded to some degree. This makes sense because of:
A) old pages that may have been more or less compliant in 1997 but fall afoul of current standards;
B) inept coding by rank amateurs;
C) flawed code generated by broken programs;
D) pages that have been worked on by a succession of different coders;
E) good old operator error.
2) I believe that Google CANNOT exclude broken pages, or else its index would cover only 60% of the web at best. If a page can be crawled (meaning the errors do not reach a certain level of severity), it is eligible to be indexed.
3) I am still not convinced that there is enough evidence to say that G will rank you higher or lower based on cleanliness of code, but having code that validates certainly cannot hurt.
4) My original question still stands: does G expect a certain (albeit minuscule) level of error as authentication of a page being "natural", as opposed to being made specifically for its consumption?
Please consider the following scenario:
A man is down on his luck and hungry. He is wandering the streets of a large city, wondering where and when his next meal will be. Walking through a parking lot, he spots a bag from a fast-food restaurant lying on the ground. Inside he finds a bag of fries and two cheeseburgers, still in their wrappers. It seems reasonable that it could have fallen out of a car entering or leaving the parking lot, so he sniffs the food and, smelling nothing offensive, consumes said "next meal".
Now let's change things a bit. The man traversing the parking lot encounters not a bag from a fast-food joint but a spotless white tablecloth, neatly spread across the ground in an empty parking space. Gleaming silverware is carefully arranged on both sides of a china plate. On that plate is a medium-rare, charbroiled Porterhouse steak with a baked potato and sauteed onions. A small plate to the side holds a crisp, chilled salad. Might not the man approach a meal presented so perfectly with suspicion, since he can come up with no plausible explanation for its appearance?
If a page not only validates perfectly but also has meta tags of exactly the length that G prefers and exactly the content it expects in the H1, H2 and H3 tags, might G not view it as "suspect"?
Thanks in advance.
(oh, and sorry about the lame analogy. It's the best I can do on just one cup of coffee.)
I love the analogy. Keep in mind that clean code will display the same in all browsers, so no matter what, all your surfers will see exactly the same thing.
Remember, it's not all about Google; MSN and Yahoo love clean code as well. I believe those search engines do not have as much "error room" as Google.