Forum Moderators: open
That would seem like a great thing to do though... most of the spammy pages I see are full of formatting errors and the like, since they churn these puppies out like there's no tomorrow. Giving higher relevancy to pages with validated code would make sense... considering that spam pages usually have crummy code.
Then again, maybe they could give searchers the option to turn on a "Valid Only" filter... the traffic data from that would certainly give webmasters an idea how many people cared about valid code in real-life search situations.
I think that all SEs look kindly on this strategy. Pages with the content first are giving the spiders what they want - content - instead of a bunch of table tags or navigational JavaScript.
There is a theory that the spiders have limited CPU resources due to the vastness of the web that they are crawling. Because of this, some believe that spiders don't always spider the entire page. So if the first 100 lines of your page are javascript and table tags, the spider thinks your page is about nothing relevant, and hence you don't rank well. Separating the content solves this problem.
It isn't a case of Google awarding you bonus points because your page validates. (I don't think they would start doing that until google.com validates.) :)
The boost you get from clean code is really more of a by-product of helping a spider do its job correctly. Code that validates dramatically reduces the chances that there will be an error when the document is parsed. A couple of mistakes in your code can often cause 50k of JavaScript to get stored in the database instead of your 500 word keyword-rich article.
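To make that concrete, here's a toy sketch (a hypothetical crude extractor, not Googlebot's actual parser) of how a single missing quote can swallow everything after it:

```javascript
// Hypothetical sketch: a crude extractor that scans for the closing quote
// of an attribute value. One missing quote and everything after the
// mistake looks like attribute value, not indexable content.
var broken = '<a href="page.html><p>Your 500 word keyword rich article</p>';
var start = broken.indexOf('"') + 1;   // position just after the opening quote
var end = broken.indexOf('"', start);  // scan for the closing quote
// end === -1: the quote is never closed, so a naive parser never treats
// the article text as content.
```

The article text is still sitting right there in the file, but a simple-minded parser never gets past the broken attribute to see it.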
External CSS is even better, because moving all the presentation code off the page makes it almost impossible for even the dumbest of spiders to make a goof.
Googlebot will grab 100k, so I wouldn't worry about it being just the 1st 100 lines.
WebGuerilla:
> Code that validates dramatically reduces the chances that there will be an error when the document is parsed.
Definitely, but Googlebot can trip even on valid HTML such as > appearing in attribute values (at least it did a few weeks ago when last I checked).
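For illustration, here's a toy tokenizer (an assumption about how a buggy spider might behave, not Googlebot's real code) that cuts a tag at the first ">":

```javascript
// A ">" inside a quoted attribute value is perfectly valid HTML, but a
// tokenizer that stops at the first ">" truncates the tag and drops the
// remaining attributes. (Hypothetical parser, for illustration only.)
var valid = '<img alt="width > height" src="photo.gif">';
var naiveTag = valid.slice(0, valid.indexOf(">") + 1);
// naiveTag === '<img alt="width >' -- the src attribute is lost entirely
```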
I don't think that a valid page will rank any better than a non-valid-but-still-works-OK page. However, as has been said, if the code is so riddled with errors that the spider doesn't get to the content, you've had it.
What's the best way to find out if this happens to your page? VALIDATE IT! The W3C validator is great at exposing faults that will kill a spider's attempts to read a page. Plus, valid code is cool :)
Or maybe not. Either way, it'd be nice to see something to encourage more people, especially businesses, to write cleaner, validating code. Of course, Google itself would have to start validating first, lest they suffer cries of hypocrisy.
I don't want to remove the tracking script, as it is providing very valuable information right now. Everything else on the pages validates, with the exception of the <noscript> tag that is used for the tracking script. I now have three errors to contend with that won't allow me to validate 100%. What should I do?
Line 274, column 50:
button5.asp?tagver=5&si=******&fw=0&js=No...
^Error: unknown entity "si"
Line 274, column 50:
button5.asp?tagver=5&si=******&fw=0&js=No&"></noscr...
^Error: unknown entity "fw"
Line 274, column 50:
button5.asp?tagver=5&si=******&fw=0&js=No&"></noscript>
^Error: unknown entity "js"
I've sent an email to technical support explaining the situation to them along with the results of the validation. Is this something that I can fix myself?
I've been tweaking tracking and advertising scripts for validation purposes for years, and no tracking or advertising company has ever noticed or complained.
P.S. I'd be willing to bet that this is a common error and most would not know what to do. I think this one deserves a place in the validation thread that is floating somewhere around here.
> Unfortunately, JavaScript is not designed to pass HTML compliance. The reason for this is JavaScript is not HTML. Therefore, your HTML code will still pass W3C compliance. The JavaScript on your page should not be tested for W3C compliance since this is only for HTML code.
Our response...
Unfortunately it is not the JavaScript that is failing HTML compliance. It was a <noscript> tag that had unescaped ampersands. We've corrected the issue and will see if it interferes with the tracking code. If it does not, then great, we may decide to promote *************. If it does interfere, then we will find a product that understands the need to write valid HTML and can provide us with error free HTML.
The JavaScript itself does not get checked when validating. I can't believe you would send such an amateur response to the problem. Maybe we'll publish this in our review of the product. I personally would have sought a solution and then responded with a fix.
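For anyone hitting the same three errors, a minimal sketch of the fix (the URL below is a placeholder mirroring the error report, with the account ID masked):

```javascript
// In HTML, a raw "&" starts an entity reference, so "&si=" is read as the
// unknown entity "si" -- hence the validator errors. Escaping each "&" as
// "&amp;" inside the <noscript> URL fixes it; browsers decode "&amp;" back
// to "&" before requesting the image, so tracking should keep working.
function escapeAmps(url) {
  return url.replace(/&/g, "&amp;");
}

var raw = "button5.asp?tagver=5&si=XXXXXX&fw=0&js=No";  // XXXXXX = masked ID
var safe = escapeAmps(raw);
// safe === "button5.asp?tagver=5&amp;si=XXXXXX&amp;fw=0&amp;js=No"
```

The same escaping applies to any URL that appears in HTML attribute values, not just this vendor's tag.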
> Hello,
I have looked into this issue and discussed it with the development team. You are correct, the characters in the <noscript> tag can be escaped and should be. I have filed a bug report on it, so that it can be remedied. I apologize for the inconvenience.
Thank you for choosing ********* ****.
********* Delivery Engineer II
Check if your server is c1. or c2. or c3.thecounter.com before using...
<!-- Start of TheCounter.com Code -->
<SCRIPT type="text/javascript" language="javascript1.2"><!--
s="na";c="na";j="na";f=""+escape(document.referrer)
//--></SCRIPT>
<SCRIPT type="text/javascript" language="javascript1.2"><!--
s=screen.width;v=navigator.appName
if (v != "Netscape") {c=screen.colorDepth}
else {c=screen.pixelDepth}
j=navigator.javaEnabled()
//--></SCRIPT>
<SCRIPT type="text/javascript" language="javascript1.2"><!--
function pr(n) {document.write(n,"\n");}
NS2Ch=0
if (navigator.appName == "Netscape" &&
navigator.appVersion.charAt(0) == "2") {NS2Ch=1}
if (NS2Ch == 0) {
r="&size="+s+"&colors="+c+"&referer="+f+"&java="+j+""
pr("<A HREF=\"http://www.TheCounter.com\" TARGET=\"_top\"><IMG")
pr("ALIGN=\"CENTER\" BORDER=\"0\" ALT=\"TheCounter\"")
pr("SRC=\"http://c1.thecounter.com/id=000000000"+r+"\"></"+"A>")}
//--></SCRIPT>
<NOSCRIPT><A HREF="http://www.TheCounter.com" TARGET="_top"><IMG
SRC="http://c1.thecounter.com/id=000000000" ALIGN="CENTER"
BORDER="0" ALT="TheCounter"></A></NOSCRIPT>
<!-- End of TheCounter.com Code -->
You will need to change the digits 000000000 to be your own ID number.
There are two separate places in the code where this has to be done.
The word CENTER can be changed to LEFT or RIGHT if you need a different
alignment of the image. There are two places that this needs to be done.
This is a list of the changes:
Old: <SCRIPT><!--
New: <SCRIPT type="text/javascript" language="javascript1.2"><!--
Old: <SCRIPT language="javascript1.2"><!--
New: <SCRIPT type="text/javascript" language="javascript1.2"><!--
Old: <SCRIPT><!--
New: <SCRIPT type="text/javascript" language="javascript1.2"><!--
Old: pr("BORDER=0 SRC=\"http://c1.thecounter.com/id=000000000"+r+"\"></A>")}
New: pr("ALIGN=\"CENTER\" BORDER=\"0\" ALT=\"TheCounter\"")
New: pr("SRC=\"http://c1.thecounter.com/id=000000000"+r+"\"></"+"A>")}
Old: SRC="http://c1.thecounter.com/id=000000000" BORDER=0></A>
New: SRC="http://c1.thecounter.com/id=000000000" ALIGN="CENTER"
Old: </NOSCRIPT>
New: BORDER="0" ALT="TheCounter"></A></NOSCRIPT>
This corrects most of the errors that occur if the code is submitted
to the W3C HTML Validator at: <http://validator.w3.org/>.
Yeah, this also threw up a <NOSCRIPT> problem sometimes.