Forum Moderators: Robert Charlton & goodroi


Google crawlability vs. trivial W3C validation errors

is crawlability a word?

         

tkroll

2:56 am on May 13, 2005 (gmt 0)

10+ Year Member



I've been validating my page with the W3C validation tool. I'm wondering if there are "acceptable" errors.

For example, it is complaining about align=absmiddle for img tags. Without this, my site looks off in IE. Will leaving this trip up Google, etc.? What about not having an alt tag on every image?

Shouldn't a good parser handle these without a hitch? Thanks.

RonPK

11:05 am on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google doesn't care about valid HTML when crawling your site - unless you really make a mess of your hyperlinks. Invalid HTML may have some impact on indexing, as it may make it harder to tell what parts of the text are more important than others.

The alt-tag is only relevant for Google if the image is within a hyperlink: in that case it is seen as a replacement for the anchor text. Disclaimer: no one but G knows for sure how much weight alt-text gets.
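To make the "replacement for the anchor text" idea concrete, here is a minimal sketch that collects the text a parser could associate with each link, falling back on the alt text of any image inside the link. It only illustrates the idea; it is not how Google or any other engine actually works, and the names `LinkText` and `link_texts` are invented for this example.

```python
from html.parser import HTMLParser


class LinkText(HTMLParser):
    """Collect (href, text) pairs, using img alt text inside links."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the link we are inside, if any
        self._parts = []    # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and "href" in attr_map:
            self._href = attr_map["href"]
            self._parts = []
        elif tag == "img" and self._href is not None:
            # An image inside a link contributes its alt text, if it has any.
            alt = (attr_map.get("alt") or "").strip()
            if alt:
                self._parts.append(alt)

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self._parts.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join(self._parts)))
            self._href = None


def link_texts(html):
    parser = LinkText()
    parser.feed(html)
    parser.close()
    return parser.links
```

An image outside a link contributes nothing here, which mirrors the claim in the post; whether a real search engine treats it that way is a separate question.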

benihana

11:07 am on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You should use the alt attribute irrespective of what Google thinks. It's for purposes other than SEO. If an image is decor only, use alt=""

whitehatwizard

11:15 am on May 13, 2005 (gmt 0)

10+ Year Member



I have corrected HTML errors (identified by the W3C validator) a number of times on my sites but have never noticed any Google benefit from doing so. But it always makes me feel better about the site.

tkroll

11:47 am on May 13, 2005 (gmt 0)

10+ Year Member



benihana,

Just curious what other uses you mean. As it adds no semantic value for, say, accessibility, what use could it have?

Asking out of honest curiosity, not being a wiseass.

benihana

11:50 am on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



no semantic value for, say, accessibility

That's exactly what I mean: screen readers, and people who have images turned off.

mrMister

12:03 pm on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



no semantic value for, say, accessibility

That's exactly what I mean: screen readers, and people who have images turned off.

I think the poster was referring to you putting an empty string in the alt tag. That has no use to a screen reader.

Truth is, it depends on what version of HTML you are using.

If you're using HTML 3.2 then you shouldn't put an empty alt attribute in your code if your image doesn't convey any information (e.g. a blank 1x1 gif used for user tracking).

If you're using HTML4.01 or XHTML1.0 or 1.1 then you should give all alt tags an attribute.

The reason is, that in theory it will make parsers quicker. If an HTML parser knows that it is dealing with valid HTML that complies with the standards, then it doesn't need to spend extra processing power trying to work out whether you have used an alt tag or not. Because HTML4.01 requires you to have an alt attribute, it knows that one will be there.

This is all theoretical. Most parsers will check anyway because there are so many invalid documents about. However, this is likely to change as more people start using browsers that support XHTML (at the moment, IE does not). You should get into the habit now.

However, most web sites can be put together using HTML 3.2, so if you're using HTML3.2, there's no need to put alt attributes for images that convey no meaning.

[edited by: mrMister at 12:03 pm (utc) on May 13, 2005]

tkroll

12:03 pm on May 13, 2005 (gmt 0)

10+ Year Member



Yes, but a blank alt tag does not help with the screen reader. Maybe the screen reader even says, "blank" or "nothing". How annoying and confusing would that be?

They should add an alt attribute to CSS!

benihana

12:06 pm on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



With no alt attribute, most screen readers will say the word 'image'. With an empty alt attribute, they generally say nothing.

If you have a number of decorative images on a page, is it better to hear 'image image image image image image', or to have those non-informational images ignored and get straight to the content?
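A rough way to see which images on a page fall into each case (no alt at all, empty alt, or descriptive alt) is a short script. This is only a sketch using Python's standard `html.parser`; the names `AltAudit` and `audit_alt` are made up for illustration, and it says nothing about how any particular screen reader behaves.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Sort <img> tags into missing, empty and descriptive alt groups."""

    def __init__(self):
        super().__init__()
        self.missing = []      # no alt attribute at all
        self.empty = []        # alt="" (or whitespace only): decorative
        self.descriptive = []  # alt with actual text

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src", "(no src)")
        if "alt" not in attr_map:
            self.missing.append(src)
        elif not (attr_map["alt"] or "").strip():
            self.empty.append(src)
        else:
            self.descriptive.append(src)


def audit_alt(html):
    parser = AltAudit()
    parser.feed(html)
    parser.close()
    return parser
```

Images that end up in `missing` are the ones a validator will complain about under HTML 4.01; those in `empty` are fine if they really are decoration.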

<rant>Please, can we get the terminology right. There is no such thing as an alt tag, it's an attribute. Pedantic, maybe, but this is supposed to be professional level discussion, and basic terminology such as this should be a no-brainer</rant>

[edited by: benihana at 12:10 pm (utc) on May 13, 2005]

tkroll

12:08 pm on May 13, 2005 (gmt 0)

10+ Year Member



Very cool. That was the information I was interested in.

I will happily add the alt=''. Thanks.

mrMister

12:25 pm on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please, can we get the terminology right. There is no such thing as an alt tag, it's an attribute. Pedantic

I prefer to use the original poster's terminology to reduce the risk of confusion.

[edited by: mrMister at 12:28 pm (utc) on May 13, 2005]

[edited by: lawman at 3:33 pm (utc) on May 13, 2005]

WebWalla

12:28 pm on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The alt-tag is only relevant for Google if the image is within a hyperlink

That was the situation a few months ago, but now searches for text in an ALT attribute will bring up results whether the image forms part of a link or not.

benihana

12:37 pm on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




I prefer to use the original poster's terminology to reduce the risk of confusion.

in which part of your post, as you seem to mix and match?

ALT is not a tag, and to avoid confusion we should use the correct terminology, as is fitting to a professional discussion, which:

you should give all alt tags an attribute

is obviously an issue for you.

[edited by: lawman at 3:34 pm (utc) on May 13, 2005]

mrMister

12:56 pm on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



is obviously an issue for you.

It's called a typo. Replace alt with image and problem solved.

[edited by: lawman at 3:36 pm (utc) on May 13, 2005]

mrMister

1:02 pm on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Adding attributes to alt tags is required in HTML4.01, XHTML1.0 and XHTML1.1. Your pages aren't compliant without it, and could potentially cause problems for a strict HTML parser.

It's also a benefit for accessibility purposes including screen readers.

AFAIK, Google won't have any problems parsing an HTML document that doesn't have alt attributes on all the images.

[edited by: lawman at 3:37 pm (utc) on May 13, 2005]

trillianjedi

1:09 pm on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Many of the top authority websites on the internet don't validate (check out the BBC's homepage) so it's not something that Google, or any other engine, could currently either weight or have their crawlers trip up over.

That may change, of course.

In any event, it's good design and usability practice that dictates that, ideally, your pages should validate.

TJ

nickied

1:56 pm on May 13, 2005 (gmt 0)

10+ Year Member



tkroll:

I've been validating my page with the W3C validation tool. I'm wondering if there are "acceptable" errors.

IMHO, there are no "acceptable" errors, though, as already pointed out, for reasons other than Google.

For example, it is complaining about align=absmiddle for img tags.

Depending on the DTD you're using, absmiddle may not be a permitted value, for example:

HTML 4.01 Transitional:

value of attribute "ALIGN" cannot be "ABSMIDDLE"; must be one of "TOP", "MIDDLE", "BOTTOM", "LEFT", "RIGHT"
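As a quick pre-check before running the full validator, something like the following sketch could flag img align values outside the set listed in that error message. The allowed set is copied from the message above; the names `ALLOWED_IMG_ALIGN`, `AlignCheck` and `invalid_align_values` are hypothetical, and this is no substitute for validating against the real DTD.

```python
from html.parser import HTMLParser

# Values permitted for align on <img>, taken from the validator message above.
ALLOWED_IMG_ALIGN = {"top", "middle", "bottom", "left", "right"}


class AlignCheck(HTMLParser):
    """Collect align values on <img> tags that are not in the allowed set."""

    def __init__(self):
        super().__init__()
        self.invalid = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        value = dict(attrs).get("align")
        # Attribute values are case-insensitive here, so compare in lowercase.
        if value is not None and value.lower() not in ALLOWED_IMG_ALIGN:
            self.invalid.append(value)


def invalid_align_values(html):
    checker = AlignCheck()
    checker.feed(html)
    checker.close()
    return checker.invalid
```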

Will leaving this trip up Google, etc.?

Probably not, but again, there are other reasons for not using it, such as cross-platform, cross-browser and cross-version readability.

. . . Without this, my site looks off in IE.

Can you try something along the lines of CSS vertical-align: middle instead? As in this spec:

[w3.org...]

What about not having an alt tag on every image?

should be included, at least alt="*", not alt=""

Since we can't include private URLs here, try googling:

"Use of ALT texts in IMGs" (in quotes) and check the 1st or 3rd entry. Take a look at the "howlers" at the end of the intro. (not my sites)

I'm not even going to get into the "it's a tag, no it's an attribute" argument. And at this point it's probably time to move this over to an HTML/CSS forum.

benihana

1:58 pm on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



least alt="*", not alt=""

I'd say if it's a non-informational image, it's better to use alt="", as screen readers can ignore that completely. If you use alt="*" it may get read out, which is not desirable.

tkroll

2:58 pm on May 13, 2005 (gmt 0)

10+ Year Member



nickied,

Thanks for the thoughtful answer. I usually agree with "there are no acceptable errors," but sometimes an error isn't really an error.

RE: absmiddle
vertical-align seems to have its own compatibility issues. I will find what fits best. I'll choose the user experience over 100% validation if the SEs won't choke.

Thanks for the insight everyone. Have fun with the squabbling!

encyclo

3:22 pm on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If we get back to the original question... the crawlability of a page doesn't depend on 100% validation, but there are different types of errors which can have an effect.

1) Missing or invalid attributes

Examples of this category are: missing alt attributes, and deprecated, invalid or non-standard attributes such as bordercolorlight or leftmargin.

Such errors are very unlikely to cause problems with crawlability. Googlebot doesn't care about border colors, page margins or image alignment. A missing alt attribute might have an accessibility impact but it won't affect the parsing of the page by a spider. This doesn't mean you shouldn't fix them, but do it for other reasons, not for Googlebot. Same goes for a missing doctype.

2) Nesting errors

Example: <b><p>text</b></p> or similar. Depending on the type or severity, you might cause problems, including the splitting of sentences and the disassociation of text chunks. Again, it shouldn't affect pure crawlability, but it can theoretically harm ranking.

3) Parsing errors

Example: <span="whatever"your text here</span> or other missing tags or brackets. This offers the most serious challenge to a parser, and can cause serious crawlability problems for spiders. They may not be page-breakers in modern browsers, which cope well with severely broken markup (they have great recovery handling, IE in particular), but can make a simpler parser such as a spider choke and fail to read the page. A missing </head> can be fatal, for example. These problems should be fixed immediately.
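For the nesting errors in category 2, a simple stack-based check can point out where close tags arrive in the wrong order. This is a minimal sketch built on Python's standard `html.parser`, with invented names (`NestingCheck`, `nesting_problems`); it assumes every non-void element is closed explicitly, so valid HTML 4 that omits optional end tags (such as </p> or </li>) would be flagged too, and it is not a model of how any spider parses pages.

```python
from html.parser import HTMLParser

# Elements that never take a close tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class NestingCheck(HTMLParser):
    """Track open elements on a stack and record out-of-order close tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in self.stack:
            self.problems.append(f"</{tag}> has no matching open tag")
            return
        # Anything opened after the matching tag is still open: report it.
        while self.stack[-1] != tag:
            inner = self.stack.pop()
            self.problems.append(f"</{tag}> closes while <{inner}> is still open")
        self.stack.pop()


def nesting_problems(html):
    checker = NestingCheck()
    checker.feed(html)
    checker.close()
    problems = list(checker.problems)
    problems.extend(f"<{tag}> is never closed" for tag in checker.stack)
    return problems
```

Run on the example from category 2, `nesting_problems("<b><p>text</b></p>")` reports both the early </b> and the now-orphaned </p>, while the correctly nested `<p><b>text</b></p>` comes back clean.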

mrMister

3:36 pm on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Couldn't agree with you more encyclo. Points 3 & 2 are the most important to deal with when it comes to search engine crawlers.

Of course, if your HTML is always perfect, you'll have no problems at all (except maybe with IE ;-) )

Reid

6:06 am on May 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For example, it is complaining about align=absmiddle for img tags. Without this, my site looks off in IE. Will leaving this trip up Google, etc.? What about not having an alt tag on every image?

Try 'poodle predictor' on your site; it will show how Googlebot likely handles those errors.
You may also be surprised by some other errors that actually validate.

the alt attribute is a perfect hole to plug a keyword or 2 into so I would use the opportunity unless it would be pointless to put anything (a navbar background or something)

tkroll

6:21 am on May 14, 2005 (gmt 0)

10+ Year Member



It is a one-pixel clear spacer. No point IMHO. Thanks.

g1smd

12:24 pm on May 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>> Adding attributes to alt tags is required...

Let's get this terminology right. The alt attribute is required on the <img> tag. The minimum allowed for the attribute value is "" (i.e. empty).

Use an empty attribute value on spacers and page decoration. You only need to fill the text in for content images and navigation buttons.

.

Whether something is a tag or an attribute is important; otherwise you will be left behind when the discussion moves on to <title> tags and title attributes.

g1smd

12:34 pm on May 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As for broken HTML that could have been easily fixed, try this search [google.com].