|XHTML / Google victim|
Google won't play with standards-based code?
| 11:02 pm on Aug 29, 2004 (gmt 0)|
After rewriting my site for XHTML1.1 compliance, I have been dropped twice from Google listings.
My site now runs browser-sniffing code to serve XHTML 1.1 to compatible browsers and HTML 4 to others, in accordance with W3C guidelines, but this has served only to cripple my chances of getting (and staying) properly listed.
After the first drop, I had assumed that the Google spiders were having trouble with the XHTML code, so I decided to check for the Googlebot and serve it the HTML 4 version of the site's pages. After some time, the site was re-indexed, but only the domain name was present in the results listings. Now, about a month later, it has been dropped yet again.
I can only assume now that the detection code on the site is being incorrectly flagged by Google as a method of 'cloaking'.
Is returning to HTML4-only code the only way forward?
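The sniffing described above can be sketched roughly like this (a minimal illustration in Python, not the poster's actual code; the function name and the exact fallback rules are assumptions based on the post):

```python
def choose_content_type(accept_header, user_agent):
    """Decide which version of a page to serve.

    Clients that advertise support for application/xhtml+xml get the
    XHTML 1.1 version; everything else -- including crawlers such as
    Googlebot, which sends no Accept header -- gets HTML 4 as text/html.
    """
    # Explicit bot override, as described in the post above.
    if "google" in (user_agent or "").lower():
        return "text/html"
    # Only serve XHTML when the client explicitly claims to accept it.
    if accept_header and "application/xhtml+xml" in accept_header:
        return "application/xhtml+xml"
    return "text/html"
```

The key point is the last `return`: a client that sends no Accept header at all falls through to plain `text/html`.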
| 11:18 pm on Aug 29, 2004 (gmt 0)|
D3mon, were you serving your XHTML 1.1 with the application/xhtml+xml mime type, or were you sticking with text/html for everyone and just switching doctypes? Which version were you serving to Googlebot, and with which mime type? As Googlebot doesn't understand application/xhtml+xml, if you were serving that to the bot, it would certainly explain some of your problems.
Certainly, using HTML 4 is a safe route, but bear in mind that your indexing problems may have nothing to do with this issue at all. However, what you have been doing is a form of cloaking, which should be avoided, if only to reduce the number of factors that could influence your position in the SERPs.
The W3C has never recommended browser-sniffing and serving different versions based on browser identification: the XHTML profiles are backwards-compatible when served as text/html. You can choose whichever standard you prefer between XHTML and HTML 4 - if you are sticking with text/html, there is no real-world difference for the user agent (aka the browser) between the two.
| 11:33 pm on Aug 29, 2004 (gmt 0)|
Yes, the HTTP Accept header is checked to determine the q values for both text/html and application/xhtml+xml, and the preferred type (the one with the higher q value) is served with the appropriate mime type. Until then, my hard work in XHTML was being treated by most browsers as 'tag soup'. In each case, the appropriate DTD is supplied.
My understanding was that the W3C does not recommend sending XHTML 1.1 with the text/html mime type.
The Googlebot override in my code detects 'google' in the UA string and defaults to HTML 4 with the text/html mime type.
While I've just about got my head around this mime-type madness, I fear this additional issue with Google may just halt my further progress into standards-compatible code.
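The q-value comparison described above could look something like this (an illustrative Python sketch, not the site's actual code; the handling of `*/*` and the tie-breaking rule are assumptions):

```python
def parse_accept(accept_header):
    """Parse an Accept header into a {media_type: q_value} dict."""
    prefs = {}
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        media_type = fields[0].strip()
        q = 1.0  # per the HTTP spec, q defaults to 1.0 when absent
        for param in fields[1:]:
            param = param.strip()
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    pass  # ignore a malformed q value
        prefs[media_type] = q
    return prefs

def preferred_type(accept_header):
    """Serve XHTML only when its q value is at least as high as HTML's."""
    prefs = parse_accept(accept_header or "")
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = prefs.get("text/html", prefs.get("*/*", 0.0))
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"
```

For example, a 2004-era Firefox sending `text/html,application/xhtml+xml,*/*;q=0.8` would get the XHTML version, while a request with no Accept header at all falls back to text/html.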
| 11:52 pm on Aug 29, 2004 (gmt 0)|
Googlebot doesn't send HTTP_ACCEPT, so what was served up as the default? As I said, if it was application/xhtml+xml, then you would have been dropped by Google, as the bot would have been unable to parse the pages.
|My understanding was that the W3C does not recommend sending XHTML 1.1 with the text/html mime type. |
True. But XHTML 1.1 is a special case, and the situation with it is nebulous to say the least. In a few other threads here I've said that it is never a good idea to use XHTML 1.1, although you may use XHTML 1.0, either Strict or Transitional as required. Even the W3C doesn't use 1.1. XHTML 1.0 may be served as text/html, even though application/xhtml+xml is recommended.
|I fear this additional issue with Google may just halt my further progress into standards-compatible code. |
But standards compliance is only partly to do with XHTML. HTML 4.01 Transitional is just as valid a web standard as XHTML 1.0 Strict - you just need to use the most appropriate tool for the job. Validation, although undeniably important, is not everything either: factors such as accessibility matter a great deal too. You can have a valid XHTML page that is completely inaccessible - which is much worse than an accessible tag-soup page.
Oh, and finally, a healthy dose of pragmatism is very important too: after all, you are also trying to make money with your site. Serving your site (I found it) as application/xhtml+xml when I visit with Firefox is admirable, but because of the mime type, your AdSense panel doesn't show. So I get a very slick, clean, validated XHTML document, but you've not presented me with your advertising, so you've lost a chance to make money from my visit.
| 12:12 am on Aug 30, 2004 (gmt 0)|
Without a valid HTTP_ACCEPT, the default would be HTML 4.01.
AdSense took a hit too after the changes, although that was probably, for the most part, due to the lack of traffic.
It looks as though I've perhaps stepped a little too close to the 'cutting edge' and, in doing so, cut off the very support that kept the site buoyant.
I'd like to think I'm quite attentive to accessibility and validation, so my future efforts on the site will be focused in these areas.
On a final note, what might cause Google to list only the domain name of a site, without generating the description, cache, backlinks, etc.?
Thank you for your help. It is very much appreciated.
| 12:45 am on Aug 30, 2004 (gmt 0)|
If any of your pages are still indexed with a cache, you can check it (and the source code) to see what Google was actually served. I had a similar problem with browser sniffing: all browsers were served correctly, but due to a logic error on my part, search engines received only part of the page. My pages began to show up as URLs only in Google, just like yours, until I corrected the logic.