Google HTTP accept




Observation on HTTP Accept headers

ajparty

3:29 pm on Mar 29, 2005 (gmt 0)

This may not be of interest to anyone except hardcore Google obsessives :) ... the HTTP Accept header sent by Googlebot is, from my observation, usually:

text/html,text/plain

but occasionally

text/html,text/plain,application/*,*/*

This is for the 'old-style' Googlebot/2.1 (+http://www.google.com/bot.html) user agent, not the new Mozilla one, which I haven't seen yet.

There's no obvious reason for this difference.
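To see exactly how the two Googlebot variants differ, each header can be split into its media ranges. The following is a minimal sketch, not a full implementation: `parse_accept` is a hypothetical helper name, it follows the comma-separated `type/subtype;q=value` syntax of the Accept header, and it skips proper validation.

```python
def parse_accept(header):
    """Split an HTTP Accept header into (media_range, q) pairs.

    Minimal sketch: parameters other than q are dropped and the
    syntax is not validated.
    """
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        q = 1.0  # quality defaults to 1 when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


# The two Googlebot/2.1 variants reported in this thread
usual = parse_accept("text/html,text/plain")
occasional = parse_accept("text/html,text/plain,application/*,*/*")

# Media ranges that appear only in the occasional variant
usual_ranges = {r for r, _ in usual}
extra = [r for r, _ in occasional if r not in usual_ranges]
print(extra)  # ['application/*', '*/*']
```

The only difference is the two wildcard ranges; neither variant sends q-values, so every listed range carries the default quality of 1.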

While I'm posting, here are the Accept headers (each followed by the User-Agent that sent it) from some other crawlers, which you may find interesting:

text/html, text/plain, application/*
msnbot/1.0 (+http://search.msn.com/msnbot.htm)

*/*
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

*/*
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

text/html, text/plain, application/x-shockwave-flash
Mozilla/2.0 (compatible; Ask Jeeves/Teoma)

mrMister

2:32 am on Mar 31, 2005 (gmt 0)

I think they're just playing about with detection for cloaking.

They obviously can't process every kind of document that exists, which is what */* suggests. They'd be mad to use this bot for their search engine data, because it would fetch all sorts of content that it can't process.
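The point about */* can be made concrete with media-range matching: `*/*` covers every media type, `application/*` covers every `application/...` type, and `text/html,text/plain` covers neither `application/xml` nor `image/png`. A minimal sketch, assuming simplified rules (q-values and media-type parameters are ignored); `media_range_matches` and `accepts` are hypothetical helper names.

```python
def media_range_matches(media_range, media_type):
    """Return True if an Accept media range such as 'text/*' covers media_type."""
    range_type, _, range_sub = media_range.strip().lower().partition("/")
    mtype, _, msub = media_type.strip().lower().partition("/")
    if range_type == "*" and range_sub == "*":
        return True  # */* matches every media type
    if range_type != mtype:
        return False
    return range_sub == "*" or range_sub == msub  # type/* or exact subtype


def accepts(accept_header, media_type):
    """True if any media range in the header covers media_type.

    Sketch only: q-values and parameters are ignored, so a range
    sent with q=0 would still count as a match here.
    """
    ranges = [part.split(";")[0] for part in accept_header.split(",")]
    return any(media_range_matches(r, media_type) for r in ranges if r.strip())


googlebot_usual = "text/html,text/plain"
googlebot_occasional = "text/html,text/plain,application/*,*/*"
for header in (googlebot_usual, googlebot_occasional):
    print(header, "->", accepts(header, "application/xml"), accepts(header, "image/png"))
# text/html,text/plain -> False False
# text/html,text/plain,application/*,*/* -> True True
```

Under these rules, the usual Googlebot header would rule out XML and images entirely, while the occasional one claims to accept anything.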

Has anyone tried serving these bots some kind of data they might have a chance of handling (e.g. application/xml) to see whether the content gets indexed?
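One way to set up that experiment is sketched below as a minimal WSGI app, under several assumptions: the page content and the `wants_xml` helper are hypothetical, and the check only looks for `application/xml`, `application/*` or `*/*` in the header rather than doing full content negotiation. It returns XML to clients whose Accept header allows it and HTML otherwise, and sends `Vary: Accept` so caches keep the two versions apart.

```python
def wants_xml(accept_header):
    """True if the Accept header lists application/xml, application/* or */*.

    Simplified check: q-values are ignored.
    """
    ranges = {part.split(";")[0].strip().lower() for part in accept_header.split(",")}
    return bool(ranges & {"application/xml", "application/*", "*/*"})


def app(environ, start_response):
    """Minimal WSGI app for the experiment (hypothetical page content)."""
    # Assumption: a request with no Accept header gets the HTML version
    accept = environ.get("HTTP_ACCEPT", "")
    if wants_xml(accept):
        body = b'<?xml version="1.0"?><page><title>Test page</title></page>'
        content_type = "application/xml"
    else:
        body = b"<html><head><title>Test page</title></head><body></body></html>"
        content_type = "text/html"
    start_response("200 OK", [
        ("Content-Type", content_type),
        ("Content-Length", str(len(body))),
        ("Vary", "Accept"),  # the response depends on the Accept header
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library can serve it; whether the XML version then shows up in the index is the open question.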