Forum Moderators: Robert Charlton & goodroi
text/html,text/plain but occasionally
text/html,text/plain,application/*,*/*
This is for the 'old style' Googlebot/2.1 (+http://www.google.com/bot.html), not the new Mozilla one, which I haven't seen yet.
There's no obvious reason for this difference.
While I'm posting, here are some others, which you may find interesting:
text/html, text/plain, application/*
msnbot/1.0 (+http://search.msn.com/msnbot.htm) */*
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp) */*
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
text/html, text/plain, application/x-shockwave-flash
Mozilla/2.0 (compatible; Ask Jeeves/Teoma)
They obviously can't process every kind of document that exists, which is what */* suggests. They'd be mad to use this bot to gather data for their search engine, because it will fetch all sorts of stuff it can't process.
Has anyone tried serving these bots some kind of data that they might have a chance of handling (e.g. application/xml) to see if the content gets indexed?
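
For anyone who wants to try that experiment, here is a rough Python sketch of how a server-side script might check whether a bot's Accept header admits a given media type before choosing what to serve. The function names `parse_accept` and `accepts` are made up for illustration, and the matching is deliberately simplified: it ignores q-values and other parameters, so treat it as a starting point rather than a complete implementation of HTTP content negotiation.

```python
def parse_accept(header):
    """Split an Accept header into (type, subtype) pairs.

    Simplifying assumption: parameters such as ";q=0.8" are dropped,
    so relative preferences (and q=0 exclusions) are not honoured.
    """
    ranges = []
    for part in header.split(","):
        media_range = part.split(";")[0].strip().lower()
        if "/" not in media_range:
            continue  # skip empty or malformed entries
        main, sub = media_range.split("/", 1)
        ranges.append((main, sub))
    return ranges


def accepts(header, media_type):
    """Return True if any media range in the header matches media_type."""
    main, sub = media_type.lower().split("/", 1)
    for r_main, r_sub in parse_accept(header):
        if r_main == "*" and r_sub == "*":
            return True  # */* claims to accept anything
        if r_main == main and r_sub in ("*", sub):
            return True  # exact match or type/* wildcard
    return False


# Accept values quoted earlier in this thread
print(accepts("text/html,text/plain", "application/xml"))
print(accepts("text/html, text/plain, application/*", "application/xml"))
print(accepts("*/*", "application/xml"))
```

With the headers quoted above, the first check comes out False and the other two True, which is the problem in a nutshell: a bot sending */* gives the server no hint about what it can actually process.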