[webmasterworld.com...]
[webmasterworld.com...]
However, I found both identifiers (googlebot.com and google.com/bot) in my logs. They are visiting my website at the same time. Are they using both for different reasons?
Last 30 days:
60% - google.com/bot.html
40% - googlebot.com/bot.html
Last 10 days:
70% - google.com/bot.html
30% - googlebot.com/bot.html
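For anyone who wants to work out the same kind of split from their own logs, here is a rough Python sketch. It assumes the user-agent strings have already been extracted from the access log into a list. The function name `identifier_share` and the sample strings are made up for illustration, not taken from the posts above.

```python
from collections import Counter

# The two identifier URLs discussed in this thread.
IDENTIFIERS = ("googlebot.com/bot.html", "google.com/bot.html")

def identifier_share(user_agents):
    """Return the rounded percentage of hits per identifier URL.

    Hypothetical helper: user agents that mention neither URL are ignored.
    """
    counts = Counter()
    for ua in user_agents:
        ua_lower = ua.lower()
        for ident in IDENTIFIERS:
            if ident in ua_lower:
                counts[ident] += 1
                break
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ident: round(100 * n / total) for ident, n in counts.items()}

# Sample data (assumed, for illustration only).
sample = (
    ["Googlebot/2.1 (+http://www.google.com/bot.html)"] * 3
    + ["Googlebot/2.1 (+http://www.googlebot.com/bot.html)"] * 2
)
print(identifier_share(sample))
```

Neither identifier is a substring of the other, so a plain substring test is enough to tell them apart.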
As far as the HTTP headers go, the Accept types I've recorded for the two bots are as follows:
google.com/bot.html
text/html,text/plain
text/html,text/plain,application/*
googlebot.com/bot.html
text/html,text/plain
text/html,text/plain,application/*
text/html,text/plain,application/xml,text/xml,application/atom+xml
Other than that, I've not noticed any major difference.
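A list like the one above can be collected with something along these lines. This is only a sketch: it assumes each request has already been reduced to a (user agent, Accept header) pair, and `accept_values_by_bot` is a hypothetical name, not part of any logging tool.

```python
from collections import defaultdict

def accept_values_by_bot(requests):
    """Group the distinct Accept header values seen per identifier URL.

    `requests` is an iterable of (user_agent, accept_header) pairs.
    Requests from other user agents are skipped.
    """
    seen = defaultdict(set)
    for user_agent, accept in requests:
        ua = user_agent.lower()
        if "googlebot.com/bot.html" in ua:
            key = "googlebot.com/bot.html"
        elif "google.com/bot.html" in ua:
            key = "google.com/bot.html"
        else:
            continue
        # Drop spaces so "text/html, text/plain" and "text/html,text/plain" match.
        seen[key].add(accept.replace(" ", ""))
    return {bot: sorted(values) for bot, values in seen.items()}
```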
Peace,
Kaz
Would it make sense to deliver XHTML files as application/xml instead of application/xhtml+xml to the bot?
Googlebot can't handle application/xhtml+xml at all - it can't read or cache the page, and if it gets into the SERPs at all it is marked as "Filetype unknown". application/xml may seem like a better idea, but Googlebot then can't identify the semantics of an XHTML page - as it only sees generic XML, tags like <h1>, <h2>, etc. have no weight - the page is parsed as plain text. text/xml is evil anyway, so we can't use that. The only real way of getting the files parsed correctly is to continue to send them as text/html.
application/atom+xml
Now, that's interesting... Googlebot likes Atom, but not RSS - which is much more widespread.
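Back on the XHTML question: one way to keep sending text/html to Googlebot while still serving application/xhtml+xml to clients that ask for it is to decide from the Accept header. Below is a minimal Python sketch of that idea. The function names are hypothetical, and it assumes that only an explicit application/xhtml+xml entry counts; wildcards such as application/* or */* fall back to text/html, and q-values are ignored.

```python
def accepted_types(accept_header):
    """Return the set of media types listed in an Accept header.

    Parameters such as ";q=0.9" are stripped and otherwise ignored.
    """
    types = set()
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type:
            types.add(media_type)
    return types

def content_type_for_xhtml(accept_header):
    """Pick the Content-Type to send for an XHTML page.

    Assumption of this sketch: send application/xhtml+xml only when the
    client names it explicitly, otherwise send text/html.
    """
    if "application/xhtml+xml" in accepted_types(accept_header or ""):
        return "application/xhtml+xml"
    return "text/html"

# One of the Accept values recorded for the bot earlier in the thread.
print(content_type_for_xhtml("text/html,text/plain,application/*"))
```

With any of the Accept values recorded above, this falls through to text/html, since none of them names application/xhtml+xml explicitly. If the response varies by Accept like this, it is also worth sending a `Vary: Accept` header so caches keep the variants apart.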