Return of googlebot.com

not google.com/bot

         

yowza

10:43 pm on Jul 26, 2004 (gmt 0)

10+ Year Member



There has been talk for a couple of months now of Googlebot switching to the identifier google.com/bot in place of googlebot.com/bot, as seen in these threads:

[webmasterworld.com...]
[webmasterworld.com...]

However, I found both identifiers (googlebot.com and google.com/bot) in my logs. They are visiting my website at the same time. Are they using both for different reasons?

yowza

4:37 pm on Jul 31, 2004 (gmt 0)

10+ Year Member



Both Google UAs are still visiting my site, although google.com/bot is much busier than googlebot.com/bot.

Does anybody know what they are using both UAs for? GoogleGuy mentioned in an earlier thread that they were switching to google.com/bot, but it looks like they decided to use both.

directrix

12:20 am on Aug 1, 2004 (gmt 0)

10+ Year Member



Apart from a small number of visits on 21st July, I've not seen googlebot.com in my logs since 13th July.

kazonik

5:22 am on Aug 1, 2004 (gmt 0)

10+ Year Member



Over the past 30 days, monitoring several servers, I'm seeing activity from both robots every day, with the following distribution:

Last 30 days:
60% - google.com/bot.html
40% - googlebot.com/bot.html

Last 10 days:
70% - google.com/bot.html
30% - googlebot.com/bot.html

As for the HTTP headers, the Accept values I've recorded for the two bots are as follows:

google.com/bot.html
text/html,text/plain
text/html,text/plain,application/*

googlebot.com/bot.html
text/html,text/plain
text/html,text/plain,application/*
text/html,text/plain,application/xml,text/xml,application/atom+xml

Other than that, I've not noticed any major difference.
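
For anyone who wants to reproduce this kind of tally, here is a minimal Python sketch. It is not the setup that produced the numbers above. It assumes a hypothetical custom log format with one request per line, holding the User-Agent and the Accept header separated by a tab (a stock combined log does not record Accept). The function names are made up for illustration.

```python
from collections import Counter, defaultdict

# Assumption: each log line is "user-agent<TAB>accept-header".
# The two bot URLs are the identifiers discussed in this thread.
BOT_MARKERS = ("googlebot.com/bot.html", "google.com/bot.html")


def classify(user_agent):
    """Return the bot URL contained in the UA string, or None."""
    ua = user_agent.lower()
    for marker in BOT_MARKERS:
        if marker in ua:
            return marker
    return None


def tally(lines):
    """Count hits per bot and collect the distinct Accept values seen."""
    hits = Counter()
    accepts = defaultdict(set)
    for line in lines:
        ua, sep, accept = line.rstrip("\n").partition("\t")
        bot = classify(ua)
        if bot is None or not sep:
            continue  # not one of the two bots, or a malformed line
        hits[bot] += 1
        # Drop spaces so "text/html, text/plain" and
        # "text/html,text/plain" count as the same value.
        accepts[bot].add(accept.replace(" ", ""))
    return hits, accepts


def shares(hits):
    """Convert hit counts to whole-number percentages."""
    total = sum(hits.values())
    if total == 0:
        return {}
    return {bot: round(100 * n / total) for bot, n in hits.items()}
```

Feeding `tally()` an open log file and passing the resulting counts to `shares()` gives a percentage split per identifier, and the `accepts` mapping shows which Accept values each one sent.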

Peace,
Kaz

bull

5:56 am on Aug 1, 2004 (gmt 0)

10+ Year Member



text/html,text/plain,application/xml,text/xml,application/atom+xml

Has it always been like this?

kazonik

3:40 pm on Aug 1, 2004 (gmt 0)

10+ Year Member



The first instance I can find of this HTTP header:

text/html,text/plain,application/xml,text/xml,application/atom+xml

...is on 2004-07-02

I've been monitoring since roughly the beginning of 2004.

Peace,
Kaz

bull

3:50 pm on Aug 1, 2004 (gmt 0)

10+ Year Member



Thank you. How do you monitor this?
Would it make sense to deliver XHTML files as application/xml instead of application/xhtml+xml to the bot?

encyclo

4:40 pm on Aug 1, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Would it make sense to deliver XHTML files as application/xml instead of application/xhtml+xml to the bot?

Googlebot can't handle application/xhtml+xml at all - it can't read or cache the page, and if the page gets into the SERPs at all it is marked as "Filetype unknown".

application/xml may seem like a better idea, but Googlebot then can't identify the semantics of an XHTML page - as it only sees generic XML, tags like <h1>, <h2>, etc. have no weight - the page is parsed as plain text. text/xml is evil anyway, so we can't use that.

The only real way of getting the files parsed correctly is to continue to send them as text/html.
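
One common way to act on this is to pick the Content-Type from the request's Accept header. Below is a rough Python sketch of that idea, assuming you serve XHTML and only want to send application/xhtml+xml to clients that list it explicitly. The function names are hypothetical, and the parsing is deliberately simplified (wildcards such as */* are not treated as a request for XHTML).

```python
def parse_accept(header):
    """Map each media type in an Accept header to its q-value (default 1.0)."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_content_type(accept_header):
    """Send XHTML's own type only to clients that explicitly ask for it."""
    prefs = parse_accept(accept_header or "")
    if prefs.get("application/xhtml+xml", 0.0) > 0.0:
        return "application/xhtml+xml"
    # Everything else, including the Accept values reported for
    # Googlebot in this thread, falls back to text/html.
    return "text/html"
```

None of the Accept values recorded earlier in the thread lists application/xhtml+xml, so under this sketch the bot would always be served text/html.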

application/atom+xml

Now, that's interesting... Googlebot likes Atom, but not RSS - which is much more widespread.