Googlebot/2.1

Forum Moderators: Robert Charlton & goodroi


Googlebot/2.1

with Mozilla 5.0

directrix

12:02 am on Jun 28, 2005 (gmt 0)

I'm sure this is old news, but for some time now a small proportion of my Googlebot/2.1 log entries have read: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)". The IP address indicates the access does indeed originate from Google.

What's the significance of this? Googlebot *and* Mozilla/5.0?
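For anyone who wants to separate the two variants in their own logs, here is a minimal sketch. The function name and the three labels are made up for illustration, and it only inspects the User-Agent string, which can be spoofed, so it says nothing about whether the request really came from a Google IP.

```python
import re

# Matches the "Googlebot/<version>" product token anywhere in a User-Agent string.
GOOGLEBOT_TOKEN = re.compile(r"Googlebot/\d+(?:\.\d+)*")


def classify_googlebot(user_agent: str) -> str:
    """Return 'mozilla-googlebot', 'plain-googlebot', or 'other'.

    The labels are hypothetical; they are not names Google uses.
    """
    if not GOOGLEBOT_TOKEN.search(user_agent):
        return "other"
    # The newer string wraps the Googlebot token in a Mozilla/5.0 prefix.
    if user_agent.startswith("Mozilla/5.0"):
        return "mozilla-googlebot"
    return "plain-googlebot"
```

Feeding it the string quoted above returns "mozilla-googlebot"; a string that starts with "Googlebot/2.1" returns "plain-googlebot".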

balam

1:49 pm on Jun 28, 2005 (gmt 0)

Hey there directrix,

You should check out this thread [webmasterworld.com] for more info on that UA (including comments from GoogleGuy on the subject). If that's not enough, use Google to search WW for "Mozilla/5.0" & "Googlebot" [google.com].

BillyS

11:30 pm on Jun 28, 2005 (gmt 0)

I've been getting spidered by this little critter lately too. Could be just a test bot, as GoogleGuy mentions.

Talking about spiders, I noticed that Google did a deep crawl of my site yesterday. Based on my experience and the time of month, I would expect some kind of update over the weekend - fourth of July and all here in the US...

Bourbon is the wild card. Things just settled down. Should be interesting. That's why I'm here, it's certainly not for the money - yet.

zeus

12:03 am on Jun 29, 2005 (gmt 0)

I have seen this bot for a month now, but I don't think it adds new pages: it spiders a lot of my pages, but as far as I can see only the regular Googlebot adds pages. I do see on a Google IP that more supplemental pages have shown up, so I'm not really sure what this bot does, even after reading all about it.

I must say my site was hijacked and I have been trying to get it back, but nothing has changed yet.

Dijkgraaf

2:49 am on Jun 29, 2005 (gmt 0)

From what I've observed, the Mozilla/5.0 (compatible; Googlebot/2.1 .. ) bot seems to be the more daring one and will try to follow URLs with multiple parameters, while the standard Googlebot/2.1 will only follow URLs with a single parameter.
I've also seen it try to fetch pages based on a string it found in JavaScript or as parameters in other URLs.

The Mozilla/5.0 version does seem to have some problems with 302 redirects. A page that is part of a webring is getting hit multiple times a day and getting a 200 response rather than a 304.

In fact I've not seen Mozilla/5.0 googlebot get a 304 response.
Has anyone seen this bot get a 304 response?
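One way to answer the 304 question from your own logs is to tally response codes per bot variant. This is only a sketch: it assumes the Apache "combined" log format, and the function name and the "mozilla"/"plain" labels are invented here, so adjust the pattern to whatever your server actually writes.

```python
import re
from collections import Counter

# Assumes Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def status_counts_by_bot(lines):
    """Count (variant, status) pairs for requests whose User-Agent names Googlebot."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None or "Googlebot/" not in match.group("ua"):
            continue  # unparseable line or some other client
        variant = "mozilla" if match.group("ua").startswith("Mozilla/5.0") else "plain"
        counts[(variant, int(match.group("status")))] += 1
    return counts
```

If counts[("mozilla", 304)] stays at zero over a long stretch while counts[("plain", 304)] does not, that would back up the observation above.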

abates

3:49 am on Jun 29, 2005 (gmt 0)

Mozilla/5.0 has been reindexing a lot of old URLs which are 301 on my site... those URLs have been turning up in the search results as "supplemental results". I'm not sure if the two events are related or not, however...

SebastianX

8:09 am on Jun 29, 2005 (gmt 0)

Mozilla-Googlebot also downloads the Google sitemap files, spiders pages from the sitemaps, follows links on these pages to URLs not listed in any sitemap, and requests pages it has never fetched before.

Dijkgraaf

8:34 am on Jun 29, 2005 (gmt 0)

Yes, it sure sounds like it is the new and improved version, although it has a few bugs that need to be ironed out, like cache control and handling 302 redirects.

Dayo_UK

8:36 am on Jun 29, 2005 (gmt 0)

>>>although it has a few bugs

He he - and being able to add pages to the index would be a good one to sort out.

Edit:-

Just to add to my thoughts on this bot - I am starting to think it might be the brains behind the normal Googlebot.

I think it might look for 404s, new urls, might be even the bot that works out the backlinks, page rank - but it does not seem to add pages.

zeus

9:05 am on Jun 29, 2005 (gmt 0)

Before my site got hijacked and hit by the 302 Google bug (3 Nov.), I had never seen the Mozilla Googlebot. The first time was in Feb., then again about a month ago and every day since, so it must be some kind of correction bot.

I also only see more supplemental result pages from 2004 showing up, and no real pages like the ones Googlebot crawls.

tantalus

12:19 pm on Jun 29, 2005 (gmt 0)

At last a discussion about this bot :0

I posted a couple of days ago about this bot grabbing .js files (the first time I've seen this).

This bot has been crawling my site on a regular basis since Dec 2004.

My gut feeling tells me this bot looks for CSS spamming techniques such as negative positioning and display none attributes etc, etc.

I also think it looks for similar JavaScript techniques, or a combination of both.

To put the cat among the pigeons, I'd say it may have contributed to some of the 302 hijack problems.

"That spam penalty causes the PageRank of a site to decrease. Since one of the heuristics to pick a canonical site was to take PageRank into account, the declining PageRank of a site was usually the root cause of the problem" googleguy.

As well as others.

>I am starting to think it might be the brains behind the normal Googlebot

I think you are right dayo.

Dijkgraaf
I can't answer your 304 question but thanks for answering my post on the other forum. :)

Everything above IMHO

zeus

1:07 pm on Jun 29, 2005 (gmt 0)

The JS it fetched from me is my advertising scripts, and those I have disallowed in robots.txt.

I don't think it's a special JavaScript/CSS bot. I think it has more to do with the supplemental results, because I had NEVER seen it before I got hijacked. Back then there was only Googlebot, which I don't see that often now, only the Mozilla Googlebot.

tantalus

1:59 pm on Jun 29, 2005 (gmt 0)

Sorry zeus,

Just to clear things up. I wasn't directly referencing your 302 prob.

The fact that I said this bot is fetching JS files might be misleading too.

"The primary reason for this is that some web servers assume that unless a user agent is IE, Netscape/Mozilla, or maybe Opera, that your browser won't support JavaScript, frames, etc. As Googlebot gets better over time, it gets closer to a regular user and browser in our ability to handle features like that" Google guy.

As a simple example, what happens if you have this on your page:

document.write("hello") or document.write(varFromJSFile)

and therefore break Google's golden rule against presenting Google with different content from what you present the user?

Dayo_UK

2:01 pm on Jun 29, 2005 (gmt 0)

tantalus

There have been other discussions about this bot, but no one seems to know for sure what it does.

I did a longish thread in supporters but no one seemed interested.

If I could have gone to New Orleans I would have asked a G engineer about this bot.

zeus

2:22 pm on Jun 29, 2005 (gmt 0)

But why is it only active now, late in my hijacking period/302 Google bug, and NOT when nothing had happened to the site? There must be some logic we can get out of it.

Is the bot only active on filtered sites?
Is the bot only active when a site has a lot of supplemental results? It did add a lot of supplemental results to my site:mydomain.com listing, but not lately.

I'm just trying to figure out what this bot does beyond fetching JS, which does not show in IE or whatever.

tantalus

2:58 pm on Jun 29, 2005 (gmt 0)

"But why is it only active now, late in my hijacking period/302 Google bug, and NOT when nothing had happened to the site? There must be some logic we can get out of it."

It may be that Google are staggering a full implementation of the bot.

I noticed it Nov/Dec 2004, but others had noticed it before that. Take a look at this thread [webmasterworld.com], particularly pipster2004's post.

"I did a longish thread in supporters but no one seemed interested"

I find it odd too, and it's not confined to WebmasterWorld: the rest of the SEO community doesn't seem to care beyond a passing interest. It just seems very significant to me.

Going back to what you said about it possibly being "the brains behind the normal Googlebot"... I always had the feeling that Mozilla fetched a page before the normal Googlebot. So I checked my logs for this month on a deep page:

04/06/2005 23:15:33 GET /example.htm Googlebot/2.1 ( [google.com...]
11/06/2005 12:02:18 GET /example.htm Mozilla/5.0 (compatible; Googlebot/2.1; [google.com...]
14/06/2005 08:17:33 GET /example.htm Mozilla/5.0 (compatible; Googlebot/2.1; [google.com...]
15/06/2005 20:48:31 GET /example.htm Googlebot/2.1 ( [google.com...]
17/06/2005 22:41:10 GET /example.htm Mozilla/5.0 (compatible; Googlebot/2.1; [google.com...]
21/06/2005 20:55:16 GET /example.htm Mozilla/5.0 (compatible; Googlebot/2.1; [google.com...]
22/06/2005 14:09:09 GET /example.htm Mozilla/5.0 (compatible; Googlebot/2.1; [google.com...]
23/06/2005 10:34:28 GET /example.htm Mozilla/5.0 (compatible; Googlebot/2.1; [google.com...]
27/06/2005 20:32:03 GET /example.htm Googlebot/2.1 ( [google.com...]

I don't know if some kind of comparison is involved or if this just signifies the demise of Googlebot/2.1 :o

On a side note I also remember google guy in response to someone asking if they should ban the bot saying he didn't think this was a good idea.
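The log excerpt above can be tallied with a few lines of Python. This is a rough sketch rather than a definitive analysis: the entries are copied from the post and cut where the forum truncated the User-Agent, the dates are assumed to be day/month/year, and the "plain"/"mozilla" labels are just shorthand.

```python
from collections import Counter
from datetime import datetime

# Entries copied from the post above, cut where the forum truncated the User-Agent.
LOG = """\
04/06/2005 23:15:33 GET /example.htm Googlebot/2.1 (
11/06/2005 12:02:18 GET /example.htm Mozilla/5.0 (compatible; Googlebot/2.1;
14/06/2005 08:17:33 GET /example.htm Mozilla/5.0 (compatible; Googlebot/2.1;
15/06/2005 20:48:31 GET /example.htm Googlebot/2.1 (
17/06/2005 22:41:10 GET /example.htm Mozilla/5.0 (compatible; Googlebot/2.1;
21/06/2005 20:55:16 GET /example.htm Mozilla/5.0 (compatible; Googlebot/2.1;
22/06/2005 14:09:09 GET /example.htm Mozilla/5.0 (compatible; Googlebot/2.1;
23/06/2005 10:34:28 GET /example.htm Mozilla/5.0 (compatible; Googlebot/2.1;
27/06/2005 20:32:03 GET /example.htm Googlebot/2.1 (
"""


def parse(log_text):
    """Yield (timestamp, variant) pairs; dates are assumed to be day/month/year."""
    for line in log_text.splitlines():
        date, time, _method, _path, agent = line.split(None, 4)
        stamp = datetime.strptime(f"{date} {time}", "%d/%m/%Y %H:%M:%S")
        variant = "mozilla" if agent.startswith("Mozilla/5.0") else "plain"
        yield stamp, variant


visits = list(parse(LOG))
counts = Counter(variant for _, variant in visits)
print(counts["plain"], counts["mozilla"])  # 3 6
```

For this one page that works out to six fetches by the Mozilla variant against three by the plain one, with each plain fetch after the first preceded by at least two Mozilla fetches. One page over one month is too small a sample to prove which bot leads.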

AlexK

3:06 pm on Jun 29, 2005 (gmt 0)

The damn thing has been hitting my site at up to 3 times/sec. So fast that on-site unruly-bot-blocking routines have been triggered. G has agreed to rein it in.

It has hit the site 25,088 times so far this month (comparative figures for the normal G-bot are 1,730 times).

I call it the snippet-eater, since it adds nothing to mysite-SERPs (unlike the normal G-bot) but crawls all over my URL-only pages.

Finally, it is part-responsible for a swollen 404-file, requesting the most fantastic URLs.

If there was a way to discriminate, I would ban this b*stard in the robots.txt.

Dayo_UK

3:13 pm on Jun 29, 2005 (gmt 0)

>>>On a side note I also remember google guy in response to someone asking if they should ban the bot saying he didn't think this was a good idea.

He he - yep that was me :)

Trying to get some info from GG

AlexK

2:05 pm on Jul 2, 2005 (gmt 0)

The damn thing has been hitting my site at up to 3 times/sec.

A slight side-issue, but I asked G to slow this Mozilla/5.0 bot down, and they replied on 28 June "We've reduced the load on your servers". I didn't realise exactly what they meant until I checked the stats for yesterday (1 July) - not only has the Mozzie-bot not crawled at all, but there are only 20 hits from the 'standard' G-Bot (June was an average 60/day).

So, be warned: either let G hit your site as hard and as fast as it wants, or suffer a drought.

I guess I should have expected it. After all, G is very young, and you can hardly expect responsible, measured actions from the young, can you?

DamonHD

6:50 pm on Jul 2, 2005 (gmt 0)

Hi AlexK,

I think that you are being a little harsh.

Many, many years ago when I had a 64kbps leased line to the Net the Googlebot managed to crush my site by having huge numbers of simultaneous connections open for image downloads so that even I could not get in locally, and it was using nearly every drop of Net bandwidth!

I wrote them a polite email asking them to limit the load but NOT stop spidering and they did so almost immediately and I don't think that there has ever been a problem with their bot since. And I have stayed in the SERPS, though there is a much longer lag on images getting in than pages.

Rgds

Damon

AlexK

9:49 am on Jul 3, 2005 (gmt 0)

DamonHD:

I think that you are being a little harsh ... the Googlebot managed to crush my site ... so that even I could not get in locally

Hmmm...

There is an unwritten contract between webmasters and the Search-Engines: we let them run their bots all over our sites (which costs us money) and they give us fresh visitors from their SERPs. My own research (msg#7) [webmasterworld.com] (also msg#59+60 [webmasterworld.com]) indicates that the Mozzie-bot adds nothing to the SERPs, which breaks that contract. Because of that, I'm fine with no visits from that particular bot, yet really pissed-off that the "standard" bot has also reduced its rate. It has actually slowed even more. Here are comparative figures for the first 2 days of July:

Bot visits from 01-Jul-2005 00.00 to 03-Jul-2005 04:03:-
Inktomi Slurp : 2082+95
Google AdSense : 867+2
OmniExplorer : 516+1
GigaBot : 292+94
MSNBot : 369+10
Grub.org : 51+1
BecomeBot : 31+6
Googlebot HTTP/1.0 : 32+4
findlinks : 31
BSpider : 0+8
Others : 15+12
(Numbers after + are successful hits on "robots.txt" files.)

tantalus

7:09 pm on Jul 3, 2005 (gmt 0)

I thought it might be helpful to link these 2 threads [webmasterworld.com]

Dijkgraaf

11:14 pm on Jul 3, 2005 (gmt 0)

AlexK, I have pages that were only visited by the Mozilla Googlebot, and they are appearing in the search results. Yes, they are appearing as "Supplemental Result", whatever that means, as they are appearing at the #1 position ahead of other results that aren't "Supplemental Result".
So I've got no beef about the Mozilla Googlebot in that regard.

mrMister

11:20 am on Jul 5, 2005 (gmt 0)

I've recently started logging Googlebot's visits to my site. I'll share the data with you...

<headers>
<IP_ADDRESS>66.249.65.162</IP_ADDRESS>
<TIME>20050705092734</TIME>
<PATH_INFO>**REMOVED**</PATH_INFO>
<HTTP_CONNECTION>Keep-alive</HTTP_CONNECTION>
<HTTP_ACCEPT>*/*</HTTP_ACCEPT>
<HTTP_ACCEPT_ENCODING>gzip</HTTP_ACCEPT_ENCODING>
<HTTP_FROM>googlebot(at)googlebot.com</HTTP_FROM>
<HTTP_HOST>**REMOVED**</HTTP_HOST>
<HTTP_USER_AGENT>Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)</HTTP_USER_AGENT>
</headers>

<headers>
<IP_ADDRESS>66.249.65.162</IP_ADDRESS>
<TIME>20050705103219</TIME>
<PATH_INFO>**REMOVED (but same as above)**</PATH_INFO>
<HTTP_CONNECTION>Keep-alive</HTTP_CONNECTION>
<HTTP_ACCEPT>*/*</HTTP_ACCEPT>
<HTTP_ACCEPT_ENCODING>gzip</HTTP_ACCEPT_ENCODING>
<HTTP_FROM>googlebot(at)googlebot.com</HTTP_FROM>
<HTTP_HOST>**REMOVED**</HTTP_HOST>
<HTTP_USER_AGENT>Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)</HTTP_USER_AGENT>
</headers>

---

The time is in YYYYMMDDHHMMSS format. Any clues as to why the same bot from the same IP will grab the same URL twice in just over an hour? Perhaps it's checking for dynamically changing content?
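For reference, the gap between the two requests can be worked out directly from the TIME fields. A small sketch, with a helper name made up for the purpose:

```python
from datetime import datetime


def gap_between(first: str, second: str):
    """Return the time between two YYYYMMDDHHMMSS stamps as a timedelta."""
    fmt = "%Y%m%d%H%M%S"
    return datetime.strptime(second, fmt) - datetime.strptime(first, fmt)


print(gap_between("20050705092734", "20050705103219"))  # 1:04:45
```

So the second fetch came 1 hour, 4 minutes and 45 seconds after the first.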

AlexK

12:32 pm on Jul 5, 2005 (gmt 0)

Dayo_UK:

I am starting to think it might be the brains behind the normal Googlebot ... it might look for 404s, new urls, might be even the bot that works out the backlinks, page rank - but it does not seem to add pages.

Dijkgraaf:

I have pages that were only visited by the Mozilla Googlebot ... they are appearing as "Supplemental Result"

Tentative analysis:

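[Five numbered points, #1 to #5, appeared here; only their emphasized terms survive, in this order: Main, Supplemental, M_Bot, G_Bot, site:my-site.com, Main, Supplemental, G_Bot, Supplemental, Main, M_Bot, Main, Supplemental, M_Bot, G_Bot.]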

Extra for #2:
None of the results for a site:my-site.com search are marked as "Supplemental" (for my site, anyway). 87% of the 1st 100 SERPs for my-site.com were url-only, yet one of those pages had been visited 11 times in a 5 month period (msg7 [webmasterworld.com]).

Extra for #5:
After the M_Bot stopped hitting my site (28 Jun, msg 19) (June avg: 836/day) the G_Bot has also started to dry up:

Dijkgraaf

10:45 pm on Jul 5, 2005 (gmt 0)

Alex K.
Re #1 Yes that does seem to be the case
Re #2 No, I have supplemental results showing both title and snippets
Re #3 The URLs in question have multiple parameters, and G_Bot only visits URLs with a single parameter at this time. So I haven't observed this happening for my supplemental pages.
Re #4 I hope not, that would cause it to bounce back and forwards.
Re #5 Not from what I've seen so far. I think M_Bot is just their Beta version that is eventually going to replace G_Bot.

carrot63

2:19 am on Jul 6, 2005 (gmt 0)

A couple of months ago I decided to stop google presenting a cached version of my pages using the "noarchive" robots meta tag, but changed my mind when I saw that the 'updated' date disappeared with the 'cached' link. Within a couple of hours of loading the pages with the changed meta tag, I was seeing hits almost exclusively from the Mozilla version of Googlebot, whereas normally it would only show up occasionally, most hits being 'normal' Gbot.

After removing the noarchive meta, things gradually wound down to normal, and the Mozilla version is now only an occasional visitor. I assumed that my adding noarchive suggested in some way that I was cloaking, and the Mozilla version was sniffing for obvious discrepancies.

AlexK

12:16 pm on Jul 6, 2005 (gmt 0)

Dijkgraaf:

Re #2 No, I have supplemental results showing both title and snippets

Are these from a 'normal' search, or a site:my-site.com search?

G_Bot only visits URLs with a single parameter at this time

Sorry, your info is out of date. Here are a couple of URLs visited on 5 Jul by the G_Bot:

There are literally thousands of other examples in my logs.

The G_Bot does seem to choke at 3 parameters but, interestingly, the M_Bot will handle more than 2:

...and lots of other examples.

Re #4 I hope not, that would cause it to bounce back and forwards.

...which is exactly my experience on a site:my-site.com search.
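To check the parameter-count claims against your own logs, something along these lines would do. It is a sketch only: the function names are invented, and the (variant, URL) pairs are assumed to have already been pulled out of the log by whatever means you prefer.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def param_count(url: str) -> int:
    """Number of query-string parameters in a URL (blank values included)."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def params_by_bot(requests):
    """Tally (variant, parameter count) over an iterable of (variant, url) pairs."""
    return Counter((variant, param_count(url)) for variant, url in requests)
```

A tally showing no G_Bot entries at three or more parameters, but M_Bot entries there, would match what is described above.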

Dijkgraaf

10:04 pm on Jul 6, 2005 (gmt 0)

AlexK, the Supplemental Results are coming up on a normal search, e.g. Keyword1 Keyword2, and are in fact coming in at #1 position in some cases, even though there are other results listed below which are not Supplemental Results.

OK, I might not have noticed G_Bot visiting two-parameter pages as I might not have any, and as you say it seems to be a fairly new development.

It doesn't sound good that M_Bot and G_Bot are overriding each other's results. It sounds like a recipe for chaos; in fact, that might explain a lot of people's problems.

AlexK

11:06 pm on Jul 6, 2005 (gmt 0)

Dijkgraaf: what do you get on a site:your-site.tld search? It was my realisation that these (could be) different to a 'normal' search that led to #2 above. Because of your discoveries of the M_Bot activity on your own site, you could offer valuable confirmation/negation of that tentative analysis.

This 33 message thread spans 2 pages: 33
