I know there have been a few topics about this one, but I can't figure out what the Mozilla/5.0 Google spider actually does.
Pages it spiders aren't getting indexed, so it's not a regular spider; it's meant specifically for something else. Cloaking detection could be one purpose, but I can't see why they would build a whole new spider just to detect cloaked pages. Besides, even if SEOs cloak pages, they would know by now that there is a spider going around detecting it, so the cloaking would be done differently, undetectably.
Maybe the spider is used for analysing page content? That could be for future use (LSA?), to detect duplicate content, or maybe to see whether pages in a link structure contain mostly the same content.
I wonder what your thoughts and findings are, because this spider seems smarter to me than we might think it is...
I'm pretty sure that I read something from Google (possibly GoogleGuy) saying that it had something to do with Javascript. His explanation sounded very dubious as I remember, like it was some kind of cover story. Unfortunately I can't find the link.
I don't believe that it grabs .js files, so I can't see it being a fully Javascript-compatible web crawler.
I suspect it has something to do with gathering data for a forthcoming major update to Google.
Could well be. I also haven't seen it requesting .js files, so I'm not worried about that. It follows links between <script></script> tags, but the normal Googlebot does that too, so that's nothing to worry about.
But I also suspect it's analysing content much more than the normal Googlebot does. I have a few pages on my site which don't have much content on them but are used as SEO pages. Nothing bad in them, just optimised for some keywords.
The Mozilla spider visited those pages in January, and that's actually the last visit I've had from Google. Maybe it determines the 'usefulness' of a page for the actual spider, based on the content it analyses?
- It follows most redirects (at least on-site 301s) almost immediately
- It doesn't seem to remember those redirects (e.g. there are a number of redirected URLs on our site that it keeps re-visiting almost daily; see the log-parsing sketch after this list)
- It's not feeding the live index/cache
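If you want to see whether the bot behaves the same way in your own logs, here's a minimal sketch. It assumes an Apache combined-format access log and that the bot identifies itself with a Mozilla/5.0 user agent containing "Googlebot"; the log path and both patterns are assumptions you'd adjust for your own server.

```python
import re
from collections import Counter

# Assumed Apache combined-format access log; adjust for your server.
LOG_FILE = "access.log"

# Assumed user-agent signature of the new crawler.
UA_PATTERN = re.compile(r"Mozilla/5\.0.*Googlebot", re.IGNORECASE)

# Combined log format: IP - - [time] "METHOD URL PROTO" STATUS SIZE "REFERER" "UA"
LINE_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

hits = Counter()        # how often each URL is requested by this bot
redirected = Counter()  # URLs answering 301/302 that it keeps re-crawling anyway

with open(LOG_FILE) as log:
    for line in log:
        m = LINE_PATTERN.match(line)
        if not m or not UA_PATTERN.search(m.group("ua")):
            continue
        hits[m.group("url")] += 1
        if m.group("status") in ("301", "302"):
            redirected[m.group("url")] += 1

print("Most-requested URLs by the Mozilla/5.0 bot:")
for url, count in hits.most_common(10):
    print(f"{count:5d}  {url}")

print("\nRedirected URLs it keeps re-visiting:")
for url, count in redirected.most_common(10):
    print(f"{count:5d}  {url}")
```

If the second list keeps growing day after day, that matches the "doesn't remember redirects" observation above.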
I half wonder if part of my problem is related to this little bot picking up the constant changes I had been making, perhaps kicking in some kind of cloaking penalty for the site. Not sure, but I have not read a good explanation for its visits.
I too have a dynamic homepage which changes as content is added.
It has no presence in Google SERPs, even though I'm linked to from a PR5 site, have AdSense on the site, and have active AdWords campaigns running. My site has been online in earnest and constantly worked on for about 5 months.
I have a feeling that this particular Bot is crawling my site because of the changing links on the homepage and the ongoing refinement.
In the last couple of days, I've made some changes in how those links are archived. I hope that'll help it get crawled and listed more effectively.
It not only ignores robots.txt, it actually goes after everything listed as Disallow first.
What? Any other confirmations of this?
Does this not go against the spirit of "do no evil"?
Almost always when someone posts that Googlebot isn't obeying robots.txt, it's one of the following:
1. The robots.txt is incorrect in some way.
2. It wasn't Google; it was a forged user agent, and the site owner didn't bother to reverse-DNS the IP.
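Both of those are easy to check for yourself. Below is a minimal Python sketch: the first function does the forward-confirmed reverse-DNS test (reverse-resolve the IP, require a googlebot.com or google.com hostname, then forward-resolve it to confirm it maps back to the same IP), and the second shows how a standard parser reads your robots.txt. The IP and URLs are placeholders; substitute values from your own access log.

```python
import socket
from urllib.robotparser import RobotFileParser

def is_real_googlebot(ip: str) -> bool:
    """Forward-confirmed reverse DNS check for a visitor claiming to be Googlebot."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # 1. reverse-resolve the IP
    except socket.herror:
        return False  # no reverse record at all -> treat as forged
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False  # reverse record isn't in Google's crawler domains
    try:
        _, _, forward_ips = socket.gethostbyname_ex(hostname)  # 2. forward-resolve
    except socket.gaierror:
        return False
    return ip in forward_ips  # 3. forward lookup must point back at the same IP

def robots_allows(robots_url: str, user_agent: str, page_url: str) -> bool:
    """See how a standard parser interprets your robots.txt (check #1 above)."""
    rp = RobotFileParser(robots_url)
    rp.read()
    return rp.can_fetch(user_agent, page_url)

# Placeholder values -- swap in an IP and URL taken from your own log:
print(is_real_googlebot("66.249.66.1"))
print(robots_allows("https://www.example.com/robots.txt",
                    "Googlebot",
                    "https://www.example.com/some/disallowed/page.html"))
```

If the reverse-DNS check fails, the "Googlebot" in your log was somebody else wearing its user agent, and nothing it did says anything about Google.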