
Google SEO News and Discussion Forum

Beware of Asking Google to Slow Their Mozilla Compatible Robot Down?
AlexK




msg:774193
 1:56 pm on Jul 6, 2005 (gmt 0)

On 28 June G told me "We've reduced the load on your servers".

The M_Bot [webmasterworld.com] (identified by Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) in the user-agent string) had been hitting my site at up to 3 times/sec (avg: 836/day), triggering unruly-bot-prevention routines [webmasterworld.com] in the PHP scripts. I had used the G on-site form [google.com] a few days before, and asked them (nicely) to stop it. So far so good.
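
A routine of that kind can be quite small. The sketch below is an illustration only (it assumes a flat-file hit counter keyed by client IP, and the thresholds are arbitrary); it is not the actual script referenced above:

    <?php
    // Sketch of an unruly-bot throttle (illustration, not the script
    // referenced above): track request timestamps per client IP in a
    // flat file and answer 503 once a rolling-window ceiling is hit.
    // No file locking - fine for a sketch, not for a busy server.
    $ip     = $_SERVER['REMOTE_ADDR'];
    $window = 10;   // rolling window, seconds
    $limit  = 15;   // max requests allowed per window
    $store  = sys_get_temp_dir() . '/hits_' . md5($ip);

    $now  = time();
    $hits = array();
    if (is_file($store)) {
        foreach (explode("\n", trim(file_get_contents($store))) as $t) {
            if ((int)$t > $now - $window) {
                $hits[] = (int)$t;  // keep timestamps still inside the window
            }
        }
    }
    $hits[] = $now;
    file_put_contents($store, implode("\n", $hits));

    if (count($hits) > $limit) {
        header('HTTP/1.1 503 Service Unavailable');
        header('Retry-After: ' . $window);
        exit('Too many requests - please slow down.');
    }
    ?>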

By July 1 it was clear that, rather than turning the knob down a notch or two, they had switched it off altogether, at least as far as the M_Bot was concerned: there were no hits from this bot at all. The G_Bot (identified just by Googlebot/2.1 (+http://www.google.com/bot.html) in the user-agent string) had also slowed:

    June avg: 60/day
    Jul 1: 12
    Jul 2: 18
    Jul 3: 5
    Jul 4: 17
    Jul 5: 29

My site has now dropped from 28,000 G-hits in June to a likely 500 in July. So, beware...

 

mrMister




msg:774194
 2:14 pm on Jul 6, 2005 (gmt 0)

Yes, Google have a history of giving <people> exactly what they ask for.

Take a look at the first post (msg#521) here for an example...

[webmasterworld.com...]

Personally, I think it's great that Google have a sense of humour.

As the old adage goes...

"Be careful what you wish for... it may come true!"

[edited by: lawman at 8:45 pm (utc) on July 6, 2005]

AlexK




msg:774195
 8:30 pm on Jul 6, 2005 (gmt 0)

mrMister:
Personally, I think it's great that Google have a sense of humour.

Do you know Mel Brooks' 2,000-year-old man [en.wikipedia.org]-derived definition of humour?

Humour:- a sabre-toothed tiger enters the cave, drags out your neighbour, and eats him.
Tragedy:- You stub your toe on a rock.

It's always funny when it happens to others.

<snip> I think that I may be due my shot of humour. I am willing to wait.

[edited by: lawman at 8:46 pm (utc) on July 6, 2005]

lawman




msg:774196
 8:44 pm on Jul 6, 2005 (gmt 0)

If we can all abide by TOS #4 (be respectful of other members), then I'm sure I won't have to edit anyone.

AlexK




msg:774197
 11:20 pm on Jul 6, 2005 (gmt 0)

Update:
Google have told me "We can't guarantee that your site will be crawled at any particular frequency".

Whilst at first sight this does seem to make a nonsense of G's "Googlebot is overloading my servers" [google.com] page, on further consideration it suggests that there really is only an on/off switch, rather than a graduated knob.

The more I think about this, the more important it seems to know just what is the case.

Chico_Loco




msg:774198
 11:36 pm on Jul 6, 2005 (gmt 0)

Doesn't Google abide by the "crawl-delay" parameter, which can be put in robots.txt?

That would probably have been the better solution.
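
For reference, the directive looks like this in robots.txt (the value is the number of seconds to wait between fetches). One caveat: at the time, Slurp and MSNBot honored Crawl-delay, but Googlebot did not, which is presumably why AlexK went through the report form instead:

    User-agent: Slurp
    Crawl-delay: 10

    User-agent: msnbot
    Crawl-delay: 10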

sit2510




msg:774199
 7:55 am on Jul 7, 2005 (gmt 0)

IMO, we can trust Googlebot to behave well in crawling the World Wide Web. On rare occasions it may run wild, but I would not bother asking Google to slow the bots down, as it is more prudent to let it fix itself naturally. In all cases, overloading is only temporary and very short-term, and things then return to normal. No hassle.

BReflection




msg:774200
 5:26 am on Jul 8, 2005 (gmt 0)

Google has commented in the past that they would crawl harder if it weren't for the smaller webmasters complaining. That said, they probably don't like to hear you complaining ;)

victor




msg:774201
 6:58 am on Jul 8, 2005 (gmt 0)

Google has commented in the past that they would crawl harder if it weren't for the smaller webmasters complaining.

Google could solve that problem in 5 minutes by looking for a crawl-delay: 0 for googlebot in robots.txt.

That would be treated as permission for crawling at high speed.

If they did that, and let it be known that they were doing that, webmasters could decide to add that or not.

It's not a matter of webmasters complaining. It's to do with secondary web activities (like index builders) honoring the wishes of the primary movers (those providing content).
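
Under that hypothetical convention (victor's proposal, not anything Google actually documented), opting in to high-speed crawling would be a two-line addition to robots.txt:

    User-agent: Googlebot
    Crawl-delay: 0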

AlexK




msg:774202
 1:20 pm on Jul 9, 2005 (gmt 0)

Here is a compendium of past intelligence on the M_Bot, compiled whilst searching WebmasterWorld for info on the crawl-delay parameter. It is all there if you look for it...

...and this is the first sighting of this bot [webmasterworld.com].

AlexK




msg:774203
 6:01 pm on Aug 8, 2005 (gmt 0)

Now that July is finished, here is an update to the stats. First though, these are the timings of the emails:
    June 26: (me to Google) "Please stop it."
    June 28: (Google to me) "We've reduced the load on your servers"
    July 3: (me to Google) "Your bots now do not crawl my site at all."
    July 6: (Google to me) "we see no cause for concern at this time"

...and here are comparative stats for the last 3 months:

July: ............................Pages
Inktomi Slurp ...................24,065
Google AdSense ..................15,972
MSNBot ..........................11,990
Googlebot HTTP/1.0 .................866
Googlebot HTTP/1.1 Mozilla/5.0 ......61
.
June:
Googlebot HTTP/1.1 Mozilla/5.0 ..25,089
MSNBot ..........................19,211
Google AdSense ..................15,424
Inktomi Slurp ...................14,236
Googlebot HTTP/1.0 ...............1,801
.
May:
MSNBot ..........................20,475
Google AdSense ..................14,193
Inktomi Slurp ...................11,654
Googlebot (HTTP/1.0 + HTTP/1.1) ..4,409

As the man said, be careful what you ask for, you might just get it.

walkman




msg:774204
 4:20 pm on Aug 9, 2005 (gmt 0)

Googlebot is "killing" me too, but I don't mind it. For example, on a 1200 page site, I've had 4000+ visits today. ALL my outbound links go through a redirect, so they are part of the 4000.

I see GB getting the same file 10-15 times a day sometimes (ever since Sitemaps came into existence, even though I have the frequency set to "daily"), but it's much better for me. Now my pages get indexed within 2-3 days and I'm getting targeted traffic right away.

Thanks for the warning though.

koan




msg:774205
 7:39 pm on Aug 9, 2005 (gmt 0)

I have this problem also. I just received a notification from my anti-abuse script that the Mozilla-compatible bot from Google triggered it and was automatically blocked from reading further. Now I just hope it won't penalize my site for it. Damn, this script is there for unruly site leechers, not real web indexers. I never expected Google to behave like that.

jomaxx




msg:774206
 11:28 pm on Aug 9, 2005 (gmt 0)

AlexK, did you mention how many pages your site contains to be spidered? Hundreds? Thousands?

my3cents




msg:774207
 1:53 pm on Aug 10, 2005 (gmt 0)

I am seeing very strange things from the 66.249.65.X bots

Every request has double slashes in the path, though I cannot find any links with double slashes. Many of these are deep internal pages that are unlikely to have any external inbound links. I've checked for internal links with double slashes and there are none.

example: [domain.com...]

"GET //pagename.html HTTP/1.1" 200 21531 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

I am getting hundreds of these a day. If I go to the URL myself, it's a 404, but I have a spider-tracking script on these pages, and it shows the pages as being spidered. The only way the tracking script could record a hit is if the page actually loads, yet the page will not load with double slashes.

Anyone seeing this in their logs, or have a clue what may be happening?
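
One defensive option (a sketch only, not something Google recommended; www.example.com is a placeholder for the real hostname) is to 301-redirect any double-slash request to its collapsed, canonical path before the page renders:

    <?php
    // Sketch: collapse duplicate slashes in the request path and
    // 301-redirect to the canonical URL before rendering the page.
    // www.example.com is a placeholder hostname.
    $uri   = $_SERVER['REQUEST_URI'];
    $parts = explode('?', $uri, 2);
    $path  = $parts[0];
    $query = isset($parts[1]) ? $parts[1] : null;

    $clean = preg_replace('#/{2,}#', '/', $path);
    if ($clean !== $path) {
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: http://www.example.com' . $clean
             . ($query !== null ? '?' . $query : ''));
        exit;
    }
    ?>

(On most Apache setups a path with doubled slashes resolves to the same file as the single-slash path, which would be consistent with the 200 responses in the log line above.)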

Andem




msg:774208
 4:07 pm on Aug 10, 2005 (gmt 0)

>>>I am seeing very strange things from the 66.249.65.X bots

My site was (?) banned from Google recently in the July 28 changes, and I'm still working to get back into the index. I am seeing 66.249.65.x crawling this site, several thousand requests per day... and then stopping. The next day, they're at it again.

I don't know exactly what it's for, but I am used to seeing "Googlebot/2.1 (+http://www.googlebot.com/bot.html)" and not "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

What's going on?

AlexK




msg:774209
 8:04 pm on Aug 10, 2005 (gmt 0)

jomaxx:
did you mention how many pages your site contains to be spidered? Hundreds? Thousands?

More than ten thousand. 8,237 different pages were viewed in July (by humans, not robots).

walkman:
...sitemaps...

That's the secret weapon that I'm keeping in reserve if nothing changes.

koan:
...notification by my anti-abuse script that the mozilla compatible bot from Google triggered it and was automatically blocked from reading further.

That is the precise incident on my site that caused this whole thread. Seems that the bot hasn't changed its ways one jot.

walkman




msg:774210
 10:48 pm on Aug 10, 2005 (gmt 0)

>> That's the secret weapon that I'm keeping in reserve if nothing changes

No need to hold it in reserve at all. Go right ahead and use them.

AlexK




msg:774211
 11:20 pm on Aug 10, 2005 (gmt 0)

My concern is that with a 10,000+ page site, the sitemap will be huge. Plus there's the time needed to code it into a dynamic site. Not impossible, of course, but that time could be spent on more important things.

It's on my list of "things to do soon".

jomaxx




msg:774212
 5:22 am on Aug 11, 2005 (gmt 0)

Creating the site map can be very simple. A plain text file, one URL per line, is all you need. Then submit it to Google. You can also do it in phases, so everything doesn't get spidered at once.
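
For a dynamic site, even generating it can be a few lines of PHP. A minimal sketch (the URLs are placeholders; a real site would pull its page list from the database):

    <?php
    // Sketch: write a plain-text sitemap, one absolute URL per line.
    // The array below stands in for a query against the site's own
    // page table.
    $urls = array(
        'http://www.example.com/',
        'http://www.example.com/widgets.html',
        'http://www.example.com/widgets/blue.html',
    );
    file_put_contents('sitemap.txt', implode("\n", $urls) . "\n");
    echo 'Wrote ' . count($urls) . " URLs to sitemap.txt\n";
    ?>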

Anyway I know it's too late to say this now, but a 10,000-page site with a lot of inbound links will get spidered like crazy. I have a site around the same size, and Googlebot takes an average of 3,500-4,000 pages a day.

AlexK




msg:774213
 2:27 am on Aug 12, 2005 (gmt 0)

jomaxx:
...10,000-page site...Googlebot takes an average of 3,500-4,000 pages a day

(Boggle) So, you reckon that I took the correct decision, then?

jomaxx




msg:774214
 6:14 pm on Aug 12, 2005 (gmt 0)

That wasn't even counting the AdSense "Mediapartners" bot. I haven't asked them to slow down, but I do think they crawl more frequently and more intensely than is necessary.

You should be fine, as long as every page gets crawled at least once during an indexing cycle. I don't know if they really do that "deep crawl" before the update any more, but if a page misses a cycle entirely, my Google Belief System says it will probably get ranked lower or not at all.

AlexK




msg:774215
 7:19 pm on Aug 12, 2005 (gmt 0)

(At the risk of moving this thread away from the original topic)

jomaxx, are you using a sitemap? If so, what is the frequency, and how many new pages are added to the file? In msg #12 walkman spoke of a 1200 page site getting 4000+ visits in one day (which is as outrageous as your own case).

I would rather be in my current situation re: G than yours. (Boggle, again).

PS: I did ask Google on 6 July to restore the status quo re: the M_Bot, but with neither reply nor effect. At the time I was annoyed. Now, I am beginning to bless my lucky stars.

KiShOrE




msg:774216
 12:55 am on Aug 13, 2005 (gmt 0)

I dunno what's wrong with this.

Google visits one of the sites I maintain, but it seems it doesn't index any page that it visits.

The last time Google indexed that website was back in April. What can be the reason? Any idea?

Here is last 3 days web stats...

+++++++++++++++++++++++
RobotStats - Google Bot (http://www[.]google[.]com/)

User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www[.]google[.]com/bot[.]html)
Quantity: 6,732
+++++++++++++++++++++++

So, what do you say?

walkman




msg:774217
 1:08 am on Aug 13, 2005 (gmt 0)

Today I got 400 visits (so far) on the same 1200 page site.

flex55




msg:774218
 11:51 am on Aug 15, 2005 (gmt 0)

Andem, KiShOrE:

I have also been crawled hard by the G Mozilla bot this last week (5K-10K pages a day), but nothing got into the index.

I have read on WW that the Mozilla bot doesn't get pages into the index, but what I really need to know is:

Did anyone see the regular G bot come AFTER the Mozilla bot? Or did anyone see pages get into the index after a deep crawl by the Mozilla bot?

Andem, KiShOrE - I'd be happy to hear if you have any updates.

AlexK




msg:774219
 12:44 am on Aug 16, 2005 (gmt 0)

flex55:
Did anyone see the regular G bot come AFTER the Mozilla bot?

Yes.

My own experience plus research suggests that the G_Bot needs to hit a page three times before the page gets into the index (see msg#5+7 [webmasterworld.com]). The same research suggests that the M_Bot does not count towards this total, but that the M_Bot 'scouts' a page first and the G_Bot follows up.

Just to curdle the blood, I also saw the reverse: the M_Bot hit a page after the G_Bot, and the page came out of the index.

incrediBILL




msg:774220
 2:13 am on Aug 16, 2005 (gmt 0)

I had a similar issue, but worse: Google/Yahoo/MSN all hitting my site at the same time, and the AdSense mediabot joining in for fun. Heck, I even upgraded to a dual Xeon server just because of their nonsense.

No way was I going to ask them to slow it down via support, as I feared what happened to the OP: I'd fall from grace with the spiders. I stuck Crawl-delay in robots.txt and it's been much more civilized, but it seems the dang spiders are always on my site now, taking a page or two at a time, as they just can't get it all fast enough anymore.

flex55




msg:774221
 2:18 pm on Aug 16, 2005 (gmt 0)

Thanks for the thread ref, AlexK.

I did some more reading and found someone mentioning 2 weeks after the Mozilla deep crawl until pages get into the index (I think it was Dayo_UK; can't find it now), so I guess I'll wait (urrrrrrrr).

AlexK




msg:774222
 5:34 am on Aug 20, 2005 (gmt 0)

Still on the subject of heavy/fast crawls by the M_Bot, there have been a couple of recent threads covering the same ground:
  • Heavy GoogleBot Attack? [webmasterworld.com]: from 8 Aug: 17,000 pages on a 100 visitors/day site; 3 sites, all getting hit.
    (msg 8): 18 Aug: 37,364 hits in Aug so far.
  • nonstop crawling [webmasterworld.com]: 16 Aug: 27,000 hits on a 1,500 visitors/month site.
    (msg 8): 18 Aug: 2 sites, each getting thousands of requests daily from the Mozilla bot (500 pages indexed).
In the past, this type of activity has been followed 2-3 weeks later by a G-update. Get ready.