I don't like having my site hit so hard by a bot. I rely on google for a substantial portion of my traffic, and so I'm not willing to take any steps that might jeopardize that. I've also heard stories about people who asked google to limit their scans, and the result was that the googlebot never returned.
Does anyone have a story to tell about how they faced my situation and took some action? What was the outcome?
but, even at its fastest, it was no more than one page every 20-30 seconds...
I loved it, it was great, and I can't wait for my next one! I'd say, let Google ravage your site as much as it wants!
One page a second is not really very fast, as robots go. If your server can't keep up with that you've got a problem somewhere. Some spiders, especially the less respectable ones, can hit your site 10X as hard as that.
I agree. It averaged one page per second, with instances of 3 pages in a single second. I have seen my site deliver 10 pages per second. Still, the rate google was crawling my site could have delayed response times for other visitors and caused them to leave. That's my main concern.
I wonder why google needs to generate bursts of high traffic rather than spreading out their requests over time. Other legitimate robots never take so many pages in such a short time span. They also don't penalize people who ask for less frequent scans. Google doesn't even offer a robots.txt option (like crawl-delay) to slow down their scan.
And then there are the stories of sites not being returned to by the googlebot after the webmaster makes a request by email to have the scan slowed. Anyone out there have firsthand or secondhand info on that?
Andrew Hitchcock, I asked the crawl team about this a while ago, and there’s a good reason. It turns out that a lot of webmasters give crawl-delay values that are way out of whack, in the sense that we’d only be able to crawl 15-20 urls from a site in an entire day. I’ll try to post more details about that sometime in the future. The crawl guys are interested in allowing people to give some sort of hostload hint, but it’s their opinion that crawl-delay isn’t the best way to do it.
Andrew was asking why Google did not support crawl-delay. As of February 8th, they did not.
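For readers unfamiliar with the directive being discussed: Crawl-delay is a non-standard robots.txt extension that Yahoo's Slurp recognized at the time, with the value read as the number of seconds the crawler should wait between requests; Googlebot did not honor it. A minimal sketch of what it looks like (the 10-second figure is only an illustration):

User-agent: Slurp
Crawl-delay: 10

User-agent: *
Disallow: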
Personally, I wouldn't bite the hand that feeds you.
At this point, given the small amount of info that I have, I have no plans to take any action. They are the 800-pound gorilla and I am at their mercy.
I think it's unfortunate that google lacks the courtesy to restrict their crawls to a reasonable level.
I guess the bots' masters no longer fear being blocked, so they're doing whatever they please. From where I'm standing, it seems like google, yahoo, and anyone else who sends some traffic to websites (or, more importantly, might become a good source of traffic in the future) feels free to run wild.
As for limiting requests to every 120 seconds, that's beyond ridiculous. It would be enough time to hand-write individual responses to page requests. A nice Martha-like touch, but few webmasters do it any more.
One reason is that search engine spiders can be a highly distributed user-agent -- that is, all the "parts" of the spidering program do not even need to reside on the same physical machine. The decision about the next url to request is not necessarily made by a simple process that is analogous to a click. Although for many crawling sequences it can look like "following a click trail", there are a lot of logic routines going on to make that decision.
It would be nice if things were as simple as looking at a referer -- but as I said, googlebot sends no referer. In fact, I think it would be impossible even to DEFINE the referer for any particular "get". And even if it were possible, on the scale of a complete web crawl the added CPU cycles and bandwidth would be extensive.
BTW the referrer ID is not available, but the browser identifier is plainly visible and that's probably what was intended.
I wonder why google needs to generate bursts of high traffic
stories of sites not being returned to by the googlebot after the webmaster makes a request by email to have the scan slowed. Anyone out there have firsthand or secondhand info on that?
The Mozilla Google-bot was burning my site at up to 30,000 hits/month [webmasterworld.com], sometimes 3/sec. G was asked last June to slow it down. They effectively switched it off (just 50/month in the Autumn). The "normal" G-bot also appeared to slow down (about 1,000/month), although the Adsense-bot compensated by going bananas (from 15,000 to 25,000/month) (you just cannot win, can you?).
I have not noticed any effect on visitors from G. Having said that, the site did get crucified in the Sept Update (and later recovered), but my site is hardly unique in that.
BillyS:
From Matt Cutts' blog: I asked the crawl team about this a while ago, and there’s a good reason. It turns out that a lot of webmasters give crawl-delay values that are way out of whack, in the sense that we’d only be able to crawl 15-20 urls from a site in an entire day.
surfin2u:
Yahoo's Slurp bot supports a crawl delay, but recently it began to ignore it.
I'm not sure if this is good or not. There seem to be some weird search anomalies where my site is coming up high or first in searches where you'd never expect to find it on the first page.
Could it be that Google likes sites that let it crawl all over them?
Has anyone noticed a drop in search hits after asking Google to slow down?
It's eating up a lot of traffic!
Would Google kill you if you used, say, PHP to give a stripped-down version of pages according to USER_AGENT, as you might for Lynx viewers?
These are taken from my logs:
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
"Googlebot/2.1 (+http://www.google.com/bot.html)"
"Googlebot-Image/1.0"
"Mediapartners-Google/2.1"
"googlebot-urlconsole"
First three are 'normal' bots, fourth one checks page content for AdSense. IIRC fifth is from the remove-url bot.
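On the earlier question about using PHP to serve a stripped-down version of pages by USER_AGENT: a minimal sketch, assuming the Googlebot identifiers listed above. The file names lite.php and full.php are made up for illustration, and whether Google would penalize this kind of user-agent switching is exactly the open question in this thread.

<?php
// Detect Google's crawlers by User-Agent and serve a lighter template.
// lite.php / full.php are hypothetical page names.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Matches "Googlebot/2.1", "Mozilla/5.0 (compatible; Googlebot/2.1; ...)",
// "Googlebot-Image/1.0" and "Mediapartners-Google/2.1" from the list above.
$is_google_bot = (stripos($ua, 'Googlebot') !== false)
              || (stripos($ua, 'Mediapartners-Google') !== false);

if ($is_google_bot) {
    include 'lite.php';   // stripped-down markup, as one might do for Lynx
} else {
    include 'full.php';   // normal page with images, scripts, etc.
}
?>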
I have reason to believe referrer strings are always sent. I have a test site that is virtually never visited by normal users; it is spidered only. Over several months googlebot visited the site a few hundred times, yet there is not a single line in the logs without a referrer string.
As far as I can tell, Yahoo applies the delay on a per-IP basis, and (if my site is typical) employs a multiplicity of IPs simultaneously. The combo of the above renders Crawl-delay academic.
They're not even bothering doing that on my site now. They took about 3000 pages yesterday, using a delay of about 20 seconds most of the time. Yes, they did come from various IP addresses, but even requests from each individual IP address ignored my crawl-delay.
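A rough way to check whether a crawler is really honoring a delay on a per-IP basis is to measure the gap between successive requests from each crawler IP in the raw logs. A sketch in PHP, assuming an Apache combined-format access log at a hypothetical path:

<?php
// Smallest gap (in seconds) between successive requests from each crawler IP,
// to check whether a Crawl-delay is actually being honored.
// /var/log/apache/access.log is a hypothetical path; adjust to your setup.
$months = array('Jan'=>1,'Feb'=>2,'Mar'=>3,'Apr'=>4,'May'=>5,'Jun'=>6,
                'Jul'=>7,'Aug'=>8,'Sep'=>9,'Oct'=>10,'Nov'=>11,'Dec'=>12);
$last   = array();  // IP => timestamp of previous request
$mingap = array();  // IP => smallest gap seen so far

foreach (file('/var/log/apache/access.log') as $line) {
    if (stripos($line, 'Slurp') === false) continue;   // Yahoo's crawler
    if (!preg_match('#^(\S+) \S+ \S+ \[(\d+)/(\w+)/(\d+):(\d+):(\d+):(\d+)#', $line, $m))
        continue;
    $ip = $m[1];
    $t  = mktime($m[5], $m[6], $m[7], $months[$m[3]], $m[2], $m[4]);
    if (isset($last[$ip])) {
        $gap = $t - $last[$ip];
        if (!isset($mingap[$ip]) || $gap < $mingap[$ip]) $mingap[$ip] = $gap;
    }
    $last[$ip] = $t;
}
foreach ($mingap as $ip => $gap) {
    echo "$ip  shortest interval: {$gap}s\n";
}
?>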
Regarding google, thanks for the info on the dangers of asking them to slow down. That's what I needed to know. I have no intention of "biting the hand that feeds me".
I'm even willing to put up with Yahoo, despite the fact that their crawler hits me 100 times more frequently than visitors that they refer to me.
(Yahoo) even requests from each individual IP address ignored my crawl-delay
(google) thanks for the info on the dangers of asking them to slow down ... I have no intention of "biting the hand that feeds me"
(Yahoo) despite the fact that their crawler hits me 100 times more frequently than visitors that they refer to me
the site did get crucified in the Sept Update
I need to work more on the comprehension aspects of my statements.
Sorry, I did misunderstand. I thought you were making a connection between getting crucified and asking google to slow down.
Regarding Yahoo, there are a couple of other threads going about how fed up many webmasters are with them. Welcome to the club!
Sorry, I did misunderstand.
Regarding Yahoo, ... how fed up many webmasters are with them