Google Crawling

How hard does google hit websites?
tkarade

5:17 pm on Jan 28, 2003 (gmt 0)

How often (per second on average) does GoogleBot access a site that it is crawling?

yetanotheruser

5:32 pm on Jan 28, 2003 (gmt 0)

She (apparently) never hits even our big sites more than once every 15 seconds or so... but try here:

[google.com...]

(am I allowed to post links?)

HTH, :)

jomaxx

6:40 pm on Jan 28, 2003 (gmt 0)

Despite what the FAQ says, IMO you can expect to be hit up to 2-5 times per second when the bot is crawling enthusiastically. This has been my experience in the past, and I just checked and saw the same behaviour in my logs from yesterday.

Offhand it looks like some of this is due to being crawled by multiple servers simultaneously. And if you have a small site with a few hundred pages or fewer to crawl, you will probably see a lower peak.

2-5 per second is still a pretty light load, though. Lots of personal downloaders will hit a site far harder than that, and ignore the robots.txt exclusions as well.
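If you want to see what your own logs show, something like this rough sketch will find the busiest second of Googlebot traffic. It assumes an Apache combined-format access log; the sample lines below are made up for illustration:

```python
from collections import Counter
import re

# Matches the timestamp of an Apache combined-format log line,
# e.g. [28/Jan/2003:17:40:02 +0000]
TS_RE = re.compile(r"\[([^\]]+?) [+-]\d{4}\]")

def peak_googlebot_rate(lines):
    """Return (second, hits) for the busiest second of Googlebot traffic."""
    hits = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue
        m = TS_RE.search(line)
        if m:
            hits[m.group(1)] += 1  # bucket by whole second
    return hits.most_common(1)[0] if hits else (None, 0)

# Tiny made-up sample: three bot hits in one second, one in the next,
# plus one ordinary browser hit that should be ignored.
sample = [
    '66.249.0.1 - - [28/Jan/2003:17:40:02 +0000] "GET /a HTTP/1.0" 200 512 "-" "Googlebot/2.1"',
    '66.249.0.1 - - [28/Jan/2003:17:40:02 +0000] "GET /b HTTP/1.0" 200 512 "-" "Googlebot/2.1"',
    '66.249.0.1 - - [28/Jan/2003:17:40:02 +0000] "GET /c HTTP/1.0" 200 512 "-" "Googlebot/2.1"',
    '66.249.0.1 - - [28/Jan/2003:17:40:03 +0000] "GET /d HTTP/1.0" 200 512 "-" "Googlebot/2.1"',
    '10.0.0.5 - - [28/Jan/2003:17:40:02 +0000] "GET /e HTTP/1.0" 200 512 "-" "Mozilla/4.0"',
]
print(peak_googlebot_rate(sample))  # ('28/Jan/2003:17:40:02', 3)
```

Feed it `open("access.log")` instead of the sample list to run it against a real log.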

uber_boy

7:19 pm on Jan 28, 2003 (gmt 0)

My impression is that Google altered its algorithm a few months ago. Whereas it used to hammer my site at a rate of 3-6k pages/hour, it now stays longer and calls for pages at a rate of 2k pages/hour. My hunch is that it somehow adjusts its rate based on response times. That said, I've added another box since the last crawl, so it will be interesting to see if the pace picks up with the upcoming crawl.
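Nobody outside Google knows the actual logic, but an adaptive politeness delay of the kind being guessed at here could look something like this toy sketch (all parameter names and numbers are invented, not Google's):

```python
def next_delay(current_delay, response_time, target=1.0,
               backoff=2.0, recovery=0.9, min_delay=0.5, max_delay=30.0):
    """Toy adaptive crawl delay: back off sharply when the server responds
    slower than some target, and speed up gradually when it responds fast.
    Purely illustrative; not Google's actual algorithm."""
    if response_time > target:
        delay = current_delay * backoff    # server struggling: slow down
    else:
        delay = current_delay * recovery   # server fast: creep back up
    return max(min_delay, min(delay, max_delay))

print(next_delay(1.0, 3.0))  # 2.0  -- slow response doubles the delay
print(next_delay(1.0, 0.2))  # 0.9  -- fast response trims it slightly
```

A scheme like this would also explain why adding hardware could make the crawl pick up pace: faster responses let the delay shrink toward its floor.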

tkarade

8:55 pm on Jan 28, 2003 (gmt 0)

Thanks for the input.

jomaxx mentioned that 2-5 hits per second is a pretty light load. Any other opinions on how many hits per second are heavy or light loads?

Thanks,
TK

taxpod

9:01 pm on Jan 28, 2003 (gmt 0)

I generally get 50,000 pages pulled down in a 48 hour period spanning 3 days. That would be a page every 4 seconds. But you ask about hits? In my case 50,000 pages would be about 70,000 hits since common files are typically pulled down only once. But to reiterate what others have said, I do think the rate is adjusted based on response time. I'm also inclined to believe that depth of crawl depends on response time.
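Those figures check out as back-of-the-envelope arithmetic:

```python
pages = 50_000
seconds = 48 * 3600        # the 48-hour crawl window: 172,800 s
print(seconds / pages)     # 3.456 -> roughly a page every 3.5-4 s
print(70_000 / seconds)    # ~0.4 hits/sec averaged over the whole crawl
```

Note that's an average over the crawl; the 2-5 hits/sec mentioned earlier in the thread would be short peaks, not a sustained rate.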

tkarade

6:04 pm on Jan 30, 2003 (gmt 0)

Does anybody think that GoogleBot somehow considers bandwidth as well (or would this factor into response time)?

yetanotheruser

11:18 am on Jan 31, 2003 (gmt 0)

Don't suppose there's any other way to measure bandwidth?

(There are a couple of objects in the DOM in IE (AFAIK) that provide a broad connection outline (56k, ISDN, LAN), but nothing GBot would bother with?)
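Probably nothing client-side, no. About all a crawler can measure from its own end is the effective throughput of its own fetches, which folds bandwidth and server response time together. A hypothetical per-host tally might look like:

```python
class HostThroughput:
    """Running estimate of effective bytes/sec for one host, as a crawler
    might keep it. Entirely hypothetical -- nothing here is known to be
    what Googlebot actually does."""

    def __init__(self):
        self.total_bytes = 0
        self.total_seconds = 0.0

    def record(self, nbytes, elapsed):
        """Log one completed fetch: body size and wall-clock time."""
        self.total_bytes += nbytes
        self.total_seconds += elapsed

    def bytes_per_second(self):
        if self.total_seconds == 0:
            return 0.0
        return self.total_bytes / self.total_seconds

ht = HostThroughput()
ht.record(10240, 0.5)          # 10 KB fetched in half a second
ht.record(10240, 1.5)          # same page size, slower fetch
print(ht.bytes_per_second())   # 10240.0 (20 KB over 2 s total)
```

An estimate like this can't separate a thin pipe from a slow server, which may be why response time alone is the simpler signal to throttle on.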