
Spider Behavior

To hammer or not to hammer


tkarade

3:49 pm on Jan 28, 2003 (gmt 0)

10+ Year Member



At what request rate is a spider considered to be hammering a site? 10 requests per second? 100?

Thanks,
TK

wilderness

11:45 pm on Jan 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've never considered the rate at which an "ACCEPTED" bot grabs pages imperative.
In fact, I'd rather have Googlebot, Slurp and a few others get it all over and done with in a few minutes than have them cluttering up my logs all month long.

For a bot that isn't honorable (at least from my side of the fence) a solitary page is hammering :)

pendanticist

12:46 am on Jan 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



10 per second?

If I recall correctly, a well-respected poster in these forums mentioned a figure of one, or possibly two, requests per second as acceptable. (Obedience to robots.txt aside.)
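That rule of thumb can be turned into a simple check against request timestamps. Here is an illustrative sketch; the `HammerDetector` name, the 2 requests/second threshold, and the sliding-window approach are all assumptions for the sake of example, not anything prescribed in this thread:

```python
from collections import deque

class HammerDetector:
    """Flag an IP as 'hammering' when it exceeds a requests-per-second
    threshold over a sliding time window (threshold is illustrative)."""

    def __init__(self, max_per_second=2, window=1.0):
        self.max_per_second = max_per_second
        self.window = window          # window length in seconds
        self.hits = {}                # ip -> deque of request timestamps

    def request(self, ip, now):
        """Record a request at time `now`; return True if this IP is
        currently over the allowed rate."""
        q = self.hits.setdefault(ip, deque())
        q.append(now)
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        return len(q) > self.max_per_second * self.window
```

A script like this could sit in front of page generation and feed a block list, though where to draw the threshold is exactly the judgment call being discussed here.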

In a post entitled: Beware the lovely Lachesis [webmasterworld.com], I was able to illustrate some atrocious behavior.

Additional reading: modified "bad-bot" script blocks site downloads [webmasterworld.com].

Also, there was a post a couple of weeks back (which I can't find at the moment) discussing a 'throttle' of sorts, which I believe would actually slow the bot down.

Maybe someone who knows where that one is will post it?
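The "throttle" idea mentioned above, slowing a bot down rather than blocking it outright, could look something like this sketch. All names and the one-second minimum interval are illustrative assumptions:

```python
class Throttle:
    """Compute how long to delay serving a request so that each client
    is held to roughly one request per `min_interval` seconds."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval  # seconds between requests per client
        self.last_seen = {}               # ip -> timestamp of last request

    def delay_for(self, ip, now):
        """Return the number of seconds to sleep before serving this
        request (0.0 if the client is already within the limit)."""
        last = self.last_seen.get(ip)
        self.last_seen[ip] = now
        if last is None:
            return 0.0
        elapsed = now - last
        return max(0.0, self.min_interval - elapsed)
```

In practice the server would `sleep()` for the returned delay before responding; a polite bot never notices, while a runaway one is forced down to the target rate.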

One thing is certain to me - as Webmasters we need to address poorly behaved, misconfigured or just plain stupid [webmasterworld.com] bots that roam the Internet with impunity.

Pendanticist.

tkarade

6:00 pm on Jan 30, 2003 (gmt 0)

10+ Year Member



> If I recall correctly, a well-respected poster in these forums mentioned a figure of one, or possibly two requests per second, per page as acceptable.

That seems very slow to me. If Google is indexing 3 billion pages, does it take them 3 billion seconds?

jdMorgan

5:22 am on Jan 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Pendanticist,

I think this is the post you are thinking of: Blocking badly behaved runaway WebCrawlers [webmasterworld.com]
It's a PHP script that looks pretty useful, but I need to find some time to rewrite it in Perl in order to try it on my sites, unless someone else does it first.

tkarade,
No, they access one page of your site per second. While they wait to hit you again, they go and request another 10 million pages from other sites. So it only takes them 300 seconds to do the whole web. (I'm kidding about the 300 seconds, but you get the idea) :)
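Jim's point, per-host politeness with hosts interleaved, can be sketched from the crawler's side. This is a toy illustration under assumed names (`polite_order`, a 1-second per-host delay), not how any real crawler is implemented:

```python
def polite_order(urls, delay=1.0):
    """Given (host, path) tuples in discovery order, assign each fetch
    the earliest time that honors a per-host delay. Different hosts can
    be fetched in parallel, so total throughput stays high while any
    single site sees at most one request per `delay` seconds."""
    next_ok = {}    # host -> earliest time we may hit it again
    schedule = []
    for host, path in urls:
        t = next_ok.get(host, 0.0)
        schedule.append((t, host + path))
        next_ok[host] = t + delay
    return sorted(schedule)
```

With three pages on one host and one on another, the single-host pages get spread a second apart while the other host is fetched immediately, which is exactly the interleaving described above.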

Jim