Forum Moderators: bakedjake

How much crawling is too much?

         

lemming

4:46 pm on Jun 7, 2005 (gmt 0)

10+ Year Member



Hi,

I'm about to start a niche search engine and want to start crawling just a handful of sites. All of the sites I'm targeting are pretty large with probably about 100K pages each. I was wondering how much you would consider a polite amount to crawl each day. Or do you only care about hits per second?

If I limit the bot to a 15-second delay, the most it will hit one site is about 5,760 pages a day (86,400 seconds / 15). Is that considered excessive for a large, popular site?
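For anyone wanting to check the arithmetic behind a fixed crawl delay, here is a rough sketch. The function names are made up for illustration, and it assumes the bot runs around the clock against a single site with nothing else slowing it down.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def max_pages_per_day(delay_seconds: float) -> int:
    """Upper bound on fetches per day with a fixed delay between requests."""
    return int(SECONDS_PER_DAY // delay_seconds)


def days_to_crawl(total_pages: int, delay_seconds: float) -> int:
    """Whole days needed to fetch total_pages at that rate."""
    return math.ceil(total_pages / max_pages_per_day(delay_seconds))


if __name__ == "__main__":
    print(max_pages_per_day(15))        # ceiling for a 15-second delay
    print(days_to_crawl(100_000, 15))   # days for a site of about 100K pages
```

With a 15-second delay, a 100K-page site takes a bit under three weeks for one full pass.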

Thanks!

Dave_A

8:43 pm on Jun 7, 2005 (gmt 0)

10+ Year Member



My web spider (Linknzbot) issues one GET request every second, but it also has a built-in bandwidth detector: if the website is hosted on an ADSL line on a home computer, it automatically slows down to one GET every ten seconds.
Most professional web hosting companies don't seem to mind one request a second, because their servers are designed to take that.
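A minimal sketch of the kind of bandwidth-based throttle described above. This is not Linknzbot's actual code: the threshold value and the function name are assumptions, and only the 1-second and 10-second delays come from the post.

```python
FAST_DELAY_SECONDS = 1.0    # from the post: one GET per second
SLOW_DELAY_SECONDS = 10.0   # from the post: one GET every ten seconds

# Hypothetical cut-off; the post does not say what throughput counts as "slow".
SLOW_THRESHOLD_BYTES_PER_SEC = 50_000


def choose_delay(bytes_received: int, elapsed_seconds: float) -> float:
    """Pick the delay before the next request from the last response's throughput."""
    if elapsed_seconds <= 0:
        # No usable measurement, so be conservative.
        return SLOW_DELAY_SECONDS
    throughput = bytes_received / elapsed_seconds
    if throughput < SLOW_THRESHOLD_BYTES_PER_SEC:
        return SLOW_DELAY_SECONDS
    return FAST_DELAY_SECONDS
```

In a real crawler you would time each download, pass the byte count and elapsed time to `choose_delay`, and sleep for the returned number of seconds before the next request to the same host.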
One thing to consider is bandwidth usage, which can cost website owners a heap if you index 2000 pages in one sitting. You may find it better to do three or four passes, indexing a few hundred pages at a time, spread over a few days.
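Splitting a crawl into smaller passes could look something like this. It is only a sketch: the 500-page pass size and the URL list are placeholders, and scheduling the passes on different days is left to whatever job scheduler you use.

```python
from typing import List


def plan_passes(urls: List[str], pages_per_pass: int) -> List[List[str]]:
    """Split a site's URL list into consecutive passes of at most pages_per_pass."""
    if pages_per_pass <= 0:
        raise ValueError("pages_per_pass must be positive")
    return [
        urls[start:start + pages_per_pass]
        for start in range(0, len(urls), pages_per_pass)
    ]


if __name__ == "__main__":
    # Placeholder URLs standing in for 2000 pages on one site.
    site_urls = [f"/page{i}" for i in range(2000)]
    for day, batch in enumerate(plan_passes(site_urls, 500), start=1):
        print(f"pass {day}: {len(batch)} pages")
```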

One good thing I have developed is self-protection code that sits inside the search engine. When people try to gain access by guessing a password, they get two goes; then the web spider sends out around a thousand feelers to their IP address and starts to suck bandwidth at about a gig a minute. Not many come back a second time.