Forum Moderators: bakedjake
I'm about to start a niche search engine and want to begin by crawling just a handful of sites. All of the sites I'm targeting are pretty large, probably around 100K pages each. I was wondering what you would consider a polite number of pages to crawl per day. Or do you only care about hits per second?
If I limit the bot to a 15-second delay between requests, the most it will hit one site is about 5,760 pages a day (86,400 seconds / 15). Is that considered excessive for a large, popular site?
Thanks!
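For what it's worth, the throttling described above is easy to enforce in code. Here's a minimal Python sketch of a single-worker crawler with a fixed inter-request delay; the `fetch` callable and the 15-second `CRAWL_DELAY` are assumptions for illustration, not a recommended setting:

```python
import time

CRAWL_DELAY = 15  # hypothetical politeness setting, in seconds

# At one request per CRAWL_DELAY seconds, a single worker tops out at:
MAX_PAGES_PER_DAY = 86_400 // CRAWL_DELAY  # 86,400 s/day / 15 s = 5,760 pages

def polite_crawl(urls, fetch, delay=CRAWL_DELAY):
    """Fetch each URL in order, pausing `delay` seconds between requests.

    `fetch` is any callable taking a URL and returning the page body.
    Yields (url, body) pairs as they are retrieved.
    """
    for i, url in enumerate(urls):
        if i:  # no need to sleep before the very first request
            time.sleep(delay)
        yield url, fetch(url)
```

A real crawler would also honor `robots.txt` (e.g. via `urllib.robotparser`) and back off on errors, but the fixed sleep alone already caps the daily page count as computed above.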
One good thing I have developed is self-protection code that sits inside the search engine. When people try to gain access by guessing a password, they get two tries; then the web spider sends out around a thousand feelers to their IP address and starts sucking bandwidth at about a gig a minute. Not many come back a second time.