Five minutes to index a site is fairly short if it has several hundred pages. The time you take to index a single site shouldn't be an issue if your crawlers are running in parallel, which I assume is why you did a rewrite.
If you are respecting the new crawl-delay directive in robots.txt (which I highly recommend), you may find you have to rearchitect again. 10 seconds seems to be a common delay that webmasters would like between robot hits. So if you have a single thread that waits 10 seconds between each request to the one website it is crawling, that is very inefficient. Try queuing up a list of URLs that belong to a range of websites and alternate which site you're hitting, making sure you respect crawl-delay for every site. Not a trivial piece of code, but it's IMHO the best way to do things. That way each thread of your crawler can fire off 100 requests per second without overwhelming any site.
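To make the alternating idea a bit more concrete, here is a rough Python sketch of one way it could be done. It is not anybody's actual crawler. The class name `PoliteFrontier`, its methods, and the 10 second default are all made up for illustration. It keeps one URL queue per host plus a heap of "earliest time this host may be hit again", so a worker always picks a host whose delay has already expired. The caller passes the current time in, which keeps the sketch easy to test. A real crawler would also need locking for threads and the actual fetching.

```python
import heapq
from collections import deque
from urllib.parse import urlsplit


class PoliteFrontier:
    """Hypothetical URL frontier: never hands out two URLs for the same
    host closer together than that host's crawl delay."""

    def __init__(self, default_delay=10.0):
        self.default_delay = default_delay  # assumed fallback, in seconds
        self.delays = {}    # host -> crawl delay in seconds
        self.queues = {}    # host -> deque of pending URLs
        self.next_ok = {}   # host -> earliest time the host may be hit again
        self.ready = []     # heap of (earliest_time, host), one entry per queued host

    def set_delay(self, host, seconds):
        """Record the crawl delay for a host (e.g. read from its robots.txt)."""
        self.delays[host] = seconds

    def add(self, url, now):
        """Queue a URL; `now` is the current time in seconds."""
        host = urlsplit(url).netloc
        if host not in self.queues:
            self.queues[host] = deque()
            # Honour a delay still running from an earlier hit on this host.
            earliest = max(now, self.next_ok.get(host, now))
            heapq.heappush(self.ready, (earliest, host))
        self.queues[host].append(url)

    def next_url(self, now):
        """Return a URL whose host may be hit at `now`, or None if every
        queued host is still inside its delay (or nothing is queued)."""
        if not self.ready or self.ready[0][0] > now:
            return None
        _, host = heapq.heappop(self.ready)
        url = self.queues[host].popleft()
        self.next_ok[host] = now + self.delays.get(host, self.default_delay)
        if self.queues[host]:
            # More work for this host: reschedule it after its delay.
            heapq.heappush(self.ready, (self.next_ok[host], host))
        else:
            del self.queues[host]
        return url
```

Each worker thread would call `next_url(time.monotonic())` in a loop, fetch whatever comes back, and sleep briefly when it gets `None`. With enough different hosts queued, the threads stay busy while every individual site still only sees one hit per delay period. If you are on Python, `urllib.robotparser.RobotFileParser.crawl_delay()` can supply the value to pass to `set_delay`.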