robho - 1:51 pm on Oct 20, 2004 (gmt 0)
The AJ/Teoma crawler is also hitting one of my sites quite hard, a page every 2 or 3 seconds, 24/7 (there are a few hundred thousand pages on the site).
Every now and then it triggers my mod_throttle rules (too many pages accessed by one IP in a given time frame). I then return a 503 error code. But it just carries on pounding away, ignoring that for hundreds more pages... Of course, none of the tens of thousands of pages it grabs per day are indexed yet. It just has one page from the site in the SERPs, a holding page from many months ago. A key to the search engine wars is comprehensive, fresh results. They seem to be trying a bit hard for the first part of that, and failing totally on the second part.
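For anyone not familiar with this kind of throttle rule, here is a rough sketch of the idea in Python: count recent requests per IP and answer 503 once an IP goes over the limit within the window. This is not how mod_throttle itself is implemented or configured. The class name `IpThrottle`, its parameters, and the limits used below are all made up for illustration, and the `Retry-After` header is my own addition as a hint to well-behaved crawlers.

```python
import math
import time
from collections import defaultdict, deque


class IpThrottle:
    """Hypothetical per-IP sliding-window throttle (illustration only)."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests      # allowed requests per window
        self.window_seconds = window_seconds  # window length in seconds
        self.clock = clock                    # injectable so it can be tested
        self.hits = defaultdict(deque)        # ip -> timestamps of served requests

    def check(self, ip):
        """Return (status_code, headers) for one request from `ip`."""
        now = self.clock()
        recent = self.hits[ip]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            # Over the limit: refuse, and say when the oldest hit expires.
            wait = self.window_seconds - (now - recent[0])
            return 503, {"Retry-After": str(max(1, math.ceil(wait)))}
        recent.append(now)
        return 200, {}


if __name__ == "__main__":
    throttle = IpThrottle(max_requests=20, window_seconds=60)  # assumed limits
    for _ in range(22):
        status, headers = throttle.check("192.0.2.10")
    print(status, headers)
```

Only served requests are counted here, so a crawler that keeps hammering while throttled gets a 503 on every request until its earlier hits age out of the window. It does nothing about a bot that simply ignores the 503, which is the problem above.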