
Ask - Teoma Forum

A parameter to tell the crawler how many pages per second it may fetch!

 11:53 pm on Jan 29, 2005 (gmt 0)

I met an Ask Jeeves engineer today (a Teoma guy), and in informal discussion he said it would be good if there were a parameter that tells the crawler how hard it can hit your web site, like "you can fetch up to 100 pages per second".

I thought that was a genius idea. Doesn't something like that already exist in an RFC?



 12:16 am on Jan 30, 2005 (gmt 0)

Similar to "Crawl-delay [google.com]", as defined in the robots.txt extensions by Yahoo and MSN?



 12:47 am on Jan 30, 2005 (gmt 0)

No.. I asked the same question, and his response was that "Crawl-delay" is different: it tells them how much time to wait between two consecutive requests, i.e. "wait 2 or 20 seconds before the next request". That still does not tell them how aggressively they can crawl. His point was that if they knew you wouldn't mind them crawling several hundred pages per second, that would also help them avoid crawling too slowly.
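
The Crawl-delay behaviour he describes (a fixed wait between consecutive requests) can be sketched as follows; the function names and the default delay value here are made up for illustration, not anything Teoma or the robots.txt proposals define:

```python
import time

def polite_fetch(urls, fetch, crawl_delay=2.0):
    """Fetch each URL in turn, sleeping `crawl_delay` seconds between
    consecutive requests -- the Crawl-delay semantics described above.
    `fetch` is any callable that takes a URL and returns a result."""
    results = []
    for i, url in enumerate(urls):
        if i:  # no delay before the very first request
            time.sleep(crawl_delay)
        results.append(fetch(url))
    return results
```

Note that for a sequential crawler like this, a delay of d seconds between requests does implicitly cap the rate at 1/d pages per second.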


 1:02 am on Jan 30, 2005 (gmt 0)

Ohhhh, I don't think several hundred pages per second would go over very well at all. Nope, not at all.


 1:04 am on Jan 30, 2005 (gmt 0)

Functionally, they're the same thing; all that needs to be done is to allow fractional values, e.g.

Crawl-delay: 0.05

That's 20 requests per second.
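
The equivalence is just a pair of reciprocals. A small sketch (the helper names are hypothetical, not part of any robots.txt proposal):

```python
def delay_for_rate(pages_per_second):
    """Crawl-delay (seconds between requests) that caps a
    sequential crawler at the given page rate."""
    return 1.0 / pages_per_second

def rate_for_delay(delay_seconds):
    """Maximum pages per second implied by a given Crawl-delay."""
    return 1.0 / delay_seconds

print(delay_for_rate(20))    # 0.05
print(rate_for_delay(0.05))  # 20.0
```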



 1:10 am on Jan 30, 2005 (gmt 0)


That makes sense to me. I wonder why this guy did not know this. I also wonder why I did not respond at the time :(



 1:30 am on Jan 30, 2005 (gmt 0)

He may not have known because there is no standards body for robots designers. The Standard for Robot Exclusion has never been more than a proposal, although it has been widely adopted. However, as noted above, Yahoo and MSN have added an extension for Crawl-delay, and Google has added extensions for wildcard path specification, none of which are part of the original standard.

Because of their competitive positions, I guess the search engine companies don't want to discuss these issues and agree on a new "standard". That's regrettable.



 10:19 pm on Feb 3, 2005 (gmt 0)

Yesterday I allowed one of their IP numbers back in at my pet-project site, which has about 2000 pages. So far today their bots have visited 1300 of those pages.

I'm really glad they didn't do that in 13 seconds. At least as long as they keep sending me my 10-20 visitors per day...


 10:15 pm on Feb 7, 2005 (gmt 0)

I think if AJ hit my site any faster my server would crash! AJ has been hitting two sites over the weekend and has logged over 30,000 requests on each server! I'm not complaining, but I sure would hate to see it come any faster than it is.


 4:07 pm on Feb 10, 2005 (gmt 0)

From the horse's mouth: they may support Crawl-delay, and that would be all.

Clarification will be forthcoming from same horse...

Kaushal Kurapati

 8:23 pm on Feb 10, 2005 (gmt 0)

We support the Crawl-delay feature, which allows us to fine-tune the load imposed by our crawler on web site servers.

This is how it is specified in robots.txt (for a 10-second delay between requests):

Crawl-Delay: 10

Search Product Manager, Ask Jeeves
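
As an editorial aside: the Crawl-delay extension is widely enough adopted that Python's standard library can read it (via `urllib.robotparser`, in Python 3.6 and later). A minimal check, using a made-up robots.txt matching the syntax above:

```python
import urllib.robotparser

# Hypothetical robots.txt using the Crawl-delay syntax shown above.
robots_txt = """\
User-agent: *
Crawl-delay: 10
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())
print(rp.crawl_delay("*"))  # 10
```

One caveat: since Crawl-delay was never standardized, parsers differ on details; at least some versions of this parser ignore non-integer values, so a fractional delay like 0.05 may not be honored everywhere.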


 7:06 pm on Feb 11, 2005 (gmt 0)

thank you!
