jdMorgan

msg:31823 | 12:16 am on Jan 30, 2005 (gmt 0) |
Similar to "Crawl-delay [google.com]", which Yahoo and MSN define as a robots.txt extension? Jim
|
sdani

msg:31824 | 12:47 am on Jan 30, 2005 (gmt 0) |
No. I asked the same question, and his response was that "Crawl-delay" is different: it tells them how much time to wait between two consecutive requests, i.e. "wait for 2 or 20 seconds before the next request". That still does not tell them how aggressively they can crawl. His point was that if they knew you wouldn't mind them crawling several hundred pages per second, that would also help them avoid being a slow crawler.
|
pendanticist

msg:31825 | 1:02 am on Jan 30, 2005 (gmt 0) |
Ohhhh, I don't think several hundred pages per second would go over very well at all. Nope, not at all.
|
jdMorgan

msg:31826 | 1:04 am on Jan 30, 2005 (gmt 0) |
Functionally, they're the same thing. All that needs to be done is to allow for fractional values, e.g.
Crawl-delay: 0.05
That's 20 requests per second. Jim
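To make the arithmetic concrete, here is a minimal Python sketch that reads a Crawl-delay value and converts it to a maximum request rate. The function names `parse_crawl_delay` and `max_requests_per_second` are hypothetical, and accepting fractional values is the assumption proposed in this post, not something any particular crawler is known to support.

```python
from typing import Optional


def parse_crawl_delay(robots_txt: str) -> Optional[float]:
    """Return the first Crawl-delay value found in robots_txt, or None.

    Assumption: fractional values such as 0.05 are accepted, as suggested
    in the thread; real crawlers may only honor whole seconds.
    """
    for raw_line in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names in robots.txt are matched case-insensitively here.
        if field.strip().lower() == "crawl-delay":
            try:
                return float(value)
            except ValueError:
                return None
    return None


def max_requests_per_second(delay_seconds: float) -> float:
    """Convert a delay between requests into a maximum request rate."""
    if delay_seconds <= 0:
        raise ValueError("delay must be positive")
    return 1.0 / delay_seconds


sample = "User-agent: *\nCrawl-delay: 0.05\n"
delay = parse_crawl_delay(sample)
if delay is not None:
    print(f"{delay} s between requests is about "
          f"{max_requests_per_second(delay):.0f} requests per second")
```

The same conversion works in the other direction: a site that can tolerate N requests per second would publish a delay of 1/N seconds.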
|
sdani

msg:31827 | 1:10 am on Jan 30, 2005 (gmt 0) |
Jim: That makes sense to me. I wonder why this guy did not know this. I also wonder why I did not respond at the time :( sdani
|
jdMorgan

msg:31828 | 1:30 am on Jan 30, 2005 (gmt 0) |
He may not have known because there is no standards body for robot designers. The Standard for Robot Exclusion has never been more than a proposal, although it has been widely adopted. However, as noted above, Yahoo and MSN have added extensions for Crawl-delay, and Google has added extensions for wild-card path specification, neither of which is part of the original standard. Because of their competitive status, I guess the search engine companies don't want to discuss these issues and come up with a new "Standard". That's regrettable. Jim
|
Nikke

msg:31829 | 10:19 pm on Feb 3, 2005 (gmt 0) |
Yesterday I allowed one of their IP numbers in again at my pet project site with about 2000 pages. Thus far today their bots have visited 1300 of these pages. I'm really glad they didn't do that in 13 seconds. At least if they keep sending me 10 - 20 visitors per day...
|
winglian

msg:31830 | 10:15 pm on Feb 7, 2005 (gmt 0) |
I think if AJ hit my site any faster my server would crash! AJ has been hitting two sites over the weekend and individually logged over 30000 requests on each server! I'm not complaining, but I sure would hate to see it come any faster than it is.
|
Brett_Tabke

msg:31831 | 4:07 pm on Feb 10, 2005 (gmt 0) |
From the horse's mouth: they may support crawl-delay and that would be all. Clarification will be forthcoming from the same horse...
|
Kaushal Kurapati

msg:31832 | 8:23 pm on Feb 10, 2005 (gmt 0) |
We support the crawl-delay feature, which allows us to fine-tune the load imposed by our crawler on web site servers. This is the specification method in robots.txt (for a 10-second delay between requests):
Crawl-Delay: 10
Thanks,
Kaushal
Search Product Manager, Ask Jeeves
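As an illustration of how a crawler might read that directive, here is a small sketch using Python's standard `urllib.robotparser`. The `User-agent: *` line and the agent name "ExampleBot" are assumptions added to make the snippet complete; they are not taken from the post, and this is not a description of how the Ask Jeeves crawler itself parses robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: the Crawl-Delay line from the post, placed
# under a catch-all User-agent record so that it applies to every robot.
ROBOTS_TXT = """\
User-agent: *
Crawl-Delay: 10
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


# "ExampleBot" is a hypothetical crawler name used only for this demonstration.
delay = crawl_delay_for(ROBOTS_TXT, "ExampleBot")
print(f"Wait {delay} seconds between requests")
```

A crawler honoring this value would sleep for that many seconds between consecutive requests to the same host.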
|
Brett_Tabke

msg:31833 | 7:06 pm on Feb 11, 2005 (gmt 0) |
thank you!
|
|