
A parameter to tell the crawler how many pages per second!

   
11:53 pm on Jan 29, 2005 (gmt 0)

10+ Year Member



I met an Ask Jeeves engineer today (a Teoma guy).. and in informal discussion he said it would be good if there were a parameter which tells us how hard we can hit your web site.. like "you can get up to 100 pages per second..".

I thought that was a genius idea.. Doesn't something like that already exist in an RFC?

12:16 am on Jan 30, 2005 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Similar to "Crawl-delay [google.com]" as defined as robots.txt extensions by Yahoo and MSN?

Jim

12:47 am on Jan 30, 2005 (gmt 0)

10+ Year Member



No.. I asked the same question, and his response was that "Crawl-delay" is different: it tells them how much time to wait between two consecutive requests, i.e. "wait for 2 or 20 seconds before the next request". That still does not tell them how aggressively they can crawl. His point was that if they know you wouldn't mind them crawling several hundred pages per second, that would also help them avoid being a slow crawler.
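One way to picture the distinction he was drawing is the rough Python sketch below; the function names and defaults are purely illustrative, not anything Ask Jeeves actually runs.

import time

def crawl_with_delay(urls, fetch, delay_seconds=2.0):
    # "Crawl-delay" style: pause a fixed interval between consecutive requests.
    for url in urls:
        fetch(url)
        time.sleep(delay_seconds)

def crawl_with_rate_cap(urls, fetch, max_pages_per_second=100):
    # Proposed style: never start more than the stated number of requests per second.
    min_interval = 1.0 / max_pages_per_second
    last_start = 0.0
    for url in urls:
        wait = min_interval - (time.monotonic() - last_start)
        if wait > 0:
            time.sleep(wait)
        last_start = time.monotonic()
        fetch(url)

If the delay is simply the reciprocal of the rate, the two loops enforce the same limit, which is the point made two replies down.
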
1:02 am on Jan 30, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ohhhh, I don't think several hundred pages per second would go over very well at all. Nope, not at all.
1:04 am on Jan 30, 2005 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Functionally, they're the same thing; all that needs to be done is to allow fractional values, e.g.

Crawl-delay: 0.05

That's 20 requests per second.

Jim
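
Here is a minimal Python sketch of a crawler reading a fractional value that way; the parse_crawl_delay helper is hypothetical, not part of any published robots.txt parser:

def parse_crawl_delay(robots_txt, user_agent="*"):
    # Return the Crawl-delay (in seconds) from the matching User-agent
    # group, or None if the directive is absent; fractional values allowed.
    delay = None
    in_group = False
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if not sep:
            continue
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            in_group = (value == user_agent)
        elif field == "crawl-delay" and in_group:
            try:
                delay = float(value)
            except ValueError:
                pass
    return delay

delay = parse_crawl_delay("User-agent: *\nCrawl-delay: 0.05")
print(delay, "seconds between requests =", 1.0 / delay, "requests per second")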

1:10 am on Jan 30, 2005 (gmt 0)

10+ Year Member



Jim:

That makes sense to me. I wonder why this guy did not know this. I also wonder why I did not respond at the time :(

sdani

1:30 am on Jan 30, 2005 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



He may not have known because there is no standards body for robot designers. The Standard for Robot Exclusion has never been more than a proposal, although it has been widely adopted. However, as noted above, Yahoo and MSN have added an extension for Crawl-delay, and Google has added an extension for wild-card path specification, neither of which is part of the original standard.

Because of their competitive status, I guess the search engine companies don't want to discuss these issues and come up with a new "standard". That's regrettable.

Jim

10:19 pm on Feb 3, 2005 (gmt 0)

10+ Year Member



Yesterday I allowed one of their IP numbers back in at my pet-project site, which has about 2000 pages. So far today their bots have visited 1300 of those pages.

I'm really glad they didn't do that in 13 seconds. At least as long as they keep sending me 10-20 visitors per day...

10:15 pm on Feb 7, 2005 (gmt 0)

10+ Year Member



I think if AJ hit my site any faster, my server would crash! AJ has been hitting two sites over the weekend and has logged over 30,000 requests on each server! I'm not complaining, but I sure would hate to see it come any faster than it is.
4:07 pm on Feb 10, 2005 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



From the horse's mouth: they may support crawl-delay, and that would be all.

Clarification will be forthcoming from the same horse...

8:23 pm on Feb 10, 2005 (gmt 0)

10+ Year Member



We support the crawl-delay feature, which allows us to fine-tune the load imposed by our crawler on web site servers.

This is how it is specified in robots.txt (for a 10-second delay between requests):

Crawl-Delay: 10

Thanks,
Kaushal
Search Product Manager, Ask Jeeves
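
In a full robots.txt file the directive sits inside a User-agent group; a minimal example uses the wildcard agent, though a site could also name a specific crawler instead of *:

User-agent: *
Crawl-Delay: 10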

7:06 pm on Feb 11, 2005 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



thank you!