
A parameter to tell the crawler how many pages per second!

11:53 pm on Jan 29, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 20, 2003
posts:450
votes: 0


I met an Ask Jeeves engineer today (a Teoma guy), and in informal discussion he said it would be good if there were a parameter that tells crawlers how hard they can hit your web site, like "you can get up to 100 pages per second."

I thought that was a genius idea. Doesn't something like that already exist in an RFC?
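
For illustration, the kind of hint being described might be written in robots.txt along these lines. The "Max-requests-per-second" directive here is purely hypothetical and is not part of any robots.txt proposal or supported by any engine:

User-agent: *
Max-requests-per-second: 100 # hypothetical: the site can take up to 100 page fetches per second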

12:16 am on Jan 30, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Similar to "Crawl-delay [google.com]" as defined as robots.txt extensions by Yahoo and MSN?

Jim
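
For reference, that extension is written inside a per-crawler record; a sketch along these lines (the crawler names and the 5-second value are just examples):

User-agent: Slurp
Crawl-delay: 5

User-agent: msnbot
Crawl-delay: 5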

12:47 am on Jan 30, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 20, 2003
posts:450
votes: 0


No, I asked the same question, and his response was that "Crawl-delay" is different: it tells them how much time to wait between two consecutive requests, i.e. "wait 2 or 20 seconds before the next request." That still does not tell them how aggressively they can crawl. His point was that if they knew you wouldn't mind them crawling several hundred pages per second, that would also help them avoid being a slow crawler.
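
As a rough sketch of the delay-based behavior described above, here is how a crawler might honor Crawl-delay between consecutive requests, using Python's standard robotparser module. The "ExampleBot" user-agent and example.com URLs are placeholders, not any real engine's setup:

import time
import urllib.request
import urllib.robotparser

# Read the site's robots.txt and look up the delay for our (hypothetical) user-agent.
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()
delay = rp.crawl_delay("ExampleBot") or 1.0  # fall back to 1 second if no Crawl-delay is given

for url in ["https://example.com/page1", "https://example.com/page2"]:
    if rp.can_fetch("ExampleBot", url):
        urllib.request.urlopen(url)  # fetch the page
    time.sleep(delay)  # wait before issuing the next request
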
1:02 am on Jan 30, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 27, 2002
posts:1685
votes: 0


Ohhhh, I don't think several hundred pages per second would go over very well at all. Nope, not at all.
1:04 am on Jan 30, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Functionally, they're the same thing; all that needs to be done is to allow fractional values, i.e.

Crawl-delay: 0.05

That's 20 requests per second.

Jim
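
The delay and the rate are simply reciprocals of one another, as a quick check shows (plain arithmetic, not any crawler's actual code):

delay_seconds = 0.05
requests_per_second = 1 / delay_seconds  # 20.0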

1:10 am on Jan 30, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 20, 2003
posts:450
votes: 0


Jim:

That makes sense to me. I wonder why this guy did not know that. I also wonder why I did not respond with that at the time :(

sdani

1:30 am on Jan 30, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


He may not have known because there is no standards body for robots designers. The Standard for Robot Exclusion has never been more than a proposal, although it has been widely adopted. However, as noted above, Yahoo and MSN have added an extension for Crawl-delay, and Google has added an extension for wild-card path specification, neither of which is part of the original standard.

Because of their competitive status, I guess the Search Engine companies don't want to discuss these issues and come up with a new "Standard". That's regrettable.

Jim
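
As an example of the Google wildcard extension mentioned above (the blocked path is purely illustrative):

User-agent: Googlebot
Disallow: /*?sessionid=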

10:19 pm on Feb 3, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:May 13, 2003
posts:442
votes: 0


Yesterday I allowed one of their IP numbers in again at my pet-project site, which has about 2000 pages. So far today their bots have visited 1300 of those pages.

I'm really glad they didn't do that in 13 seconds. At least as long as they keep sending me 10-20 visitors per day...

10:15 pm on Feb 7, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 25, 2003
posts:94
votes: 0


I think if AJ hit my site any faster my server would crash! AJ has been hitting two sites over the weekend and has logged over 30,000 requests on each server! I'm not complaining, but I sure would hate to see it come any faster than it is.
4:07 pm on Feb 10, 2005 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38048
votes: 12


From the horse's mouth: they may support crawl-delay, and that would be all.

Clarification will be forthcoming from the same horse...

8:23 pm on Feb 10, 2005 (gmt 0)

New User

10+ Year Member

joined:Jan 24, 2005
posts:4
votes: 0


We support the crawl-delay feature, which allows us to fine-tune the load our crawler imposes on web site servers.

This is how it is specified in robots.txt (for a 10-second delay between requests):

Crawl-Delay: 10

Thanks,
Kaushal
Search Product Manager, Ask Jeeves
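
Put together with a user-agent record, a site would spell this out along the following lines (assuming the crawler is addressed by the "Teoma" user-agent mentioned at the top of the thread):

User-agent: Teoma
Crawl-Delay: 10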

7:06 pm on Feb 11, 2005 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38048
votes: 12


thank you!