
Ask - Teoma Forum

    
A parameter to tell the crawler how many pages per second!
sdani




msg:31822
 11:53 pm on Jan 29, 2005 (gmt 0)

I met an Ask Jeeves engineer today (Teoma guy).. and in informal discussion he said it would be good if there was a parameter which tells us how hard we can hit your web site.. like "you can get up to 100 pages per second.. ".

I thought that was a genius idea.. Doesn't something like that already exist in an RFC?

 

jdMorgan




msg:31823
 12:16 am on Jan 30, 2005 (gmt 0)

Similar to "Crawl-delay [google.com]", as defined in the robots.txt extensions by Yahoo and MSN?

Jim

sdani




msg:31824
 12:47 am on Jan 30, 2005 (gmt 0)

No.. I asked the same question.. and his response was that "Crawl-delay" is different.. it tells them how much time to wait between two consecutive requests, i.e. "wait for 2 or 20 seconds before the next request".. that still does not tell them how aggressively they can crawl. His point was that if they know that you wouldn't mind them crawling several hundred pages per second, that would also help them avoid being a slow crawler.

pendanticist




msg:31825
 1:02 am on Jan 30, 2005 (gmt 0)

Ohhhh, I don't think several hundred pages per second would go over very well at all. Nope, not at all.

jdMorgan




msg:31826
 1:04 am on Jan 30, 2005 (gmt 0)

Functionally, they're the same thing; all that needs to be done is to allow for fractional values, i.e.

Crawl-delay: 0.05

That's 20 requests per second.

Jim
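
The arithmetic in the post above (a delay of 0.05 seconds is 20 requests per second) can be sketched in a few lines of Python. This is only an illustration: the helper names parse_crawl_delay and requests_per_second are made up for this example, and fractional Crawl-delay values are the suggestion made in this thread, not something every crawler or parser is known to accept.

```python
def parse_crawl_delay(line):
    """Return the delay in seconds from a 'Crawl-delay: <value>' line, or None.

    Hypothetical helper: accepts fractional values, as suggested in the thread.
    """
    field, sep, value = line.partition(":")
    if not sep or field.strip().lower() != "crawl-delay":
        return None
    try:
        delay = float(value.strip())
    except ValueError:
        return None
    # A zero or negative delay has no meaningful rate, so treat it as absent.
    return delay if delay > 0 else None


def requests_per_second(delay_seconds):
    """Convert a delay between requests into the equivalent request rate."""
    return 1.0 / delay_seconds


delay = parse_crawl_delay("Crawl-delay: 0.05")
print(delay, "second delay is about", requests_per_second(delay), "requests per second")
```

The same conversion run the other way (delay = 1 / rate) is how "up to 100 pages per second" would be written as a delay of 0.01.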

sdani




msg:31827
 1:10 am on Jan 30, 2005 (gmt 0)

Jim:

That makes sense to me. I wonder why this guy did not know this. I also wonder why I did not respond at that time :(

sdani

jdMorgan




msg:31828
 1:30 am on Jan 30, 2005 (gmt 0)

He may not have known because there is no standards body for robot designers. The Standard for Robot Exclusion has never been more than a proposal, although it has been widely adopted. However, as noted above, Yahoo and MSN have added extensions for Crawl-delay, and Google has added extensions for wild-card path specification, neither of which is part of the original standard.

Because of their competitive status, I guess the Search Engine companies don't want to discuss these issues and come up with a new "Standard". That's regrettable.

Jim

Nikke




msg:31829
 10:19 pm on Feb 3, 2005 (gmt 0)

Yesterday I allowed one of their IP numbers in again at my pet project site with about 2000 pages. Thus far today their bots have visited 1300 of these pages.

I'm really glad they didn't do that in 13 seconds. At least if they keep sending me 10 - 20 visitors per day...

winglian




msg:31830
 10:15 pm on Feb 7, 2005 (gmt 0)

I think if AJ hit my site any faster my server would crash! AJ has been hitting two sites over the weekend and logged over 30000 requests on each server! I'm not complaining, but I sure would hate to see it come any faster than it is.

Brett_Tabke




msg:31831
 4:07 pm on Feb 10, 2005 (gmt 0)

From the horse's mouth: they may support crawl-delay and that would be all.

Clarification will be forthcoming from same horse...

Kaushal Kurapati




msg:31832
 8:23 pm on Feb 10, 2005 (gmt 0)

We support the crawl-delay feature, which allows you to fine-tune the load imposed by our crawler on web site servers.

This is the specification method in robots.txt (for a 10 second delay between requests):

Crawl-Delay: 10

Thanks,
Kaushal
Search Product Manager, Ask Jeeves
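
For anyone who wants to check what delay their own robots.txt advertises, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content and the "Teoma" user-agent string passed to the lookup are assumptions made for this example; only the "Crawl-Delay: 10" line comes from the post above.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for illustration; only the Crawl-Delay line
# is taken from the post above.
robots_lines = [
    "User-agent: *",
    "Crawl-Delay: 10",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# "Teoma" is an assumed user-agent token; it falls through to the "*" entry.
delay = parser.crawl_delay("Teoma")
print("Delay between requests:", delay, "seconds")
```

Note that this standard-library parser only picks up whole-number delays, so a fractional value such as 0.05 would need a custom parser.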

Brett_Tabke




msg:31833
 7:06 pm on Feb 11, 2005 (gmt 0)

Thank you!


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved