

Sitemaps, Meta Data, and robots.txt Forum

Sub-domain & crawl-delay

5+ Year Member

Msg#: 3760565 posted 3:14 pm on Oct 7, 2008 (gmt 0)

Yahoo! Slurp respects the crawl-delay directive well, so it does not overload my server.

However, I found that crawl-delay seems to be applied per domain. If I have a site with 50K subdomains hosted on a single server, crawl-delay = 1 is useless, since it seems they could fetch 50K requests per second.

I believe they know that the 50K subdomains are all on a single IP.

So I want to ask: is it possible to limit the crawl rate by server IP, rather than by domain or subdomain?

How do you solve it if you have many subdomains?
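To make the arithmetic behind the question concrete, here is a minimal sketch of the worst case. It assumes the crawl-delay value is in seconds and that the crawler treats every subdomain independently, which is the behavior the poster suspects rather than a confirmed fact. The function name `worst_case_rate` is made up for illustration.

```python
def worst_case_rate(num_hosts: int, crawl_delay_s: float) -> float:
    """Upper bound on requests per second hitting one server.

    Assumes each (sub)domain is crawled independently, with one
    request every `crawl_delay_s` seconds per (sub)domain.
    """
    if crawl_delay_s <= 0:
        raise ValueError("crawl_delay_s must be positive")
    return num_hosts / crawl_delay_s


# 50K subdomains on one server, crawl-delay = 1 (figures from the post)
print(worst_case_rate(50_000, 1))
```

This is only an upper bound: a real crawler is unlikely to hit every subdomain at its maximum rate at the same moment.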



jdmorgan, WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member

Msg#: 3760565 posted 4:39 pm on Oct 7, 2008 (gmt 0)

Since robots.txt is a 'per-(sub)domain' file, each one is treated separately at one level: the per-site URL allow/disallow processing. But you're right, they should have a back-end 'association' process that limits the rate per IP address or hardware server.

It may be that it takes some time to associate all the domains and subdomains. Has any change taken place recently, such as a new IP address, or more subdomains added to your 'collection' on the single IP address?

Ultimately, the decision of how many (sub)domains to host on a server should take crawling into account. Fifty thousand is a very high number -- about 125 times higher than a 'normal' shared hosting maximum for medium- to low-traffic sites. So, you might consider setting each crawl-delay to at least 120 if you really intend to host that many sites on one server.
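As a rough illustration of the suggestion above, the sketch below builds a per-subdomain robots.txt with a Crawl-delay entry and reads it back with Python's standard `urllib.robotparser`. The helper names `robots_txt` and `delay_for_target` are hypothetical, the delay is assumed to be in seconds, and the figure of 400 sites is simply 50,000 divided by the "125 times" mentioned in the post. Crawl-delay is a non-standard directive, so whether a given crawler honors it is up to that crawler.

```python
import math
from urllib.robotparser import RobotFileParser


def delay_for_target(num_hosts: int, target_rps: float) -> int:
    """Smallest whole-second crawl-delay that keeps the worst-case
    aggregate rate (num_hosts / delay) at or below target_rps."""
    return math.ceil(num_hosts / target_rps)


def robots_txt(user_agent: str, crawl_delay_s: int) -> str:
    """Minimal robots.txt body with a Crawl-delay entry for one crawler."""
    return f"User-agent: {user_agent}\nCrawl-delay: {crawl_delay_s}\n"


# The same file would be served from every subdomain on the server.
text = robots_txt("Slurp", 120)

# Read it back the way a well-behaved client might.
parser = RobotFileParser()
parser.parse(text.splitlines())
print(parser.crawl_delay("Slurp"))

# Delay needed so 50K subdomains load the server no more than
# about 400 sites at crawl-delay 1 would (assumed baseline).
print(delay_for_target(50_000, 400))
```

With these assumed numbers, the computed delay lands close to the 120 suggested in the reply.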


