
Sub-domain & crawl-delay

     
3:14 pm on Oct 7, 2008 (gmt 0)

5+ Year Member



Yahoo! Slurp respects the crawl-delay directive, so it does not overload my server.

However, crawl-delay seems to be applied per (sub)domain. If I have a site with 50K subdomains hosted on a single server, crawl-delay = 1 is useless: it seems they could still fetch up to 50K requests per second against that one machine.

I believe they can tell that the 50K subdomains sit on a single IP.

So my question: is it possible to limit the crawl rate by server IP, rather than by domain/subdomain?

How do you handle this if you have many subdomains?
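
For reference, the directive being discussed looks like this in each subdomain's robots.txt; Slurp reads Crawl-delay as a number of seconds to wait between requests, and every one of the 50K hosts serves its own copy:

    User-agent: Slurp
    Crawl-delay: 1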

4:39 pm on Oct 7, 2008 (gmt 0)

WebmasterWorld Senior Member jdmorgan, WebmasterWorld Top Contributor of All Time, 10+ Year Member



Since robots.txt is a 'per-(sub)domain' file, each one is treated separately at one level -- the per-site URL allow/disallow processing. But you're right, they should have a back-end 'association' process that limits the rate per IP address/hardware server.

It may simply take some time for them to associate all the domains and subdomains. Has anything changed recently, such as a new IP address, or more subdomains added to your 'collection' on that single IP address?

Ultimately, the decision of how many (sub)domains to host on one server should take crawling into account. Fifty thousand is a very high number -- about 125 times the 'normal' shared-hosting maximum for medium- to low-traffic sites. So if you really intend to host that many sites on one server, consider setting each crawl-delay to at least 120, so that the worst-case aggregate request rate against the machine stays roughly where it would be for a normally-populated server running crawl-delay 1.

Jim
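
As a practical footnote: with that many hosts, the robots.txt files are usually generated rather than maintained by hand. Below is a minimal sketch, assuming a hypothetical layout where every subdomain's document root sits under /var/www/<subdomain>/; it drops the same Slurp crawl-delay into each one. The paths and the 120-second value are assumptions to adjust for your own setup.

    import os

    # Assumed (hypothetical) parent directory holding one docroot per subdomain.
    WEB_ROOT = "/var/www"

    # The shared robots.txt body: throttle Slurp to one fetch per 120 seconds per host.
    ROBOTS_TXT = "User-agent: Slurp\nCrawl-delay: 120\n"

    # Write the same robots.txt into every subdomain's document root.
    for name in os.listdir(WEB_ROOT):
        docroot = os.path.join(WEB_ROOT, name)
        if os.path.isdir(docroot):
            with open(os.path.join(docroot, "robots.txt"), "w") as f:
                f.write(ROBOTS_TXT)

An alternative is a single rewrite or alias rule in the web server that serves one shared robots.txt for every virtual host, which keeps the value in one place if you want to change it later.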

 
