Sub-domain & crawl-delay
|
foxfox

msg:3760567 | 3:14 pm on Oct 7, 2008 (gmt 0) | Yahoo! Slurp respects the crawl-delay well, so it does not overload my server. However, I found that the crawl-delay seems to be applied per domain. So if I host a site with 50K subdomains on a single server, crawl-delay = 1 is useless: it seems they could fetch 50K requests per second. I believe they know the 50K subdomains are on a single IP. Is it possible to limit the crawl rate by server IP rather than by domain/subdomain? How do you handle this if you have many subdomains?
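The arithmetic behind this concern can be sketched as follows, assuming a crawler that honours Crawl-delay separately for each subdomain and fetches from all of them at once (the worst case). The function name `worst_case_aggregate_rate` is hypothetical, for illustration only.

```python
def worst_case_aggregate_rate(num_hosts: int, crawl_delay_s: float) -> float:
    """Upper bound on requests/second reaching one server when a crawler
    honours Crawl-delay per (sub)domain and crawls all hosts in parallel."""
    if num_hosts <= 0 or crawl_delay_s <= 0:
        raise ValueError("num_hosts and crawl_delay_s must be positive")
    # Each host may receive one request every crawl_delay_s seconds,
    # so the per-host rates simply add up on the shared server.
    return num_hosts / crawl_delay_s


# 50K subdomains, each with "Crawl-delay: 1"
print(worst_case_aggregate_rate(50_000, 1))  # 50000.0
```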
|
jdMorgan

msg:3760614 | 4:39 pm on Oct 7, 2008 (gmt 0) | Since robots.txt is a 'per-(sub)domain' file, each (sub)domain is treated separately at one level -- the per-site URL allow/disallow processing. But you're right: they should have a back-end 'association' process that limits the rate per IP address/hardware server. It may be that it takes some time to associate all the domains and subdomains. Has anything changed recently, such as a new IP address, or more subdomains added to your 'collection' on the single IP address?

Ultimately, the decision of how many (sub)domains to host on a server should take crawling into account. Fifty thousand is a very high number -- about 125 times higher than a 'normal' shared-hosting maximum (roughly 400 sites) for medium- to low-traffic sites. So, if you really intend to host that many sites on one server, you might consider scaling the crawl-delay up by roughly the same factor and setting each one to at least 120.

Jim
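The figure of 120 can be checked with the same per-host arithmetic: the worst-case rate on the server is the number of hosts divided by the delay, so the per-host delay that keeps the total at a target rate is the host count divided by that target. A minimal sketch, assuming a target equal to a 'normal' server of about 400 sites at crawl-delay 1 (400 requests per second in the worst case). The function names and the target are hypothetical illustrations; the robots.txt group uses the `Slurp` user-agent token with the non-standard `Crawl-delay` directive that Yahoo! Slurp honoured.

```python
import math


def required_crawl_delay(num_hosts: int, max_requests_per_s: float) -> int:
    """Smallest whole-second Crawl-delay that keeps the worst-case aggregate
    rate (num_hosts / delay) at or below max_requests_per_s.

    Assumes every host on the server uses the same delay and that a crawler
    may fetch from all of them in parallel.
    """
    if num_hosts <= 0 or max_requests_per_s <= 0:
        raise ValueError("num_hosts and max_requests_per_s must be positive")
    # Rounding up guarantees num_hosts / delay <= max_requests_per_s.
    return math.ceil(num_hosts / max_requests_per_s)


def robots_txt(crawl_delay: int) -> str:
    """robots.txt body asking Yahoo! Slurp to wait crawl_delay seconds between
    requests to one (sub)domain. Each subdomain serves its own copy."""
    return f"User-agent: Slurp\nCrawl-delay: {crawl_delay}\n"


# Hypothetical baseline: ~400 sites at crawl-delay 1 -> 400 requests/second.
delay = required_crawl_delay(50_000, 400)
print(delay)  # 125
print(robots_txt(delay), end="")
```

With that baseline the sketch gives 125, in line with "at least 120"; at exactly 120 the worst case works out to about 417 requests per second.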