incrediBILL - 5:49 pm on Jun 5, 2007 (gmt 0)
<RANT> Now what about Yahoo's crawlers using a common CACHE server? Why do we need to allow an army of Yahoo spiders to redundantly abuse our servers? Is it a conceptual problem that Yahoo can't share pages already downloaded? When I posed that question to one of their engineers I was given a lame excuse that the various crawlers had different needs. OK, what could one crawler need that's different when you download a page? The images? The CSS? Well, you certainly don't need to download the page AGAIN just to get those items, and you can cache anything else downloaded and share it as well. It's not rocket science. If it's the age of the cached page that's the issue, download it again, just to the CACHE server for all to share. Funny, Google managed to make some of their crawlers share CACHE, so we know it can be done. FWIW, the only thing worse than Yahoo's army of crawlers is the ton of Nutch bots out there. </RANT> [edited by: incrediBILL at 5:51 pm (utc) on June 5, 2007]
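For anyone wondering what "share the CACHE" could look like, here is a rough Python sketch of the idea in the rant: several crawlers ask one cache for a page, and the site only gets hit when the page is missing or older than the age that crawler will tolerate. Everything here is hypothetical (the `SharedPageCache` class, the `fetch` callback, the `max_age` parameter); it is not how Yahoo or Google actually do it, just an illustration of the concept.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SharedPageCache:
    """Hypothetical cache shared by several crawlers (not any real search engine's design)."""

    def __init__(self, fetch: Callable[[str], bytes],
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch    # assumed callback that actually downloads from the site
        self._clock = clock    # injectable clock, so the sketch is easy to test
        self._store: Dict[str, Tuple[float, bytes]] = {}  # url -> (fetched_at, body)
        self.origin_hits = 0   # how many times the webmaster's server was really hit

    def get(self, url: str, max_age: Optional[float] = None) -> bytes:
        """Return the cached body; re-download only if missing or older than max_age seconds."""
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None:
            fetched_at, body = entry
            if max_age is None or now - fetched_at <= max_age:
                return body    # fresh enough for this crawler, no hit on the site
        # Missing or too old: download once and store it for every other crawler.
        body = self._fetch(url)
        self.origin_hits += 1
        self._store[url] = (now, body)
        return body
```

A crawler that needs fresher content would just pass a smaller `max_age`; the refreshed copy goes back into the same cache, so the next crawler reuses it instead of hitting the site again.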
This is about 18 months after the rest of the crawler world updated their DNS, but they still deserve a pat on the back for finally getting it done.