Forum Moderators: open
I complained:
One of our sites has 110 pages available for indexing ... In the first 14 days of this month, the Yahoo! Slurp bot requested 737 pages and robots.txt 449 times - roughly 32 robots.txt requests per day. This is really taking freshness to the extreme!
Up until today, Yahoo has crawled 230 pages and requested robots.txt 186 times (roughly 13 times per day) in August. Page-wise that is on par with MSN (222 pages + 66 robots.txt requests) and Google (237 + 20). The number of robots.txt requests is still comparatively extreme, but it is a gigantic improvement.
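For anyone who wants to pull the same numbers out of their own logs, here is a rough Python sketch. It assumes a standard Apache combined-format access log called access.log and matches crawlers by a user-agent substring; the file name and the substrings are just my guesses, so adjust them to whatever your server actually writes.

# Rough sketch: count page hits and robots.txt hits per crawler in an
# Apache combined-format access log. BOTS maps a user-agent substring
# to a label; these substrings are assumptions, not official strings.
import re
from collections import Counter

BOTS = {"Slurp": "Yahoo", "msnbot": "MSN", "Googlebot": "Google"}

pages = Counter()
robots = Counter()

with open("access.log") as log:
    for line in log:
        for needle, label in BOTS.items():
            if needle in line:
                # The request field looks like "GET /some/path HTTP/1.1";
                # grab the path so robots.txt hits can be counted separately.
                match = re.search(r'"[A-Z]+ (\S+)', line)
                path = match.group(1) if match else ""
                if path == "/robots.txt":
                    robots[label] += 1
                else:
                    pages[label] += 1
                break

for label in sorted(set(pages) | set(robots)):
    print(label, pages[label], "pages +", robots[label], "robots.txt requests")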
I'd like to think this is a result of Yahoo listening to us webmasters and not just an aberration, but I guess only time will tell.
Yahoo_Mike, if you see this, please don't allow Slurp to return to its bad old ways! :)
The lame answer someone from Yahoo once gave me was that each of the spiders looks for different things, and that they were all developed by different groups Yahoo acquired.
OK, so how does that make sharing a single cache server a problem?
The main Yahoo index has every stinking page cached, so why can't all the other Yahoo crawlers just look at that cache, or request a recrawl through the main crawler if they want an update?
For instance, the IMAGE search can already find all the links to images from the pages previously downloaded and cached; there is really no need to request my pages again.
Also, why should I pay for the additional bandwidth every time Yahoo unleashes something new?
It's VERY ANNOYING and almost as relentless as the bazillion Nutch crawlers out there.
Seriously, Yahoo: open up your main index, let the rest of your services share that cache, and stop crawling us over and over and over.
Things like this are probably what eventually drive normally sane people to climb the campus clock tower with a deer rifle...
I'd like to think this is a result of Yahoo listening to us webmasters and not just an aberration, but I guess only time will tell.
IMO Yahoo needs to be spanked for having more than one crawler and not just sharing the data internally.
Fully agreed. However, I ban all Y! crawlers except the plain vanilla Y! Slurp via robots.txt, so this doesn't affect me.
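For anyone who wants to set up the same thing, the robots.txt looks roughly like this. The non-Slurp user-agent names below (Yahoo-MMCrawler, Yahoo-Blogs, YahooSeeker) are just the extra Y! bots I've seen people mention; check your own logs for the exact strings before blocking anything.

User-agent: Yahoo-MMCrawler
Disallow: /

User-agent: Yahoo-Blogs
Disallow: /

User-agent: YahooSeeker
Disallow: /

User-agent: Slurp
Disallow:

The empty Disallow line under Slurp means Slurp is allowed everywhere, while each of the other bots matches its own record and stays out entirely.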
GarK wrote:
I've seen more garbage from Y! than I did before that thread.
I can't say I have, but then again I don't have any stealth-bot-catching scripts running, so maybe I'm missing them.
The purpose of my message was to give them a public pat on the back for reining in the rampant Slurp crawling, in the hope that it encourages them to keep up the good work.
Y! sends me so little traffic that I'm thinking of just banning them completely.
I was considering that too, up until last month. Over the last three months, referrals from Y! accounted for a tiny 3.28% of our traffic on this particular site; Google sent 62.14%.