fwiw: I can't be bothered to keep track, so I just block 'em by name:
BrowserMatch Yahoo keep_out
pretty well covers it ;)
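For anyone following along: the BrowserMatch line only sets an environment variable, so something else still has to act on it. A minimal sketch of the rest, assuming Apache 2.2-style access control in a .htaccess file. The Order/Allow/Deny lines are an assumption about how keep_out gets consumed, not a copy of the actual config:

```apache
# Flag any request whose User-Agent contains "Yahoo" (case-sensitive match).
BrowserMatch Yahoo keep_out

# Assumed Apache 2.2 syntax: allow everyone, then deny flagged requests.
# With "Order Allow,Deny" a matching Deny wins, so flagged bots get a 403.
Order Allow,Deny
Allow from all
Deny from env=keep_out
```

Note that matching on "Yahoo" catches Slurp and YahooCacheSystem alike, along with anything else that carries the name in its User-Agent string.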
This probably dates back to last January when I looked closely at robotic behavior and ended up tossing several into the "because I don't like your face" bin.
Bing and Yahoo are still my most constant, reliable and best-converting traffic sources (Bing/Yahoo 93%, Goog 5%).
Blocking them would not be wise, so I unblocked some of their IPs. I don't understand why Slurp is still alive: they have Bingbot, which is really easy to handle with the crawl control in their WMT. It seems Bingbot only fetches HTML files and Slurp fetches the rest.
But at least Goog should block Slurp from Analytics (no, this is not direct traffic LOL) and AdSense. It should be really easy for them.
They fetch complete packages. In my case this means that every request is followed by a fetch of errorstyles.css. Every single time. Which in turn means they actually go to the 403 page, rather than swallowing the numbers and moving on. Very un-robotlike. Rarely they also get the favicon.
Random hopping through raw logs brought up only one request for robots.txt, and that was in October. Of 2011. (I later found a day in March where all they asked for was robots.txt. Two redirects, four successful pickups. I guess they freeze them for later.)
Along the way I was staggered to discover that they've been steadily asking for the same handful of pages over and over again, several times a day. Maybe they lost their shopping list and these are the only titles they can remember.
I have also now remembered that it was YahooCacheSystem that originally offended me. Slurp is just along for the ride. Makes no difference to searches, since they don't do their own crawling.
"I can't be bothered to keep track, so I just block 'em"
I stopped bothering about bots a long time ago too.
There's only so much it's humanly possible to keep track of in this increasingly 'social' web climate, especially with huge traffic turnover sites.
And Slurp is one of the well-behaved ones. There are JS-executing bad bots who don't actually announce themselves openly.
Not to mention armies of human scrapers employed specifically to spam by... ahem... "marketing" outfits. Try beating them.
The more you investigate, the deeper the rabbit hole goes and you end up doing 'just' that and not web development.
So in the end, I have cautious confidence that AdSense is better equipped to filter them out, or else it's not worth the hype it gets.
Yes, AdSense impressions by direct visits as Yahoo Slurp from Bethesda, Maryland with a 100% bounce rate - it's ridiculous and getting old...
Operating System: (Not Set)
Screen Colors: 16 bit
Bounce Rate: 100%
Location: Bethesda, Maryland (according to GA)
Location: California (according to IP address designated as Yahoo)
Exactly, it is Bethesda, Maryland.
I had to: SetEnvIf User-Agent Slurp get_out
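On Apache 2.4 the same idea can be written with Require instead of the older Order/Deny lines. A sketch only, assuming mod_setenvif and mod_authz_core are loaded; the case-insensitive SetEnvIfNoCase and the RequireAll block are assumptions for illustration, and only the get_out variable name comes from the line above:

```apache
# Flag requests whose User-Agent contains "slurp" in any letter case.
SetEnvIfNoCase User-Agent "Slurp" get_out

# Assumed Apache 2.4 syntax: grant access to everyone except flagged requests.
<RequireAll>
    Require all granted
    Require not env get_out
</RequireAll>
```

Flagged requests get a 403, which also keeps them from triggering the page's Analytics and AdSense code in the first place.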
Even the Slurp help page which you see in your logs is not available any more. It is just completely pointless.
When I first checked AdSense analytics by screen size I thought: what's wrong with 1024x768? They almost never click and they always bounce. It was a mystery; now I know it's Yahoo.