Yahoo! Verifier

Yet another Yahoo robot

         

jdMorgan

2:05 am on Aug 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A new one for me... Anyone else see this or know what it is?

66.228.165.141 - - [25/Aug/2008:21:12:32 -0400] "GET / HTTP/1.1" 403 666 "-" "Mozilla/5.0 (compatible; Yahoo! Verifier/1.1)"
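For anyone who wants to pull such entries out of their own logs, here is a minimal sketch that splits a line like the one above into named fields. It assumes the Apache "combined" log format; the names `COMBINED` and `parse_line` are made up for this illustration.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

line = ('66.228.165.141 - - [25/Aug/2008:21:12:32 -0400] "GET / HTTP/1.1" '
        '403 666 "-" "Mozilla/5.0 (compatible; Yahoo! Verifier/1.1)"')
entry = parse_line(line)
print(entry["host"], entry["status"], entry["agent"])
```

All values come back as strings, so the status code needs an `int()` conversion before any numeric comparison.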

Request headers (all): Accept-Encoding: "compress, gzip"

rDNS: wgdev1.yst.corp.yahoo.com
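A reverse lookup on its own can be spoofed by whoever controls the PTR record, so a common extra step is to resolve the returned hostname forward again and check that it maps back to the same IP. The sketch below shows that idea only. The function name, the injectable `reverse`/`forward` resolvers, and the `yahoo.com` suffix list are assumptions for illustration, not anything Yahoo has published.

```python
import socket

def is_forward_confirmed(ip, allowed_suffixes, reverse=None, forward=None):
    """Check reverse DNS, the domain suffix, then the forward lookup."""
    # Default resolvers use the system DNS; they can be swapped for testing.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip).lower().rstrip(".")
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    # Require an exact match or a true subdomain, so "evil-yahoo.com" fails.
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False

# Offline example with stub resolvers standing in for real DNS answers.
ok = is_forward_confirmed(
    "66.228.165.141",
    ["yahoo.com"],  # assumed suffix, taken from the rDNS shown above
    reverse=lambda addr: "wgdev1.yst.corp.yahoo.com",
    forward=lambda host: ["66.228.165.141"],
)
print(ok)
```

Called without the two stub arguments, the function performs live DNS lookups, which can be slow, so results are worth caching per IP.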

Jim

dstiles

7:24 pm on Aug 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Been getting those for a week-ish now. Yahoo's Overture (ex-GoTo.com) IP block, which sends quite a few half-baked robots with little common sense. I sometimes wonder if the block was given to apprentices to play with! :(

I'm blocking it but not sure whether it's useful or not. Nothing to do with anything I've submitted to their webmaster tools.

GaryK

2:17 am on Sep 1, 2008 (gmt 0)


I saw this one in my logs this week too. It hit each of my websites. I sent an e-mail to my contact at Yahoo! to see if he'll tell me what purpose this goofy new UA serves.

GaryK

6:29 pm on Sep 5, 2008 (gmt 0)


Apparently I've got a new contact at Yahoo!. He's the VP, Chief of Staff for Audience Product Division. Sounds like a fancy title. He told me he's trying to track down what this new UA does as he's not familiar with it himself.

GaryK

10:07 pm on Sep 18, 2008 (gmt 0)


I finally got a reply. Here is the gist of the e-mail in my own words:

The bot in question is one of their search bots that they use to assess content relevance, and has been in use for the past 2-3 years.

dstiles

1:30 am on Sep 19, 2008 (gmt 0)


So what is the Firefox BonEcho UA used for then? I thought that was content verification.

Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20080721 BonEcho/2.0.0.4

GaryK

2:09 am on Sep 19, 2008 (gmt 0)


All I can tell you is what was shared with me about that kind of thing. Yahoo! has absolutely no internal mechanism for keeping track of user agents. I made it clear, as I did many months ago, that webmasters were getting tired of bots like these and were, for the most part, only allowing the main bots to crawl their sites. Yahoo! is not alone in this problem, so this isn't a rant against them. I received assurances, once again, that attempts would be made to improve the process. The main suggestion I made was to include a URL in the UA so we can find out what the bot is supposed to do.
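As a rough illustration of why a URL in the UA helps, the sketch below pulls an info URL out of a user-agent string when one is present. The `+http://...` convention is the one some crawlers use. `ExampleBot` and its URL are hypothetical, and `info_url` is a name made up for this example.

```python
import re

# Matches an http(s) URL inside a UA string, with an optional leading "+".
UA_URL = re.compile(r'\+?(https?://[^\s;)]+)')

def info_url(user_agent):
    """Return the first URL embedded in a user-agent string, or None."""
    match = UA_URL.search(user_agent)
    return match.group(1) if match else None

# Hypothetical bot that identifies itself with a URL.
with_url = info_url(
    "Mozilla/5.0 (compatible; ExampleBot/1.0; +http://example.com/bot.html)")
# The UA from this thread carries no URL at all.
without_url = info_url("Mozilla/5.0 (compatible; Yahoo! Verifier/1.1)")
print(with_url, without_url)
```

A log report could list every agent for which `info_url` returns `None`, which gives a quick view of the bots that offer no way to look them up.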

jdMorgan

2:36 am on Sep 19, 2008 (gmt 0)


I think the BonEcho UA is one of two things: it's either a CSS checker (it always fetches my external CSS file if the main page refers to it) or a test version of the next Slurp crawler. My money is on the first one -- probably looking for display:none blocks and other CSS "tricks."

Jim

dstiles

5:04 pm on Sep 19, 2008 (gmt 0)


Thanks, both.

It's likely that any URL would go to a generic "this is our bot, we obey robots.txt" page. I don't think BonEcho obeys robots.txt anyway, since it's purporting to be a browser (it may even BE a browser!).

To be frank, Yahoo really are getting annoying, especially for such a small visitor return. Last month the bots from the top three crawler operators hit across all the sites on my server as follows:

bonecho 16,145
slurp 113,136
googlebot 28,418
msnbot 10,268

So, a total from Yahoo (at least!) of 129,281 compared with 38,686 for BOTH the others - that's roughly 77% Yahoo to 23% for the other two. And now they are dropping Verifier on us as well (two years? Haven't seen it before or it would be in my Bots list!).
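The totals and shares can be rechecked from the four counts above with a few lines. This is only a sketch of the arithmetic. Treating BonEcho and Slurp as the Yahoo agents follows the post, and the variable names are made up for the example.

```python
# Monthly hit counts as listed in the post.
hits = {"bonecho": 16145, "slurp": 113136, "googlebot": 28418, "msnbot": 10268}

# Assumption from the post: these two agents both belong to Yahoo.
YAHOO_AGENTS = {"bonecho", "slurp"}

yahoo = sum(n for ua, n in hits.items() if ua in YAHOO_AGENTS)
others = sum(n for ua, n in hits.items() if ua not in YAHOO_AGENTS)
yahoo_share = 100 * yahoo / (yahoo + others)

print(f"yahoo={yahoo} others={others} yahoo_share={yahoo_share:.1f}%")
```

Feeding the dictionary from parsed log entries rather than hand-typed counts would let the same calculation run every month.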

It's not as if there's 113,000 pages on the server to begin with! I doubt it's a tenth of that - they are all smallish sites.

Last month I blocked 41,545 hits from undesirable IPs plus 29,789 SQL injection hits. Yahoo exceeded those baddies by a LONG way.

jdMorgan

5:40 pm on Sep 19, 2008 (gmt 0)


That BonEcho from Yahoo is not a browser: It never fetches anything but pages and CSS files. Also the Accept headers aren't right for a real browser, IIRC.
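The "pages and CSS only" pattern is easy to test for once requests are grouped by client. Below is a minimal sketch of that heuristic. The extension list and the function name are assumptions for illustration, and a client that passes this check is not necessarily a robot.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed set of asset types a real browser would normally request as well.
ASSET_EXTENSIONS = {".js", ".png", ".gif", ".jpg", ".jpeg", ".ico"}

def fetches_only_pages_and_css(paths):
    """True if none of the requested paths is an image or script asset."""
    for path in paths:
        # Drop any query string before looking at the file extension.
        ext = PurePosixPath(urlsplit(path).path).suffix.lower()
        if ext in ASSET_EXTENSIONS:
            return False
    return True

# Hypothetical request lists for two clients.
bot_like = fetches_only_pages_and_css(["/", "/style.css", "/about.html"])
browser_like = fetches_only_pages_and_css(["/", "/style.css", "/logo.png"])
print(bot_like, browser_like)
```

Browsers with images disabled or a warm cache can also produce this pattern, so it works better as one signal alongside the Accept headers than as a verdict by itself.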

Webmasters trade spider bandwidth for traffic. I agree that Yahoo goes overboard, but it's all ROI -- Each Webmaster must make the call based on how much quality traffic he/she gets from Yahoo.

Jim

dstiles

8:33 pm on Sep 19, 2008 (gmt 0)


Thanks, Jim.

Yes, the headers are wrong, but it is possible to fake that with a minor Firefox script. I'm very inclined to believe it is a robot, though.

Part of the worry with Yahoo (and increasingly with Google and MSN) is: what happens if you ban their "weird" bots? Do they still give a listing based on Slurp alone, or is this modified by results from the others? And if so, how?

My figures above omitted MSN's referer spam in the 65.55.*.* range, by the way - add another 5,000 or so hits for that, plus their msr bot and various others. Still nowhere near Yahoo's hits, though.