Forum Moderators: open
Through a side project of mine I have a contact at Yahoo! Engineering whom I contacted yesterday. He forwarded my e-mail to someone in search ops. That person requested I send him a list of user agents that aren't respecting robots.txt.
To me this is a unique opportunity to see if Yahoo! is serious about addressing this increasingly annoying issue. And thanks to Dan I have permission to deviate from our usual format to compile this list.
Thanks in advance for your help.
But hey, kick back and give things time. I know you're eager but they've got channels upon channels. (You were the first respondent in this, your own thread, because you didn't think we'd reply or that we weren't replying quickly enough. Heck, with mod-approval time and work skeds and such, I hadn't even seen your initial post until after you'd replied to it!)
Regardless of outcome, thank you for stepping up to the plate. Now get back to work:)
Bill, if you see this I've been trying to get in touch with you but your mailbox here always says it's full.
I can't figure out why we've been seeing this in our logs:
68.142.249.51 "GET /mod_ssl:error:HTTP-request HTTP/1.0" 404 316 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
Also from 72.30.111.87 and 72.30.129.59
access_log (re the last entry, below)
wj500040.inktomisearch.com - - [29/Jun/2006:12:48:11 -0700]
"GET /SlurpConfirm404/letters/magasin/BasicTabbedPaneUI.TabSelectionHandler.htm HTTP/1.0" 404 2336 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
error_log
[Thu Jun 29 12:41:08 2006] [error] [client 72.30.215.21] File does not exist:
/SlurpConfirm404/linkto.htm
[Thu Jun 29 12:41:38 2006] [error] [client 72.30.215.84] File does not exist:
/SlurpConfirm404/Sampler/ppv/Heartach.htm
[Thu Jun 29 12:42:44 2006] [error] [client 72.30.215.103] File does not exist:
/SlurpConfirm404.htm
[Thu Jun 29 12:43:14 2006] [error] [client 72.30.215.82] File does not exist:
/SlurpConfirm404/graph/mlm.htm
[Thu Jun 29 12:43:44 2006] [error] [client 72.30.215.103] File does not exist:
/SlurpConfirm404/exempt/PersonInfo.htm
[Thu Jun 29 12:44:14 2006] [error] [client 72.30.215.10] File does not exist:
/SlurpConfirm404/dotdon/southparkmain/holiday.htm
[Thu Jun 29 12:44:47 2006] [error] [client 72.30.215.88] File does not exist:
/SlurpConfirm404/linux/marc_d.htm
[Thu Jun 29 12:45:41 2006] [error] [client 72.30.215.18] File does not exist:
/SlurpConfirm404/mahfouad.htm
[Thu Jun 29 12:47:42 2006] [error] [client 72.30.215.80] File does not exist:
/SlurpConfirm404/livstand.htm
[Thu Jun 29 12:48:11 2006] [error] [client 72.30.215.15] File does not exist:
/SlurpConfirm404/letters/magasin/BasicTabbedPaneUI.TabSelectionHandler.htm
I thought it was a new set of exploits until I verified one of the IPs as Inktomi's:
IP address: 72.30.215.15
Reverse DNS: wj500040.inktomisearch.com
Reverse DNS authenticity: [Verified]
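For anyone who wants to automate that check, here's a rough Python sketch of the same forward-confirmed reverse DNS test: the PTR name has to end in the crawler's domain AND resolve back to the original IP. The function name and parameters are my own (not any official API), and the lookup functions are injectable so you can test it without hitting DNS:

```python
import socket

def verify_crawler_ip(ip, domain_suffix,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: the PTR name must end in the
    crawler's domain AND resolve back to the original IP."""
    try:
        host = reverse(ip)[0]          # reverse (PTR) lookup
    except OSError:
        return False
    if not host.lower().endswith(domain_suffix):
        return False
    try:
        addrs = forward(host)[2]       # forward lookup of the PTR name
    except OSError:
        return False
    return ip in addrs                 # must round-trip to the same IP
```

So `verify_crawler_ip("72.30.215.15", ".inktomisearch.com")` would confirm the entry above, while a spoofer pointing PTR at someone else's domain fails the forward step.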
I can see doing one 404 test (well, not really, but I know some SEs do a one-file check). But ten? And from ten different IPs in under ten minutes? Gimme a break. Besides, Inktomi already asks for robots.txt about 50 times a day. So why the sudden 404 assault?
[Thu Jun 29 21:19:35 2006] [error] [client 72.30.215.105] File does not exist:
/SlurpConfirm404/veronika.htm
[Thu Jun 29 21:20:47 2006] [error] [client 72.30.215.85] File does not exist:
/SlurpConfirm404/mjavary/adg.htm
[Thu Jun 29 21:21:17 2006] [error] [client 72.30.215.92] File does not exist:
/SlurpConfirm404/JenniferLopez.htm
[Thu Jun 29 21:23:30 2006] [error] [client 72.30.215.10] File does not exist:
/SlurpConfirm404/SkiNLP/MeridieShireTrollfen/infmslist.htm
[Thu Jun 29 21:24:00 2006] [error] [client 72.30.215.101] File does not exist:
/SlurpConfirm404/Constitution/ReviewQ.htm
[Thu Jun 29 21:24:30 2006] [error] [client 72.30.215.17] File does not exist:
/SlurpConfirm404/solution/somewhere/beukema.htm
[Thu Jun 29 21:25:00 2006] [error] [client 72.30.215.19] File does not exist:
/SlurpConfirm404/montages/tree.draw.Tree.htm
[Thu Jun 29 21:26:53 2006] [error] [client 72.30.215.94] File does not exist:
/SlurpConfirm404.htm
[Thu Jun 29 21:28:05 2006] [error] [client 72.30.215.108] File does not exist:
/SlurpConfirm404/ibento.htm
No one else is seeing this?
/SlurpConfirm404/Noid2K/TclCmd/komaba.htm
/SlurpConfirm404.htm
/SlurpConfirm404/stage4_options.htm
/SlurpConfirm404/table19f/john.humphries.htm
...and the list goes on and on. None of these files has ever existed on any of my websites.
72.30.215.9
72.30.215.12
72.30.215.84
72.30.215.85
72.30.215.105
72.30.215.106
...and the list goes on and on. They all belong to Inktomi.
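If it helps anyone tally these up in their own logs, here's a quick throwaway Python sketch (my own, assuming the "[client IP] File does not exist: /path" error_log shape shown above) that pulls out the probing IPs and the phantom paths:

```python
import re

# Matches Apache error_log entries like the ones above; assumes the
# "[client IP] File does not exist: /SlurpConfirm404..." shape
# shown in this thread.
PROBE = re.compile(
    r"\[client (?P<ip>[\d.]+)\] File does not exist:\s*"
    r"(?P<path>/SlurpConfirm404\S*)"
)

def slurp_probes(log_text):
    """Return (client_ip, phantom_path) pairs for SlurpConfirm404 probes."""
    return PROBE.findall(log_text)

def probe_ips(log_text):
    """Unique probing IPs, sorted -- handy for checking them against Inktomi."""
    return sorted({ip for ip, _ in slurp_probes(log_text)})
```

Feed it the raw error_log text and `probe_ips` gives you the list of addresses to run through reverse DNS.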
I'll take a chance and forward this to Warren at Inktomi when I wake up.
Official FAQ may help:
[help.yahoo.com ]
Apparently the testing is not as "rare" as that page states -- unless yesterday was my lucky day. Shoot. Now I find out! :)
User-agent: Slurp China
Disallow: /

User-agent: Slurp
Crawl-delay: 3
Disallow: /cgi-bin
Disallow: /widget-scripts
Disallow: /styles-nn4.css
Disallow: /styles.css
However, it does seem to recognize that it should go away when it sees the code above, rather than accepting the
User-agent: Slurp
record and subsequently hitting my user-agent blocking code in .htaccess.
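That matches how robots.txt record selection is generally described: the crawler takes the record whose User-agent value is the most specific match for its own token, and only falls back to the general (or "*") record if nothing better matches. A rough Python sketch of that selection logic (names and structure are my own, not any official parser):

```python
def pick_record(records, ua_token):
    """Choose the robots.txt record whose User-agent value is the
    longest (most specific) match for the crawler's token; fall back
    to the '*' record if nothing matches."""
    best = None
    for agents, rules in records:
        for agent in agents:
            if agent != "*" and agent.lower() in ua_token.lower():
                if best is None or len(agent) > len(best[0]):
                    best = (agent, rules)
    if best:
        return best[1]
    for agents, rules in records:
        if "*" in agents:
            return rules
    return []

# Example mirroring the records above: "Slurp China" gets the more
# specific record, plain "Slurp" gets the general one.
records = [
    (["Slurp China"], ["Disallow: /"]),
    (["Slurp"], ["Crawl-delay: 3", "Disallow: /cgi-bin"]),
]
```

Under that logic the China crawler sees only the full Disallow and goes away, exactly as observed.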
Now if I can just get "Yahoo! Slurp" to quit listing my .css files in SERPs... Grumble, grumble... I've never seen any other search engine do this, but I had to add the Disallows for my .css files so they wouldn't show up when search terms coincided with terms I'd used in my .css file comments...
Jim
[edited by: jdMorgan at 12:20 am (utc) on July 13, 2006]
74.6.131.201 "Mozilla/5.0 (compatible; Yahoo! DE Slurp; [help.yahoo.com...]
Why in the heck can't Yahoo just crawl pages from one place and let everyone share the pages?
I already block Yahoo China, don't make me block more...
Jim
I have posted a response from Yahoo! Search on a new thread started on this forum.
Please check the information in the thread entitled Yahoo! Crawlers - A response from Yahoo! Search at
[webmasterworld.com...]
Thanks.