Forum Moderators: Ocean10000 & incrediBILL

Naughty Yahoo User Agents

Please post them here

11:12 pm on June 7, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


I want to appeal to all of you who have reported problems with Yahoo! user agents that don't respect robots.txt to post those user agents here.

Through a side project of mine I have a contact at Yahoo! Engineering whom I contacted yesterday. He forwarded my e-mail to someone in search ops. That person requested I send him a list of user agents that aren't respecting robots.txt.

To me this is a unique opportunity to see if Yahoo! is serious about addressing this increasingly annoying issue. And thanks to Dan I have permission to deviate from our usual format to compile this list.

Thanks in advance for your help.

6:01 pm on June 13, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2038
votes: 1


Gary, not to fret. You're trying to help Yahoo, and us, solve problems. That's a breath of fresh air because it's always easier to sit around and complain. And if it turns out Yahoo's folks dismiss detailed, debugging-oriented data from Web professionals? C'est la vie.

But hey, kick back and give things time. I know you're eager, but they've got channels upon channels. (You were the first respondent in this, your own thread, because you thought we wouldn't reply, or weren't replying quickly enough. Heck, with mod-approval time and work schedules and such, I hadn't even seen your initial post until after you'd replied to it!)

Regardless of outcome, thank you for stepping up to the plate. Now get back to work:)

9:23 pm on June 13, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


Thanks for your support and understanding. One of these days I promise to grow up. :)

4:12 am on June 14, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


Sorry to double post. I just wanted to let you all know I heard from Warren and he told me he got the last batch of messages I sent him and he's working on the robots.txt user agent problem with Slurp China.

Bill, if you see this I've been trying to get in touch with you but your mailbox here always says it's full.

4:45 am on June 14, 2006 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


Bill, if you see this I've been trying to get in touch with you but your mailbox here always says it's full.

LOL - sorry, I dumped a bunch of stickies the other night; will try killing more, let me know ;)

Back to Yahoo...

12:48 pm on June 16, 2006 (gmt 0)

Full Member

10+ Year Member

joined:Dec 20, 2002
posts:234
votes: 0


(Hope this belongs in this thread)

I can't figure out why we've been seeing this in our logs:

68.142.249.51 "GET /mod_ssl:error:HTTP-request HTTP/1.0" 404 316 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]

Also from 72.30.111.87 and 72.30.129.59

11:48 pm on June 29, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2038
votes: 1


Anyone else seeing this Slurpy sloppiness?

access_log (corresponding to the last error_log entry below)

wj500040.inktomisearch.com - - [29/Jun/2006:12:48:11 -0700]
"GET /SlurpConfirm404/letters/magasin/BasicTabbedPaneUI.TabSelectionHandler.htm HTTP/1.0" 404 2336 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]

error_log

[Thu Jun 29 12:41:08 2006] [error] [client 72.30.215.21] File does not exist:
/SlurpConfirm404/linkto.htm

[Thu Jun 29 12:41:38 2006] [error] [client 72.30.215.84] File does not exist:
/SlurpConfirm404/Sampler/ppv/Heartach.htm

[Thu Jun 29 12:42:44 2006] [error] [client 72.30.215.103] File does not exist:
/SlurpConfirm404.htm

[Thu Jun 29 12:43:14 2006] [error] [client 72.30.215.82] File does not exist:
/SlurpConfirm404/graph/mlm.htm

[Thu Jun 29 12:43:44 2006] [error] [client 72.30.215.103] File does not exist:
/SlurpConfirm404/exempt/PersonInfo.htm

[Thu Jun 29 12:44:14 2006] [error] [client 72.30.215.10] File does not exist:
/SlurpConfirm404/dotdon/southparkmain/holiday.htm

[Thu Jun 29 12:44:47 2006] [error] [client 72.30.215.88] File does not exist:
/SlurpConfirm404/linux/marc_d.htm

[Thu Jun 29 12:45:41 2006] [error] [client 72.30.215.18] File does not exist:
/SlurpConfirm404/mahfouad.htm

[Thu Jun 29 12:47:42 2006] [error] [client 72.30.215.80] File does not exist:
/SlurpConfirm404/livstand.htm

[Thu Jun 29 12:48:11 2006] [error] [client 72.30.215.15] File does not exist:
/SlurpConfirm404/letters/magasin/BasicTabbedPaneUI.TabSelectionHandler.htm

I thought it was a new set of exploits until I verified one of the IPs as Inktomi's:

IP address: 72.30.215.15
Reverse DNS: wj500040.inktomisearch.com
Reverse DNS authenticity: [Verified]

I can see doing one 404 test (well, not really, but I know some SEs do a one-file check). But 10? And from 10 IPs in under 10 minutes? Gimme a break. Besides, Inktomi already asks for robots.txt about 50 times a day. So why the sudden 404 assault?
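For anyone who wants to repeat that check: the "Reverse DNS authenticity: [Verified]" line above corresponds to forward-confirmed reverse DNS. Here is a minimal Python sketch of the idea (the function name is mine; the inktomisearch.com suffix is taken from the logs):

import socket

def is_verified_crawler_ip(ip, expected_suffix=".inktomisearch.com"):
    # Reverse lookup: the IP must resolve to a hostname in the expected domain.
    try:
        host, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not host.lower().endswith(expected_suffix):
        return False
    # Forward confirmation: that hostname must resolve back to the same IP,
    # since anyone who controls their own reverse DNS could fake the name.
    try:
        _name, _aliases, addrs = socket.gethostbyname_ex(host)
    except OSError:
        return False
    return ip in addrs

print(is_verified_crawler_ip("72.30.215.15"))  # the IP from the last entry above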

4:38 am on June 30, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2038
votes: 1


Oh, man. At it again as I type --

[Thu Jun 29 21:19:35 2006] [error] [client 72.30.215.105] File does not exist:
/SlurpConfirm404/veronika.htm

[Thu Jun 29 21:20:47 2006] [error] [client 72.30.215.85] File does not exist:
/SlurpConfirm404/mjavary/adg.htm

[Thu Jun 29 21:21:17 2006] [error] [client 72.30.215.92] File does not exist:
/SlurpConfirm404/JenniferLopez.htm

[Thu Jun 29 21:23:30 2006] [error] [client 72.30.215.10] File does not exist:
/SlurpConfirm404/SkiNLP/MeridieShireTrollfen/infmslist.htm

[Thu Jun 29 21:24:00 2006] [error] [client 72.30.215.101] File does not exist:
/SlurpConfirm404/Constitution/ReviewQ.htm

[Thu Jun 29 21:24:30 2006] [error] [client 72.30.215.17] File does not exist:
/SlurpConfirm404/solution/somewhere/beukema.htm

[Thu Jun 29 21:25:00 2006] [error] [client 72.30.215.19] File does not exist:
/SlurpConfirm404/montages/tree.draw.Tree.htm

[Thu Jun 29 21:26:53 2006] [error] [client 72.30.215.94] File does not exist:
/SlurpConfirm404.htm

[Thu Jun 29 21:28:05 2006] [error] [client 72.30.215.108] File does not exist:
/SlurpConfirm404/ibento.htm

No one else is seeing this?

7:05 am on June 30, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


It's on one of my sites right now:

/SlurpConfirm404/Noid2K/TclCmd/komaba.htm
/SlurpConfirm404.htm
/SlurpConfirm404/stage4_options.htm
/SlurpConfirm404/table19f/john.humphries.htm

...and the list goes on and on. None of these files has ever existed on any of my websites.

72.30.215.9
72.30.215.12
72.30.215.84
72.30.215.85
72.30.215.105
72.30.215.106

...and the list goes on and on. They all belong to Inktomi.

I'll take a chance and forward this to Warren at Inktomi when I wake up.
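A throwaway script along these lines (a sketch only; the regex assumes the stock Apache error_log format quoted above, and the names are mine) pulls every SlurpConfirm404 probe and its client IP out of an error_log, which makes compiling a batch to forward much easier:

import re
from collections import defaultdict

# Matches error_log lines like:
# [Thu Jun 29 12:41:08 2006] [error] [client 72.30.215.21] File does not exist: /SlurpConfirm404/linkto.htm
PROBE = re.compile(
    r"\[client (?P<ip>[0-9.]+)\] File does not exist: (?P<path>\S*SlurpConfirm404\S*)")

def collect_probes(error_log_path):
    # Map each probing client IP to the list of paths it requested.
    hits = defaultdict(list)
    with open(error_log_path) as log:
        for line in log:
            m = PROBE.search(line)
            if m:
                hits[m.group("ip")].append(m.group("path"))
    return hits

for ip, paths in sorted(collect_probes("error_log").items()):
    print(ip, len(paths), paths)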

1:46 pm on June 30, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:June 25, 2005
posts:179
votes: 1


Slurp just checks for a 404 response.

Official FAQ may help:
[help.yahoo.com]
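For context: the probes are testing that the server returns a genuine 404 for pages that cannot exist, since a site that answers 200 for everything (a "soft 404") would otherwise fill the index with copies of its error page. Conceptually, the crawler-side check looks something like this sketch (not Yahoo's actual code; only the /SlurpConfirm404 path style is modeled on the logs above):

import http.client
import random
import string

def returns_real_404(host):
    # Request a random path that cannot exist and report whether the
    # server honestly answers 404 rather than serving a 200 error page.
    junk = "".join(random.choices(string.ascii_lowercase, k=10))
    conn = http.client.HTTPConnection(host, timeout=10)
    conn.request("GET", "/SlurpConfirm404/" + junk + ".htm")
    status = conn.getresponse().status
    conn.close()
    return status == 404

print(returns_real_404("example.com"))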

2:39 pm on June 30, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2038
votes: 1


Thanks for the link, thetrasher. People have talked about deliberate 404s, but I didn't know Slurp might request up to 10 URLs at once. Usually people ask about one or maybe two oddities.

Apparently the testing is not as "rare" as stated by that page -- unless yesterday was my lucky day. Shoot. Now I find out! :)

12:19 am on July 13, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


After a long absence, "Yahoo! Slurp China;" has returned to my sites, and now seems to heed robots.txt in this format:

User-agent: Slurp China
Disallow: /

User-agent: Slurp
Crawl-delay: 3
Disallow: /cgi-bin
Disallow: /widget-scripts
Disallow: /styles-nn4.css
Disallow: /styles.css


I can't vouch for whether it will obey specific directory or file Disallows, or whether it will obey "User-agent: *" or any other variants.

However, it does seem to recognize that it should go away when it sees the code above, rather than accepting the "User-agent: Slurp" record and subsequently hitting my user-agent blocking code in .htaccess.
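For what it's worth, Python's stock robots.txt parser reads those two records the same way; a quick sketch (urllib.robotparser's matching may of course differ from Slurp's own):

from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: Slurp China
Disallow: /

User-agent: Slurp
Crawl-delay: 3
Disallow: /cgi-bin
"""

rp = RobotFileParser()
rp.parse(ROBOTS.splitlines())

print(rp.can_fetch("Slurp China", "http://example.com/"))      # False: shut out entirely
print(rp.can_fetch("Slurp", "http://example.com/"))            # True
print(rp.can_fetch("Slurp", "http://example.com/cgi-bin/x"))   # False

Note that record order matters to a substring matcher like this one: listed the other way around, "Slurp China" would match the generic Slurp record first and get the milder rules.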

Now if I can just get "Yahoo! Slurp;" to quit listing my .css files in SERPs... Grumble, grumble... I've never seen any other search engine do this, but I had to add the Disallows for my .css files so they wouldn't show up when search terms coincided with the terms in my .css file comments...

Jim

[edited by: jdMorgan at 12:20 am (utc) on July 13, 2006]

4:57 pm on July 13, 2006 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


Don't think I've been crawled by this strain of Slurp before, or if I was it was a long time ago, but it's baaaaack:

74.6.131.201 "Mozilla/5.0 (compatible; Yahoo! DE Slurp; [help.yahoo.com...]

Why in the heck can't Yahoo just crawl pages from one place and let everyone share the pages?

I already block Yahoo China, don't make me block more...

5:23 pm on July 13, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Slurp DE is their Yahoo Directory engine, according to GaryK's earlier post (#400194 above). I certainly don't think I'd want to block it, since I've got several 'grandfathered' free listings in their directory, and these days you have to pay to get in (and pay again annually to stay in)... Blocking Slurp DE could cost me thousands!

Jim

3:02 pm on July 14, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 15, 2004
posts:174
votes: 0


Thanks for all your comments and suggestions on this thread.

I have posted a response from Yahoo! Search in a new thread on this forum.

Please check the information in the thread entitled Yahoo! Crawlers - A response from Yahoo! Search at
[webmasterworld.com...]

Thanks.

5:04 pm on July 14, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


Thanks for your reply, Mike. Thanks also to Warren and, of course, Mason. Without Mason, our concerns never would have made it this far; he was initially my only contact at Yahoo!