Yahoo

wilderness

2:36 pm on Aug 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In one of the threads there was mention of Bing taking over the Yahoo spidering and what effect this might have on the already numerous Yahoo bots.
Although the following doesn't address Bing, they (Bing) will surely be jumping into a "can of worms".

66.196.97.158 - - [07/Aug/2009:12:12:37 +0100] "GET /MyFolder/Mypage.html HTTP/1.0" 200 64739 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

No robots.txt request.
No image requests.

#deny IF UA does not include Inktomi or Slurp and comes from IP range
RewriteCond %{REMOTE_ADDR} !^66\.196\.(6[4-9]|[789][0-9]|1[01][0-9]|12[0-7])\.
RewriteCond %{HTTP_USER_AGENT} (inktomi|Slurp) [NC]
RewriteRule .* - [F]

Please note: the forum breaks pipe characters; if any show up as ¦, replace them with | before use.

This also takes out any other Yahoo tools that may come from this range.

wilderness

3:20 pm on Aug 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This will NOT fly.

It takes out all Slurp/Inktomi traffic that does not come from the 66.196. range, which was NOT my intention.

I believe it needs to be changed to:

#deny IF UA does not include Inktomi or Slurp and comes from IP range
RewriteCond %{REMOTE_ADDR} ^66\.196\.(6[4-9]|[789][0-9]|1[01][0-9]|12[0-7])\.
RewriteCond %{HTTP_USER_AGENT} !(inktomi|Slurp) [NC]
RewriteRule .* - [F]
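
To see why the first version misfired, here is a rough Python simulation of the two condition pairs. It is only a sketch of the logic, not Apache itself: the function names are made up for illustration, the regexes are the ones from the rules with pipe characters in place, and the sample requests are taken from log lines quoted in this thread.

```python
import re

# Patterns from the rules in this thread (pipe characters in place).
YAHOO_RANGE = re.compile(r"^66\.196\.(6[4-9]|[789][0-9]|1[01][0-9]|12[0-7])\.")
SLURP_UA = re.compile(r"(inktomi|Slurp)", re.IGNORECASE)  # [NC] means case-insensitive


def first_version_denies(ip, ua):
    """First post: IP is NOT in the range AND UA contains Slurp/Inktomi."""
    return not YAHOO_RANGE.search(ip) and bool(SLURP_UA.search(ua))


def corrected_version_denies(ip, ua):
    """Second post: IP IS in the range AND UA does NOT contain Slurp/Inktomi."""
    return bool(YAHOO_RANGE.search(ip)) and not SLURP_UA.search(ua)


# Sample requests copied from log lines quoted in the thread.
MSIE_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
SLURP_STYLE_UA = "slurp, yahoo! slurp, slurp/2.0, inktomi slurp, slurp.so/1.0"

samples = [
    ("66.196.97.158", MSIE_UA),         # in range, no Slurp in the UA
    ("67.195.37.171", SLURP_STYLE_UA),  # outside the range, Slurp-style UA
]

for ip, ua in samples:
    print(ip, first_version_denies(ip, ua), corrected_version_denies(ip, ua))
```

Under these assumptions the first version denies the Slurp-style request from 67.195.37.171 and lets the MSIE request from 66.196.97.158 through, while the corrected version does the reverse, which matches the stated intent.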

GaryK

3:44 am on Aug 9, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In one of the threads there was mention of Bing taking over the Yahoo spidering and what effect this might have on the already numerous Yahoo bots.

That was my thread; it got one reply.

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

I can't find it now, but I posted this UA a while back as being from Yahoo. At least it's consistent in not reading robots.txt.

dstiles

8:47 pm on Aug 9, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



With certain header combinations that UA is a significant attack bot, at least on my server. With other combinations I merely monitor it.

wilderness

1:50 pm on Sep 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For a couple of weeks now, I've been seeing blank referers and UAs (single requests) from various Yahoo IP ranges (non-standard bot ranges).

A few days ago, Slurp began consecutive crawls of my entire sites, and it has repeated them since.

Not sure if the two are connected or not.

dstiles

9:06 pm on Sep 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why would slurp crawl now? Shouldn't Bing be the only bot now? Is it safe to block all yahoo bots? Please say yes! :)

wilderness

9:12 pm on Sep 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Wish I had a clue.

keyplyr

11:32 pm on Sep 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm still seeing the usual suspects from Yahoo.

wilderness

1:14 am on Sep 12, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



209.131.38.42 - - [04/Sep/2009:23:33:49 +0100] "GET / HTTP/1.1" 403 1159 "-" "-"
69.147.115.61 - - [05/Sep/2009:16:32:23 +0100] "GET / HTTP/1.1" 403 1159 "-" "-"
68.180.211.142 - - [07/Sep/2009:03:33:13 +0100] "GET / HTTP/1.1" 403 1159 "-" "-"
68.142.243.85 - - [08/Sep/2009:16:38:54 +0100] "GET / HTTP/1.1" 403 1159 "-" "-"
68.142.243.85 - - [12/Sep/2009:01:58:16 +0100] "GET / HTTP/1.1" 403 1159 "-" "-"

wilderness

2:43 am on Sep 12, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My apologies for not providing an explanation.

All the above were solitary requests.
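
For anyone wanting to pull requests like these out of their own logs, here is a minimal Python sketch. It assumes the Apache "combined" log format seen in the lines quoted in this thread; the function name and the regex are illustrative only, and the sample lines are copied from earlier posts.

```python
import re

# Apache "combined" log format, as in the lines quoted in this thread.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)


def blank_referer_and_ua(lines):
    """Return the IPs of requests whose referer and UA are both "-" or empty."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("referer") in ("", "-") and m.group("ua") in ("", "-"):
            hits.append(m.group("ip"))
    return hits


sample = [
    '209.131.38.42 - - [04/Sep/2009:23:33:49 +0100] "GET / HTTP/1.1" 403 1159 "-" "-"',
    '66.196.97.158 - - [07/Aug/2009:12:12:37 +0100] "GET /MyFolder/Mypage.html HTTP/1.0" '
    '200 64739 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

print(blank_referer_and_ua(sample))
```

Only the first sample line has both fields blank, so only its IP is reported. Lines that do not fit the combined format are silently skipped.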

incrediBILL

1:22 am on Sep 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why would slurp crawl now? Shouldn't Bing be the only bot now? Is it safe to block all yahoo bots?

They aren't anywhere close to implementation yet, and I wouldn't expect Slurp to stop even once Bing is supplying results to Yahoo.

You would have to be insane to let your index go stale in the event something causes the deal to break up, which would leave Yahoo dead in the water.

Hopefully part of the deal is that Bing will update Slurp's cache during that period; who knows.

wilderness

2:30 am on Sep 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Change of UA (in the middle of a crawl)

67.195.37.171 - - [16/Sep/2009:01:11:54 +0100] "GET /robots.txt HTTP/1.0" 200 4893 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
67.195.37.171 - - [16/Sep/2009:01:11:54 +0100] "GET /MyFolder/MySubFolder/MyPage.html HTTP/1.0" 200 10357 "-" "slurp, yahoo! slurp, slurp/2.0, inktomi slurp, slurp.so/1.0"

Pfui

11:18 am on Sep 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"slurp, yahoo! slurp, slurp/2.0, inktomi slurp, slurp.so/1.0"

Yuck. That looks like a hideous escapee from some Sci Fi novel's Ultra Secret Lab. And it is so-o-o going to mess with my RewriteCond codes. Hope I never see it.

P.S.
Gary saw it in July, from 72.30.161.222:

[webmasterworld.com...]

Pfui

1:34 pm on Sep 17, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This just in --

llf320056.crawl.yahoo.net
slurp, yahoo! slurp, slurp/2.0, inktomi slurp, slurp.so/1.0

robots.txt? YES

And as predicted, when it went for files it got totally mired in my only-okay-from-yahoo-or-else-403 conditions. Dangit. I get minimal Yahoo traffic, so I'm on the fence about spending X amount of time to debug and enable this new creature.
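
The actual conditions aren't posted, so the following is only a guess at the general shape of an "only okay from Yahoo or else 403" check, written as a Python sketch rather than RewriteCond lines. The function names and the `.crawl.yahoo.net` suffix test are assumptions drawn from the hostname above. A hostname check like this only means something if the name comes from a reverse lookup that has been confirmed by a forward lookup, which this sketch does not do.

```python
import re

# Hypothetical UA test: anything that claims to be Slurp, case-insensitive.
SLURP_UA = re.compile(r"slurp", re.IGNORECASE)


def host_is_yahoo_crawl(hostname):
    """True if the (already verified) hostname ends in .crawl.yahoo.net."""
    return hostname.lower().rstrip(".").endswith(".crawl.yahoo.net")


def status_for(hostname, ua):
    """Return 403 when a UA claims Slurp but the host is not a Yahoo crawl host."""
    if SLURP_UA.search(ua) and not host_is_yahoo_crawl(hostname):
        return 403
    return 200


NEW_UA = "slurp, yahoo! slurp, slurp/2.0, inktomi slurp, slurp.so/1.0"

print(status_for("llf320056.crawl.yahoo.net", NEW_UA))
print(status_for("host.example.invalid", NEW_UA))
```

With a loose, case-insensitive match on "slurp", the new UA from a crawl.yahoo.net host passes. A stricter pattern that expects the usual "Yahoo! Slurp" casing or the Mozilla/5.0 prefix would reject it, which may be the kind of thing that trips existing conditions.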

wilderness

1:41 pm on Sep 17, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've denied it (in spite of the intensive crawl on my sites yesterday). HOWEVER, after more than ten years, my sites are winding down toward their closing.

Yahoo has always been good to my sites, and crawled pages appear there faster than with other SEs.

wilderness

7:11 pm on Sep 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



slurp, yahoo! slurp, slurp/2.0, inktomi slurp, slurp.so/1.0

This thing hammered my sites for a few days, ate 403s and disappeared.

The other Slurp bots continue to crawl.