66.196.97.158 - - [07/Aug/2009:12:12:37 +0100] "GET /MyFolder/Mypage.html HTTP/1.0" 200 64739 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
No robots.
No images.
#deny IF UA does not include Inktomi or Slurp and comes from IP range
RewriteCond %{REMOTE_ADDR} !^66\.196\.(6[4-9]|[789][0-9]|1[01][0-9]|12[0-7])\.
RewriteCond %{HTTP_USER_AGENT} (inktomi|Slurp) [NC]
RewriteRule .* - [F]
Please note: this forum converts pipe characters (|) to broken bars (¦) in posted code; any ¦ must be changed back to | before use.
This also takes out any other Yahoo tools that may come from this range.
It takes out all Slurp/Inktomi requests which do not come from the 66.196. range, which was NOT my intention.
I believe it needs to be changed to:
#deny IF UA does not include Inktomi or Slurp and comes from IP range
RewriteCond %{REMOTE_ADDR} ^66\.196\.(6[4-9]|[789][0-9]|1[01][0-9]|12[0-7])\.
RewriteCond %{HTTP_USER_AGENT} !(inktomi|Slurp) [NC]
RewriteRule .* - [F]
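For anyone wanting to sanity-check the logic before putting it on a live server, here is a rough Python sketch of what the corrected pair of conditions is meant to do. It is not how Apache evaluates the rule internally, just the same two patterns (pipes written as |) combined with AND. The function name is_denied is made up for illustration.

```python
import re

# Same IP pattern as the RewriteCond above: 66.196.64.x through 66.196.127.x.
YAHOO_RANGE = re.compile(r"^66\.196\.(6[4-9]|[789][0-9]|1[01][0-9]|12[0-7])\.")
# The [NC] flag in mod_rewrite corresponds to a case-insensitive match here.
SLURP_UA = re.compile(r"(inktomi|Slurp)", re.IGNORECASE)


def is_denied(remote_addr: str, user_agent: str) -> bool:
    """Hypothetical helper: True when both conditions hold, i.e. the [F] rule would fire."""
    in_range = YAHOO_RANGE.search(remote_addr) is not None
    claims_slurp = SLURP_UA.search(user_agent) is not None
    # Deny only when the IP is in the range AND the UA does not mention Inktomi/Slurp.
    return in_range and not claims_slurp


# The MSIE 6.0 visitor from the first log line: in range, no Slurp in the UA.
print(is_denied("66.196.97.158", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))  # True
# Same IP with a Slurp UA would be let through.
print(is_denied("66.196.97.158", "Mozilla/5.0 (compatible; Yahoo! Slurp)"))  # False
```

Anything outside 66.196.64-127 is untouched by this rule whatever UA it sends, so other Yahoo ranges need their own conditions.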
In one of the threads there was mention of Bing taking over the Yahoo spidering and what effect this might have on the already numerous Yahoo bots.
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Why would Slurp crawl now? Shouldn't Bing be the only bot now? Is it safe to block all Yahoo bots?
They aren't anywhere close to implementation yet, and I wouldn't expect Slurp to stop even while Bing is supplying results to Yahoo.
You would have to be insane to let your index go stale in case something causes the deal to break up, which would leave Yahoo dead in the water.
Hopefully part of the deal is that Bing will update Slurp's cache during that period. Who knows.
67.195.37.171 - - [16/Sep/2009:01:11:54 +0100] "GET /robots.txt HTTP/1.0" 200 4893 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
67.195.37.171 - - [16/Sep/2009:01:11:54 +0100] "GET /MyFolder/MySubFolder/MyPage.html HTTP/1.0" 200 10357 "-" "slurp, yahoo! slurp, slurp/2.0, inktomi slurp, slurp.so/1.0"
Yuck. That looks like a hideous escapee from some Sci Fi novel's Ultra Secret Lab. And it is so-o-o going to mess with my RewriteCond codes. Hope I never see it.
P.S.
Gary saw it in July, from 72.30.161.222:
[webmasterworld.com...]
llf320056.crawl.yahoo.net
slurp, yahoo! slurp, slurp/2.0, inktomi slurp, slurp.so/1.0
robots.txt? YES
And as predicted, when it went for files it got totally mired in my only-okay-from-yahoo-or-else-403 conditions. Dangit. I get minimal Yahoo traffic, so I am on the fence about spending X amount of time to debug and enable this new creature.
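If it helps with the debugging, here is a small Python sketch for pulling the IP and user agent out of a combined-format log line, using the 67.195.37.171 hit quoted earlier in the thread. The regex is my own approximation of the combined log format, and the variable names are arbitrary. It only shows which of the two simple tests (UA mentions Slurp, IP in the 66.196.64-127 block) the new UA passes; it says nothing about anyone's actual .htaccess.

```python
import re

# Rough pattern for the Apache "combined" log format (an approximation, not an official grammar).
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Log line copied from the post above.
line = ('67.195.37.171 - - [16/Sep/2009:01:11:54 +0100] '
        '"GET /MyFolder/MySubFolder/MyPage.html HTTP/1.0" 200 10357 "-" '
        '"slurp, yahoo! slurp, slurp/2.0, inktomi slurp, slurp.so/1.0"')

fields = COMBINED.match(line).groupdict()

# Case-insensitive UA test, like a RewriteCond with [NC].
claims_slurp = re.search(r"inktomi|slurp", fields["agent"], re.IGNORECASE) is not None
# IP test for the 66.196.64.x - 66.196.127.x block discussed earlier.
in_66_196 = re.search(
    r"^66\.196\.(6[4-9]|[789][0-9]|1[01][0-9]|12[0-7])\.", fields["ip"]
) is not None

print(fields["ip"], "| claims Slurp:", claims_slurp, "| in 66.196 block:", in_66_196)
```

So the odd UA still matches a case-insensitive Slurp test, but it arrives from 67.195., which an allow-list written only for the 66.196. block would not cover.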