
Search Engine Spider and User Agent Identification Forum

    
Strange Inktomi Corporation IP
SEOPTI

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4449866 posted 5:49 pm on May 5, 2012 (gmt 0)

This has been going on for a few days:

74.6.13.111 - - [05/May/2012:13:34:37 -0400] "GET /....html HTTP/1.1" 200 6297 "-" "Mozilla/5.0 (X11; Linux i686 on x86_64; rv:7.0.1) Gecko/ /7.0.1"

The IP belongs to "Inktomi Corporation, Sunnyvale". I think this is Yahoo.

It loads the whole page, including all JavaScript, every few minutes.

 

lucy24

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4449866 posted 10:03 pm on May 5, 2012 (gmt 0)

"I think this is Yahoo."

I sure hope so, because if it isn't, I've been blocking innocent humans at 74.6 without cause ;)

wilderness

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4449866 posted 1:24 am on May 6, 2012 (gmt 0)

If their persistence is your only concern, the following will stop them in their tracks:

RewriteCond %{HTTP_USER_AGENT} Linux
RewriteCond %{REMOTE_ADDR} ^74\.6\.
RewriteRule .* - [F]
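
As a sketch of a narrower variant: the rule above matches any user agent containing "Linux" that comes from 74.6.x.x. Assuming the user-agent string stays the same as in the log line from the first post, the match could be tightened to that distinctive fragment. The RewriteEngine line is an assumption, included in case it is not already set elsewhere.

```apache
# Sketch only; assumes mod_rewrite is loaded and this sits in .htaccess
# or a <Directory> block.
RewriteEngine On

# Both conditions must match (RewriteCond lines are ANDed by default).
# The UA fragment is copied from the logged request; adjust if it changes.
RewriteCond %{HTTP_USER_AGENT} "Linux i686 on x86_64"
RewriteCond %{REMOTE_ADDR} ^74\.6\.
# "-" means no substitution; [F] returns 403 Forbidden.
RewriteRule .* - [F]
```

The trade-off is that a tighter user-agent pattern stops matching as soon as the visitor changes its string, while the broader "Linux" pattern may catch more than intended.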

SEOPTI

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4449866 posted 2:14 pm on May 9, 2012 (gmt 0)

Thanks for this. Today Inktomi started to bomb all my sites with thousands of page loads and I had to block them.

They even load the Adsense code with each request. This could lead to an Adsense ban.

motorhaven

10+ Year Member



 
Msg#: 4449866 posted 6:04 pm on May 9, 2012 (gmt 0)

Thanks. Found this one in my logs as well, and now blocked.

SEOPTI

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4449866 posted 6:26 pm on May 9, 2012 (gmt 0)

IP range block was really necessary:

SetEnvIf Remote_Addr "^72\.30\." get_out
SetEnvIf Remote_Addr "^74\.6\." get_out
SetEnvIf Remote_Addr "^98\.137\.72\." get_out


<Files *>
Order Allow,Deny
Allow from all
Deny from env=get_out
</Files>
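
For anyone on Apache 2.4, where Order/Allow/Deny only work through mod_access_compat, a rough equivalent using Require might look like the following. This is a sketch, assuming mod_setenvif and mod_authz_core are loaded; get_out is the same variable name as in the block above.

```apache
# Apache 2.4 sketch: same three prefixes, same get_out variable.
SetEnvIf Remote_Addr "^72\.30\." get_out
SetEnvIf Remote_Addr "^74\.6\." get_out
SetEnvIf Remote_Addr "^98\.137\.72\." get_out

<Files *>
    <RequireAll>
        # Allow everyone...
        Require all granted
        # ...except requests tagged with get_out.
        Require not env get_out
    </RequireAll>
</Files>
```

A negated Require has to live inside a RequireAll block alongside at least one positive rule, which is why "Require all granted" is there.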

dstiles

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4449866 posted 8:19 pm on May 9, 2012 (gmt 0)

I have the range 74.6.13.87 - 74.6.13.151 listed as a yahoo bot range. It's hit a few dozen times on the server this month but with a non-slurp IP (as noted in the OP) so it's been rejected.

I have a lot of slurp bot ranges listed at 74.6/16 but the rest is "allowed". Does anyone have a real reason to block the rest of the /16 or is part of it sometimes used by humans?

Same as above for 72.30/16 and 98.137/16, much of which is banned, but the "parent" range 98.136.0.0 - 98.139.255.255 is "allowed".

My current annoyance is 98.139.241.224 - 98.139.241.252, which is used by yahoomobile. I get loads of bad hits on those IPs, plus a few that may be valid. Does anyone have anything on those?
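
The two sub-ranges mentioned above don't fall on octet boundaries, so a plain prefix pattern won't cover them exactly. One way to tag them is to spell out the last octet in the regex, sketched here with SetEnvIf. The variable name yahoo_range is made up for illustration, and whether to deny on it or just watch it is left open.

```apache
# 74.6.13.87 - 74.6.13.151
# last octet: 87-89, 90-99, 100-149, 150-151
SetEnvIf Remote_Addr "^74\.6\.13\.(8[7-9]|9[0-9]|1[0-4][0-9]|15[01])$" yahoo_range

# 98.139.241.224 - 98.139.241.252
# last octet: 224-229, 230-249, 250-252
SetEnvIf Remote_Addr "^98\.139\.241\.(22[4-9]|2[34][0-9]|25[0-2])$" yahoo_range
```

To look at the traffic before deciding anything, the same variable could drive conditional logging (a CustomLog line ending in env=yahoo_range, in the server config rather than .htaccess).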

SEOPTI

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4449866 posted 9:06 pm on May 9, 2012 (gmt 0)

This doesn't look like a crawler to me. A crawler will NOT execute client-side JavaScript. What would be the reason for them to do this and even load the Adsense ad code thousands of times a day?

What is a Yahoo stealth crawl good for? They have Bingbot.

lucy24

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4449866 posted 11:16 pm on May 9, 2012 (gmt 0)

A while back, I threw in the towel and went to

BrowserMatch Yahoo keep_out

I'm now trying to figure out why I'm suddenly showing up in Yahoo image search-- with results from roboted-out directories that I keep a close watch on.
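
Note that the BrowserMatch line above only sets a variable; something still has to act on it. A minimal sketch of the pairing, in the same Apache 2.2 style used earlier in this thread (the Files block here is an assumption, not quoted from the post above):

```apache
# BrowserMatch tests the User-Agent header; the match is case-sensitive
# (BrowserMatchNoCase is the case-insensitive form).
BrowserMatch Yahoo keep_out

<Files *>
    Order Allow,Deny
    Allow from all
    # Refuse any request whose user agent contained "Yahoo".
    Deny from env=keep_out
</Files>
```

This only catches visitors that announce themselves as Yahoo in the user agent, so it would not touch the Linux-flavoured requests from the first post.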

SEOPTI

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4449866 posted 9:16 pm on May 22, 2012 (gmt 0)

This bot really loves eating 403s and it never learns.

wilderness

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4449866 posted 1:44 pm on Nov 6, 2012 (gmt 0)

In another thread (unknown to me), I had mentioned that I had not seen the Inktomi activity that I was used to pre-2010.

Since re-activation in Feb 2012 the only Slurp requests I've seen are occasional full-page-requests (with supporting images and CSS), and from the 74.6. range.

This morning the following:
72.30.142.221 - - [06/Nov/2012:12:27:30 +0000] "GET /robots.txt HTTP/1.0" 200 2719 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

There were two additional requests (same IP and UA) for a sub-sub directory page and CSS.

There's some interesting reading on Inktomi:
Inktomi Traffic Server source
Apache Traffic Server (TS)
