
Forum Moderators: goodroi


Inktomi User-Agent blocked

Inktomi blocked



4:04 pm on Mar 9, 2008 (gmt 0)

10+ Year Member

I found the following IP 'denied by server' in my error logs today;
It resolves as Inktomi which I believe is Yahoo?

I cannot find any reference to Yahoo, Slurp or Inktomi in my .htaccess file User-Agent blocks (only in 'permits'), nor any reference to a 'deny from' either or the IP CIDR:

Can anyone suggest how else I might have inadvertently blocked this IP, please? I clearly do not wish to block it.

Thank you.


11:49 am on Mar 12, 2008 (gmt 0)

10+ Year Member

Still need help with this if possible please.

Regarding the above, when using 'Order Deny,Allow', can I use an 'allow from' line above my 'deny from' lines in order to permit the above IP addresses and over-rule any block I must have inadvertently used? E.g.:

Order Deny,Allow
allow from "#Inktomi - Yahoo"

deny from 111.222.333.444
deny from 211.222.333.444
deny from 311.222.333.444

Thank you.


11:57 am on Mar 12, 2008 (gmt 0)

10+ Year Member

Also, does Inktomi crawl with the identity of 'yahoo-blogs/v3.9' or 'yahoo-mmcrawler', by any chance?



1:59 pm on Mar 12, 2008 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

> can I use an 'allow from' line above my 'deny from' lines in order to permit the above IP addresses and over-rule any block

The order of your "Allow from" directives with respect to your "Deny from" directives in your code does not matter; they are processed in groups as specified by the "Order" directive. That is, in your code above, all "Deny from" directives are evaluated first. A request from a denied address will then be refused unless an "Allow from" directive overrides the denied IP address or range.

The address range you posted is a valid Inktomi/Yahoo IP address range, but I can't answer about the yahoo-blogs or yahoo-mmcrawler user-agents; these are Disallowed in robots.txt on my sites because there's no blog or media content I'd want indexed out-of-context on my sites. As such, all I can say is that the address range is valid for Yahoo.
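
As a rough illustration of the evaluation order described above, an "Allow from" can carve an exception out of a wider "Deny from" regardless of which line comes first in the file. This is a sketch only: the addresses are reserved documentation ranges, not real crawler or bad-bot addresses, and the syntax is the Apache 2.2-style access control used in this thread.

```apache
# Sketch only: 192.0.2.0/24 and 198.51.100.0/24 are reserved documentation
# ranges used as placeholders, not addresses taken from this thread.
Order Deny,Allow

# Deny lines are evaluated first, wherever they appear in the file.
Deny from 192.0.2.0/24
Deny from 198.51.100.17

# Allow lines are evaluated second and override a matching Deny.
Allow from 192.0.2.64/26
```

With this fragment, a request from 192.0.2.70 matches both a Deny and the Allow and is let through, 192.0.2.10 matches only a Deny and gets a 403, and an address matching neither is allowed by default under Order Deny,Allow.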

Note that it's a very good idea to 'override' your access control to unconditionally allow all user-agents (even 'bad' ones) to access your custom 403 error page (if you use one) and your robots.txt file. If access to your custom 403 error page is denied, then any attempt to access your site by a Deny'ed (unwelcome) user-agent will basically put your server into a 403-Forbidden loop; The server responds to the denied attempt by trying to serve the custom 403 error page, but access to that page is also denied. So, it tries to serve the custom 403 error page, but access to that page is denied... You get the picture. (Failure to prevent this problem can be thought of as a low-impact-but-still-unpleasant denial-of-service mechanism -- provided by the Webmaster!)

If access to the robots.txt file is denied, some robots (although not the major ones) will take that as carte-blanche to spider your entire site. Although they likely won't be successful (because of your "Deny"s), they will waste a lot of bandwidth and make a mess of your log files and stats.

You can provide for these functions using mod_setenvif:

ErrorDocument 403 /403error.html
# ...(Other directives)
SetEnvIf Request_URI "/(403error\.html|robots\.txt)$" allowit
Order Deny,Allow
# ...(Other Allows and Denys)
Allow from env=allowit

Note also that comments should be placed on separate lines as shown to prevent generation of Apache Warnings -- These warnings --even if not logged due to LogLevel settings-- will still consume/waste processing time.
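
To make the comment placement point concrete, here is a small sketch. The address is a reserved documentation range used purely as a placeholder.

```apache
# Known bad bot (comment on its own line, as recommended above)
Deny from 192.0.2.0/24

# Avoid appending the comment to the directive itself, as in the next line
# (shown commented out so this fragment stays valid):
# Deny from 192.0.2.0/24 # known bad bot
```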



2:33 pm on Mar 12, 2008 (gmt 0)

10+ Year Member

Thank you very much Jim,
I should have noted that I do in fact have a line permitting my bad-bot files; I presume it is sufficient:

RewriteRule (robots\.txt|block\.html|403\.shtml)$ - [L]

Having read through your post, would I be correct in deducing that I can use something like the following to permit the currently blocked Inktomi IP range?

Options +FollowSymlinks All -Indexes
ErrorDocument 403 /403.shtml
RewriteRule (robots\.txt|block\.html|403\.shtml)$ - [L]
Order Deny,Allow
deny from
deny from
Order Allow,Deny
#Inktomi - Yahoo"

Thank you as always.


8:36 am on Mar 14, 2008 (gmt 0)

10+ Year Member

Help! Please,

I'm still, regrettably, blocking 'User-agent: Yahoo! Slurp' and 'Inktomi' IPs, and I have no idea how.

My error log reads:
[Fri Mar 14 06:02:42 2008] [error] [client] Directory index forbidden by Options directive: /home/mylogin/html/

The corresponding raw log entry reads:
lj511178.crawl.yahoo.net - - [14/Mar/2008:06:02:42 +0000] "GET /html/ HTTP/1.0" 403 671 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]

I also have an entry:
[Fri Mar 14 03:34:04 2008] [error] [client] File does not exist: /home/mylogin/html/file.html"
The IP for that entry belongs to Yahoo.

Which leads me to assume that it is the IP that's being blocked and not any combination of the User-agent strings.

I've checked and double-checked my .htaccess and cannot find either or

I have removed all references to 'Yahoo!', 'Slurp' and 'Mozilla' from my Disallows and ensured that 'Yahoo!' and 'Slurp' are in the allow section.

Does anyone have any suggestions as to what else might be blocking them?

Thank you in advance for any advice.

