| Inktomi User-Agent blocked Inktomi blocked |
cyberdyne

msg:3595654 | 4:04 pm on Mar 9, 2008 (gmt 0) | Hi, I found the following IP 'denied by server' in my error logs today: 74.6.8.102. It resolves as Inktomi, which I believe is Yahoo? I cannot find any reference to Yahoo, Slurp or Inktomi in the User-Agent blocks of my .htaccess file (only in 'permits'), nor any 'deny from' for either 74.6.8.102 or the CIDR range 74.6.0.0/16. Can anyone suggest another way I might have inadvertently blocked this IP, please? I clearly do not wish to block it. Thank you.
|
cyberdyne

msg:3598338 | 11:49 am on Mar 12, 2008 (gmt 0) | Still need help with this if possible please. Regarding the above, when using 'Order Deny,Allow', can I use an 'allow from' line above my 'deny from' lines in order to permit the above IP addresses and over-rule any block I must have inadvertently used? e.g.:

Order Deny,Allow
allow from 74.6.0.0/16 "#Inktomi - Yahoo"
deny from 111.222.333.444
deny from 211.222.333.444
deny from 311.222.333.444

Thank you.
|
cyberdyne

msg:3598345 | 11:57 am on Mar 12, 2008 (gmt 0) | Also, does Inktomi crawl with the identity of 'yahoo-blogs/v3.9' or 'yahoo-mmcrawler', by any chance? Thanks
|
jdMorgan

msg:3598467 | 1:59 pm on Mar 12, 2008 (gmt 0) | > can I use an 'allow from' line above my 'deny from' lines in order to permit the above IP addresses and over-rule any block

The order of your "Allow from" directives with respect to your "Deny from" directives in your code does not matter; they will be processed in groups as specified by the "Order" directive. That is, in your code above, all "Deny from" directives are evaluated first. Access will be denied unless an "Allow from" directive overrides the denied IP address or range.

74.6.0.0/16 is a valid Inktomi/Yahoo IP address range, but I can't answer about the yahoo-blogs or yahoo-mmcrawler user-agents; these are Disallowed in robots.txt on my sites because there's no blog or media content I'd want indexed out-of-context on my sites. As such, all I can say is that the address range is valid for Yahoo.

Note that it's a very good idea to 'override' your access control to unconditionally allow all user-agents (even 'bad' ones) to access your custom 403 error page (if you use one) and your robots.txt file.

If access to your custom 403 error page is denied, then any attempt to access your site by a Deny'ed (unwelcome) user-agent will basically put your server into a 403-Forbidden loop: the server responds to the denied attempt by trying to serve the custom 403 error page, but access to that page is also denied. So, it tries to serve the custom 403 error page, but access to that page is denied... You get the picture. (Failure to prevent this problem can be thought of as a low-impact-but-still-unpleasant denial-of-service mechanism -- provided by the Webmaster!)

If access to the robots.txt file is denied, some robots (although not the major ones) will take that as carte blanche to spider your entire site. Although they likely won't be successful (because of your "Deny"s), they will waste a lot of bandwidth and make a mess of your log files and stats.

You can provide for these functions using mod_setenvif:
ErrorDocument 403 /403error.html
#
# ...(Other directives)
#
SetEnvIf Request_URI "/(403error\.html|robots\.txt)$" allowit
#
Order Deny,Allow
#
# ...(Other Allows and Denys)
#
Allow from env=allowit
Note also that comments should be placed on separate lines as shown to prevent generation of Apache Warnings -- These warnings --even if not logged due to LogLevel settings-- will still consume/waste processing time. Jim
|
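As an illustration of the point above that directive position does not matter under 'Order Deny,Allow', here is a minimal sketch. The 74.6.0.0/16 range is the one discussed in this thread; the two Deny ranges are hypothetical placeholders, not addresses from anyone's real block list. The broad 74.0.0.0/8 deny is included only to show an Allow overriding a Deny that covers the same address.

```apache
# Sketch only. The Deny ranges below are hypothetical placeholders.
# Deny directives are evaluated first, then Allow directives;
# a request matching both is allowed, and a request matching neither is allowed.
Order Deny,Allow

# Inktomi - Yahoo (comment on its own line, not after the directive)
Allow from 74.6.0.0/16

# Hypothetical broad block that happens to include the range above
Deny from 74.0.0.0/8

# Hypothetical unrelated block (documentation address range)
Deny from 192.0.2.0/24
```

With this in place, a request from 74.6.8.102 matches both the Deny and the Allow and should be let through, while one from 74.1.2.3 matches only the Deny and should get a 403. Moving the Allow line below the Deny lines should give the same result.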
cyberdyne

msg:3598495 | 2:33 pm on Mar 12, 2008 (gmt 0) | Thank you very much Jim, I should have noted that I do in fact have a line permitting my bad-bot files, I presume it is sufficient:

RewriteRule (robots\.txt|block\.html|403\.shtml)$ - [L]

Having read through your post, would I be correct in deducing that I can have something like the following in order to permit the currently-blocked Inktomi's IP range?

Options +FollowSymlinks All -Indexes
ErrorDocument 403 /403.shtml
#
RewriteRule (robots\.txt|block\.html|403\.shtml)$ - [L]
#
Order Deny,Allow
deny from 111.222.255.255
deny from 211.222.255.255
#
Order Allow,Deny
#Inktomi - Yahoo" 74.6.0.0/16

Thank you as always.
|
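For comparison, a hedged sketch of how the proposed file might be consolidated. It assumes Apache 2.2-style access control (mod_authz_host plus mod_setenvif), reuses the file names from the post above, and substitutes placeholder addresses for the example denies. Three things are changed on purpose. Only one Order directive is kept, because a second Order in the same context replaces the first, and as I read the Apache documentation, 'Order Allow,Deny' with no 'Allow from' line would refuse every request. The Yahoo range gets an explicit 'Allow from', since a bare address on its own line is not a directive. The robots.txt and error-page exemption is done with SetEnvIf as Jim describes, because a RewriteRule pass-through does not by itself override a Deny. The Options and RewriteRule lines are left out to keep the focus on access control.

```apache
ErrorDocument 403 /403.shtml

# Flag requests for robots.txt and the error pages (file names taken from the post above).
SetEnvIf Request_URI "/(robots\.txt|block\.html|403\.shtml)$" allowit

# One Order directive only: Deny rules first, then Allow rules override them.
Order Deny,Allow

# Placeholder addresses (documentation ranges); substitute the real ones.
Deny from 192.0.2.10
Deny from 198.51.100.20

# Inktomi - Yahoo
Allow from 74.6.0.0/16

# Always let the flagged files through, even for denied clients.
Allow from env=allowit
```

Note that under 'Order Deny,Allow' the Yahoo range is already permitted unless some Deny line covers it, so the 'Allow from 74.6.0.0/16' line only matters if such a Deny exists elsewhere in the file.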
cyberdyne

msg:3600359 | 8:36 am on Mar 14, 2008 (gmt 0) | Help! Please, I'm still, regrettably, blocking 'User-agent: Yahoo! Slurp' and 'Inktomi' IPs and I have no idea how. My error log reads:

[Fri Mar 14 06:02:42 2008] [error] [client 74.6.28.28] Directory index forbidden by Options directive: /home/mylogin/html/

The corresponding raw log entry reads:

lj511178.crawl.yahoo.net - - [14/Mar/2008:06:02:42 +0000] "GET /html/ HTTP/1.0" 403 671 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]

BUT... I also have an entry:

[Fri Mar 14 03:34:04 2008] [error] [client 209.191.123.33] File does not exist: /home/mylogin/html/file.html"

The IP of which belongs to Yahoo. This leads me to assume that it is the IP that's being blocked and not any combination of the User-agent strings. I've checked and double-checked my .htaccess and cannot find either 74.6.28.28 or 74.6.0.0/16. I have removed all references to 'Yahoo!', 'Slurp' and 'Mozilla' from my Disallows and ensured that 'Yahoo!' and 'Slurp' are in the allow section. Does anyone have any suggestions as to what else might be blocking them? Thank you in advance for any advice.
|
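A note on the two log entries in the last post, offered as a reading of the messages rather than a confirmed diagnosis of this server. 'Directory index forbidden by Options directive' is the message Apache writes when a directory URL is requested, no index file is found there, and listings are switched off with 'Options -Indexes'; it does not come from a 'Deny from' line or a User-Agent block. An IP-based denial normally shows up as 'client denied by server configuration' instead. 'File does not exist' is an ordinary 404 for a missing file. If that reading is right, Slurp received the 403 for /html/ for the same reason any other visitor would. A minimal sketch, assuming a file named index.html exists (or is added) in that directory:

```apache
# Assumption: index.html is a hypothetical file name; use whatever index file the directory really has.
DirectoryIndex index.html

# Directory listings stay off, so a directory with no index file still answers 403 for every client.
Options -Indexes
```

Requesting the same directory URL from an ordinary browser is a quick way to check: if the browser also gets a 403, the cause is the missing index file rather than anything specific to Yahoo.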