
Apache Web Server Forum

    
blocking anonymous user agents
but not from robots.txt
dcrombie




msg:1496063
 11:56 am on Jan 15, 2004 (gmt 0)

I've been breaking my brain on this for a week now. I've got the following in an .htaccess file to block requests that send neither a referrer nor a user agent:

RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule !^robots.txt$ - [F,L]

I'm trying to exclude robots.txt from this rule because at least one search engine requests robots.txt anonymously. However, that request is still being blocked:

193.***.115.6 - - [15/Jan/2004:04:11:44] "GET /robots.txt HTTP/1.1" 403 295 "-" "-"

What am I doing wrong?!?

[edited by: jdMorgan at 9:10 pm (utc) on Jan. 16, 2004]
[edit reason] Generalized specific IP address [/edit]

 

jdMorgan




msg:1496064
 8:16 pm on Jan 15, 2004 (gmt 0)

dcrombie,

It doesn't look like your code is broken. The request may be getting blocked for some other reason.
You do need to escape the dot in robots.txt, and [L] is redundant when used with [F], but your code should have worked fine in this case: an unescaped dot matches any character, so the pattern still matches robots.txt. (This looks almost like mine, which works.)

RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule !^robots\.txt$ - [F]

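I assume that you have other working mod_rewrite code in your .htaccess file, and that this problem is not systemic. If this code is in httpd.conf rather than .htaccess, you'll need to add a "/" ahead of "robots.txt", because there the pattern is matched against the full URL-path, leading slash included.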

Jim
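
A minimal sketch of a way around the leading-slash difference Jim mentions, assuming mod_rewrite is available and that a request should be blocked only when both headers are empty. Testing %{REQUEST_URI}, which always begins with "/", keeps the robots.txt exclusion the same in .htaccess and in httpd.conf. This is one possible arrangement, not necessarily what either poster runs.

```apache
# Assumption: the engine may already be switched on elsewhere in the file.
RewriteEngine On

# Never apply the block to robots.txt; REQUEST_URI includes the leading slash
# in both per-directory and server context.
RewriteCond %{REQUEST_URI} !^/robots\.txt$
# Consecutive conditions are ANDed: referrer empty AND user agent empty.
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule .* - [F]
```

For comparison, the httpd.conf form of the original single-rule approach would use the pattern !^/robots\.txt$ in the RewriteRule itself.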

dcrombie




msg:1496065
 11:06 am on Jan 16, 2004 (gmt 0)

Thanks. I've tried both with and without escaping the dot, with no apparent effect. My only thought is that the spider might be passing "-" instead of "" as the UA (they are Polish), so I'm going to try something like:

RewriteCond %{HTTP_USER_AGENT} ^-?$

[Edit: that really didn't make sense, did it? If the UA were "-", the original ^$ condition wouldn't match and they wouldn't be blocked in the first place.]
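
One way to test Jim's suggestion that something other than this rule is returning the 403 is to switch on rewrite logging for a short time. A sketch only, assuming access to httpd.conf (these directives are not permitted in .htaccess) and an Apache 1.3 or 2.0/2.2 server; the log path is a placeholder, not taken from the thread.

```apache
# Server or virtual-host context only. Remove again after debugging,
# since rewrite logging slows the server down.
# Hypothetical path; point it at any file the server can write to.
RewriteLog "/var/log/apache/rewrite.log"
# 0 disables logging; higher levels are more verbose (maximum 9).
RewriteLogLevel 3
```

If the log shows the request for robots.txt passing through without this rule being applied, the 403 is probably coming from somewhere else, such as an access-control directive. On Apache 2.4 and later these two directives no longer exist; the LogLevel directive with a rewrite trace level takes their place.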
