.htaccess file failure

MSIECrawler was allowed in


misosoph

8:48 pm on Jul 25, 2002 (gmt 0)

10+ Year Member



This is what I, like a lot of other WebmasterWorld members, have in my .htaccess file:

SetEnvIf User-Agent ^MSIECrawler keep_out
order allow,deny
allow from all
deny from env=keep_out

But the following UA took an entire directory from my site:

"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; MSIECrawler)"

24 files, all with status code 200, all within 24 seconds. Here is one access-log line:

195.112.34.112 - - [Date] "GET /myfolder/etc.html HTTP/1.1" 200 37072 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; MSIECrawler)"

(195.112.34.112 belongs to Nildram Dynamic ADSL Accounts, UK)

Why did my .htaccess file fail?

bird

9:05 pm on Jul 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The "^" in your pattern matches the beginning of the search space. You would only get a hit with this if the UA actually started with "MSIECrawler ...", which it obviously doesn't.
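One way to see this without touching the server is to replay the logged UA against both patterns. The sketch below is a simulation in Python, not Apache itself, and rests on a few assumptions: Python's `re.search` stands in for the regex match SetEnvIf performs on the User-Agent header, the helper names `sets_keep_out` and `allow_deny` are hypothetical, and the allow,deny logic is simplified to just the three directives in the .htaccess above.

```python
import re

# UA string copied from the access-log line in the original post.
UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; MSIECrawler)"


def sets_keep_out(pattern: str, user_agent: str) -> bool:
    """Hypothetical stand-in for `SetEnvIf User-Agent <pattern> keep_out`.

    re.search scans the whole string, so an unanchored pattern can match
    anywhere; a leading ^ still pins the match to the start of the string.
    """
    return re.search(pattern, user_agent) is not None


def allow_deny(keep_out: bool) -> str:
    """Simplified model of:
        order allow,deny
        allow from all
        deny from env=keep_out
    Allow is evaluated first, then Deny; a matching Deny rejects the request.
    """
    allowed = True       # "allow from all" matches every request
    if keep_out:         # "deny from env=keep_out"
        allowed = False
    return "200 OK" if allowed else "403 Forbidden"


# Anchored pattern: the UA starts with "Mozilla", so keep_out is never set.
print(allow_deny(sets_keep_out(r"^MSIECrawler", UA)))
# Unanchored pattern: "MSIECrawler" is found near the end, so the deny applies.
print(allow_deny(sets_keep_out(r"MSIECrawler", UA)))
```

With the leading "^" the variable is never set, so the request falls through to "allow from all", which fits the 200 responses in the log.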

misosoph

9:25 pm on Jul 25, 2002 (gmt 0)

10+ Year Member



Thank you. That was what I suspected.

Apparently I was wrong: I thought I had taken that line from one of the forum discussions, but now I can't find it anywhere. So I must have added it myself.

As I understand it, then, ^ means "begins with". So is there a formula that can be used to block a request based on a word or phrase that appears anywhere at all in a UA?

Thank you again.

jdMorgan

9:52 pm on Jul 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes,

Leave the "^" off. You can anchor at the end of the string with "$" if you like, e.g.:

SetEnvIf User-Agent MSIECrawler)$ keep_out
order allow,deny
allow from all
deny from env=keep_out

Jim

misosoph

10:04 pm on Jul 25, 2002 (gmt 0)

10+ Year Member



There is a lot of knowledge wandering these halls. Thank you, Jim. (I'm going to have to take the time to study this myself. Soon, I hope.) :)

jdMorgan

10:24 pm on Jul 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



misosoph,

Sure, no problem... Looking back at what I posted, I can't recommend adding the ")$" at the end unless you test it thoroughly. The ")" might be interpreted as a special character rather than a literal, and that might break your deny again. In regular expressions you can "escape" the paren by preceding it with a "\", but the allow,deny method does not use regular expressions in exactly the same way that mod_rewrite does unless you force it to, and I use mod_rewrite. So just leave the "^" off the front end, and the pattern will match anywhere in the string.
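To illustrate why an unescaped ")" is risky, the sketch below compiles three candidate patterns against the logged UA. It uses Python's `re` module as a stand-in for the server's regex engine, which is an assumption: Apache 2.x uses PCRE, and other or older engines may treat a stray ")" differently, which is exactly why testing is advised. `try_pattern` is a hypothetical helper.

```python
import re

# UA string copied from the access-log line earlier in the thread.
UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; MSIECrawler)"


def try_pattern(pattern: str, user_agent: str) -> str:
    """Compile a pattern and report whether it matches, or why it failed to compile."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # Python's engine rejects an unbalanced ")" outright.
        return f"invalid pattern: {exc}"
    return "match" if compiled.search(user_agent) else "no match"


for pattern in (
    r"MSIECrawler)$",   # bare paren: treated as a group closer, not a literal
    r"MSIECrawler\)$",  # escaped paren: literal ")" followed by end of string
    r"MSIECrawler",     # unanchored: matches anywhere in the UA
):
    print(pattern, "->", try_pattern(pattern, UA))
```

In this engine the bare ")$" version fails to compile as an unbalanced parenthesis, while the escaped "\)$" version and the plain unanchored pattern both match.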

BTW, a few months ago I did a search on "big G" (Google) for "regular expressions regex" and found a fairly good primer on a .edu domain for sorting out all those hats and dollars (^$).

Cheers!
Jim