

Blocking of Robots

Using mod_rewrite to block nasty spiders

         

thewormman

4:33 pm on May 26, 2005 (gmt 0)

10+ Year Member



First post, be gentle...

I am new to mod_rewrite and am trying to get to grips with it.
I am trying to block certain robots from retrieving pages of specific directories.

I have tried using the following (I did not write this myself):

RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot.*
RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.[8-9]$
RewriteRule ^/info/somedirectory/.+ - [F]

But I cannot get it to work; it does nothing.

Can anyone see anything obvious that may be wrong?

Many thanks

jdMorgan

9:15 pm on May 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



thewormman,

Welcome from WebmasterWorld!

There is a lot of minor stuff wrong with this code, and maybe more. I'd suggest you check out the references cited in our forum charter [webmasterworld.com].


RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot.*
RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.[8-9]$
RewriteRule ^/info/somedirectory/.+ - [F]

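First, a trailing ".*" on a pattern that is not end-anchored (one without a closing "$") is meaningless; "^NameOfBadRobot.*" matches exactly the same strings as "^NameOfBadRobot".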

Next, if NameOfBadRobot is start-anchored with "^" as shown, then the user-agent name must *start* with that string exactly.
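To see the two points above side by side, here is a small sketch. It assumes Python's re.search is a fair stand-in for Apache's PCRE matching on these simple patterns (a RewriteCond pattern is unanchored unless it uses "^" or "$"); the user-agent strings are made up for illustration.

```python
import re

# Assumption: Python's re.search behaves like Apache's PCRE for these patterns.
with_dot_star = re.compile(r"^NameOfBadRobot.*")
without_dot_star = re.compile(r"^NameOfBadRobot")

# Hypothetical user-agent strings, for illustration only.
user_agents = [
    "NameOfBadRobot",
    "NameOfBadRobot/1.0 (+http://www.example.com/bot.html)",
    "Mozilla/5.0 (compatible; NameOfBadRobot/1.0)",  # does not *start* with the name
]

results = []
for ua in user_agents:
    pair = (with_dot_star.search(ua) is not None,
            without_dot_star.search(ua) is not None)
    results.append(pair)
    print(f"{ua!r}: with .* -> {pair[0]}, without .* -> {pair[1]}")
```

Both patterns agree on every string, and the third one fails both because "^" requires the name at the very start.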

Next, the RewriteRule will only be invoked if BOTH RewriteConds match. In other words, it will block that robot only if it comes from that remote IP address range. Is that what you want? If not, see the RewriteCond [OR] flag.
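The AND/OR distinction can be modeled in a few lines. This is a hypothetical sketch, not mod_rewrite itself: the blocked() helper and its use_or switch are invented names that mimic two RewriteConds without a flag (ANDed) versus the first one carrying [OR].

```python
import re

UA_PATTERN = re.compile(r"^NameOfBadRobot")
IP_PATTERN = re.compile(r"^123\.45\.67\.[89]$")

def blocked(user_agent: str, remote_addr: str, use_or: bool = False) -> bool:
    """Hypothetical model of two RewriteConds guarding one RewriteRule.

    Without flags the conditions are ANDed; use_or=True mimics putting
    [OR] on the first condition.
    """
    ua_hit = UA_PATTERN.search(user_agent) is not None
    ip_hit = IP_PATTERN.search(remote_addr) is not None
    return (ua_hit or ip_hit) if use_or else (ua_hit and ip_hit)

print(blocked("NameOfBadRobot/1.0", "123.45.67.8"))           # True: both match
print(blocked("NameOfBadRobot/1.0", "10.0.0.1"))              # False: IP does not match
print(blocked("NameOfBadRobot/1.0", "10.0.0.1", use_or=True)) # True: either is enough
```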

The character class [89] is equivalent to [8-9], since the digits are contiguous. The shorter form may be marginally faster.
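A quick check that the two classes match the same addresses, again using Python's re as a stand-in for Apache's regex engine. It tries every possible last octet; because the pattern ends in "$", only a single trailing 8 or 9 matches (not, say, .80 or .89).

```python
import re

short = re.compile(r"^123\.45\.67\.[89]$")
ranged = re.compile(r"^123\.45\.67\.[8-9]$")

# Collect every last octet (0-255) that each pattern accepts.
matches_short = [n for n in range(256) if short.search(f"123.45.67.{n}")]
matches_ranged = [n for n in range(256) if ranged.search(f"123.45.67.{n}")]
print(matches_short, matches_ranged)  # [8, 9] [8, 9]
```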

Finally, the trailing "+" on the RewriteRule pattern isn't needed either; you could just end the pattern with the period. (Note: either way, the rule lets the directory index at "/info/somedirectory/" itself be spidered, because the pattern requires at least one character after the final slash. It blocks access only if some character follows "/info/somedirectory/". If you also want to block the index at "/info/somedirectory/", remove that trailing period.)
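Here is a sketch comparing the three endings on a few sample paths. It assumes the rules live in the server config (httpd.conf), where the URL-path that RewriteRule sees begins with "/", matching the leading slash in the pattern above; the sample paths are illustrative.

```python
import re

# Assumption: server-config context, so the tested URL-path starts with "/".
patterns = {
    "trailing .+": re.compile(r"^/info/somedirectory/.+"),
    "trailing .": re.compile(r"^/info/somedirectory/."),
    "no trailing char": re.compile(r"^/info/somedirectory/"),
}

paths = ["/info/somedirectory/", "/info/somedirectory/page.html", "/info/other/page.html"]

# For each pattern, which of the sample paths would be forbidden?
blocked = {
    name: [rx.search(p) is not None for p in paths]
    for name, rx in patterns.items()
}
for name, hits in blocked.items():
    print(f"{name:17} {hits}")
```

".+" and "." behave identically; only dropping the final character also catches the bare directory index.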

Fixing the minor stuff, and leaving the anchoring and robot-specific questions open, we get:


RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot
RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.[89]$
RewriteRule ^/info/somedirectory/. - [F]

Very little of the above is likely to make sense unless you followed the link above... :)

Jim

thewormman

10:06 pm on May 26, 2005 (gmt 0)

10+ Year Member



WOW!
Thanks for the detailed answer.

>>>I'd suggest you check out the references cited in our forum charter.<<<

OK, will read that in detail.

>>>Next, the RewriteRule will only be invoked if BOTH RewriteConds match. In other words, it will block that robot only if it comes from that remote IP address range -- Is that what you want?<<<

Yes, that is what I wanted.

>>>The alternate group [89] is equivalent to [8-9], since the numbers are contiguous. The shorter form is slightly faster.<<<

Sorry, I should have put, say, [3-9]; the intent was to block a specific range. But your point is interesting to know!

Many, many, thanks for this info, off to try it out and do some more reading!

physics

10:51 pm on May 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi thewormman. Welcome (again) to WebmasterWorld!
You should also check out:
A Close to Perfect .htaccess Ban List [webmasterworld.com] and also parts 2 and 3 of that thread.

thewormman

9:57 am on Jun 1, 2005 (gmt 0)

10+ Year Member



OK, I've done lots of reading and got it working like this:

RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot
RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.3[0-9]$
RewriteRule ^info/somedirectory/. - [F]

This blocks the IP range 123.45.67.30 to 123.45.67.39.
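A sketch that confirms the 30-39 range and also shows why the working RewriteRule has no leading slash. The second part assumes the rules sit in a .htaccess file in the document root: in that per-directory context mod_rewrite strips the directory prefix, so the rule is tested against "info/somedirectory/..." rather than "/info/somedirectory/...". Python's re stands in for Apache's regex engine.

```python
import re

# Which last octets does the working REMOTE_ADDR condition accept?
ip_pattern = re.compile(r"^123\.45\.67\.3[0-9]$")
blocked_octets = [n for n in range(256) if ip_pattern.search(f"123.45.67.{n}")]
print(blocked_octets)  # [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]

# Assumption: .htaccess in the document root, so the path the RewriteRule
# sees has had its leading slash stripped.
per_dir_path = "info/somedirectory/page.html"
no_slash_hit = re.search(r"^info/somedirectory/.", per_dir_path) is not None
slash_hit = re.search(r"^/info/somedirectory/.", per_dir_path) is not None
print(no_slash_hit, slash_hit)  # True False
```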

But how can I block a bigger range, say 123.45.60 to 123.45.69?

I have tried putting 123.45.6[0-9] and leaving the last numbers off, but without them it does not work. Do I have to add something?

Sorry to ask, but I have read loads, can't find any alternatives, and my brain really hurts...

Thanks.

Romeo

10:27 am on Jun 1, 2005 (gmt 0)

10+ Year Member



Try leaving off the last numbers, including the "$".
The "$" anchors the pattern to the end of the string, so the address must end exactly where the pattern ends. Just leave the end open.
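A minimal demonstration of what the "$" does here, using Python's re as a stand-in for Apache's regex engine and an example address inside the wanted range.

```python
import re

with_dollar = re.compile(r"^123\.45\.6[0-9]$")
without_dollar = re.compile(r"^123\.45\.6[0-9]")

addr = "123.45.63.200"  # example address inside 123.45.60-69
with_hit = with_dollar.search(addr) is not None        # "$" demands the string end after "63"
without_hit = without_dollar.search(addr) is not None  # the rest of the address may follow
print(with_hit, without_hit)  # False True
```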

Regards,
R.

jdMorgan

3:00 pm on Jun 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To be clear, the line should read:

RewriteCond %{REMOTE_ADDR} ^123\.45\.6[0-9]\.

This will match on valid IP addresses 123.45.60.0 through 123.45.69.255
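To check that claim exhaustively, this sketch compares the regex against a numeric range test for every address in 123.45.0.0/16, using Python's standard ipaddress module for the numeric side and re as a stand-in for Apache's regex engine.

```python
import ipaddress
import re

pattern = re.compile(r"^123\.45\.6[0-9]\.")
low = ipaddress.ip_address("123.45.60.0")
high = ipaddress.ip_address("123.45.69.255")

# Any address where the string match and the numeric check disagree?
mismatches = []
for ip in ipaddress.ip_network("123.45.0.0/16"):
    by_regex = pattern.search(str(ip)) is not None
    by_number = low <= ip <= high
    if by_regex != by_number:
        mismatches.append(ip)

print(len(mismatches))  # 0
```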

The trailing "\." is not strictly required in this case; it is used to prevent ambiguity between, say, 123.45.10.0 and 123.45.100.255, both of which would match the pattern "^123\.45\.10".
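The ambiguity is easy to reproduce. This sketch (Python's re standing in for Apache's regex engine) tests both addresses with and without the trailing "\.".

```python
import re

loose = re.compile(r"^123\.45\.10")     # no trailing "\."
strict = re.compile(r"^123\.45\.10\.")  # third octet must end right after "10"

results = {}
for addr in ("123.45.10.0", "123.45.100.255"):
    results[addr] = (loose.search(addr) is not None, strict.search(addr) is not None)
    print(addr, results[addr])
```

Only the strict pattern tells ".10." apart from ".100.".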

Always keep in mind that mod_rewrite is doing a lexical compare, not a numerical evaluation; it's only looking at REMOTE_ADDR as a string of characters, not as numbers.
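One consequence of the lexical comparison, shown with the 30-39 range from earlier in the thread: a pattern that looks numeric, such as "[30-39]", is really a one-character class. This is a sketch using Python's re as a stand-in for Apache's regex engine.

```python
import re

# "[30-39]" is ONE character from {'3', '0'-'3', '9'}, i.e. 0, 1, 2, 3 or 9.
wrong = re.compile(r"^123\.45\.67\.[30-39]$")
wrong_hits = [n for n in range(256) if wrong.search(f"123.45.67.{n}")]
print(wrong_hits)  # [0, 1, 2, 3, 9]

# Spelling the range out character by character gives the intended 30..39.
right = re.compile(r"^123\.45\.67\.3[0-9]$")
right_hits = [n for n in range(256) if right.search(f"123.45.67.{n}")]
print(right_hits)  # [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
```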

Jim

thewormman

4:36 pm on Jun 1, 2005 (gmt 0)

10+ Year Member



Thanks guys!

Easy when you know how!