Forum Moderators: phranque


htaccess questions

Questions about how to write .htaccess rules for blocking bad robots


NanoChild

6:54 pm on Jun 2, 2006 (gmt 0)

10+ Year Member



Hi!
I have some questions about how to write rules correctly in an .htaccess file!

I was about to block some bad robots and have found some names of bad robots on the Internet, but I noticed that not everybody writes them in exactly the same manner.

In the case of Express\ WebPictures\ (www.express-soft.com) you can see that "www.express-soft.com" is inside "()", but in the case of larbin_2.6.2\ kabura@sushi.com, "kabura@sushi.com" is written without "()"..?

In another case, "[]" is used instead of "()"..?

I was wondering if it is necessary to write that last part at all, so instead of looking like this:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures\ (www.express-soft.com) [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin_2.6.2\ kabura@sushi.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper\ [info@webreaper.net]
RewriteRule ^.* - [F,L]

it could look like this:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures\ [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin_2.6.2\ [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper\
RewriteRule ^.* - [F,L]

or maybe like this, without the trailing "\":

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin_2.6.2 [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper
RewriteRule ^.* - [F,L]

Hope someone in here can help..!

Sincerely
Nano

jdMorgan

9:52 pm on Jun 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The key is to realize what the "\" is for. It is being used to escape various characters, such as "(", ")", "[", "]", and spaces, that would otherwise have special meaning to either the regular-expressions parser or to mod_rewrite itself.

You can shorten the patterns as you showed, making them less specific, but they should never end with "\ ".
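For example, here is a sketch of both styles side by side (using the user-agent strings from this thread; whether you escape the dots in the domain name is a matter of taste, since an unescaped "." simply matches any single character):

```apache
RewriteEngine On
# Fully escaped: "\ " matches a literal space, "\(" and "\)" literal parentheses
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures\ \(www\.express-soft\.com\) [OR]
# Shortened: less specific, and note that it does not end with "\ "
RewriteCond %{HTTP_USER_AGENT} ^WebReaper
RewriteRule ^.* - [F,L]
```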

Jim

NanoChild

4:26 am on Jun 3, 2006 (gmt 0)

10+ Year Member



Hi, and thanks for your reply!

I have my own site running on a public server (Apache/2.0.51 on Fedora).

I am using .htaccess to rewrite URLs and to block some bad robots and referrer sites!

I have found 3 lists of bad robots and combined them in the .htaccess file in the root of my site, and it validates perfectly without any errors!

So, with the syntax in the example below:

RewriteCond %{HTTP_USER_AGENT} ^larbin_2.6.2 (kabura@sushi.com) [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin_2.6.2 (larbin2.6.2@unspecified.mail) [OR]

...you can change it to this:

RewriteCond %{HTTP_USER_AGENT} ^larbin_2.6.2

...because the last info, (kabura@sushi.com) and (larbin2.6.2@unspecified.mail), doesn't have any influence and is just there as information..?

Sincerely
Nano

jdMorgan

1:33 am on Jun 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes. The only difference is that by including that 'e-mail address' information, you make the rules more specific: if you include that part of the pattern, then larbin would be allowed from machines using any other e-mail address in the user-agent string.

I'd recommend blocking larbin in almost all cases, so just leave that part of the user-agent string out of the pattern.
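Putting that together, a shortened ruleset covering the robots mentioned in this thread might look like this (a sketch; adding the [NC] flag, if you choose to, would make each match case-insensitive):

```apache
RewriteEngine On
# The e-mail "comment" part is left out of each pattern, so these
# robots are blocked regardless of which address they report
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper
RewriteRule ^.* - [F,L]
```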

Jim

NanoChild

12:58 am on Jun 5, 2006 (gmt 0)

10+ Year Member



Thanks!