
Forum Moderators: Ocean10000 & phranque


A Close to perfect .htaccess ban list - Part 3

More tips and tricks for banning those pesky "problem bots"!

     
7:38 pm on Oct 13, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member txbakers is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 1, 2001
posts:4392
votes: 0


Continued from A close to perfect .htaccess ban list - Part 2 [webmasterworld.com]

Whee - what a great discussion.

[edited by: Marcia at 11:23 pm (utc) on Oct. 13, 2003]

[edited by: jdMorgan at 12:24 am (utc) on Nov. 19, 2003]
[edit reason] Corrected URL [/edit]

9:00 pm on Apr 1, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member powdork is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 13, 2002
posts:3347
votes: 0


Has anyone had issues with, or does anyone block, this IP: 193.***.234.? It's from Romania Data Systems.

[edited by: jdMorgan at 1:27 am (utc) on April 17, 2004]
[edit reason] Obscured specifics [/edit]

brainstorm2k3

11:24 am on Apr 16, 2004 (gmt 0)

Inactive Member
Account Expired

 
 


thanks
This .htaccess is really great =)
3:10 pm on Apr 21, 2004 (gmt 0)

New User

10+ Year Member

joined:Apr 21, 2004
posts:3
votes: 0


Wow great thread!

I am a bit lost with all of this, so I have a simple question about .htaccess syntax (yes, again, sorry); it will be quick:

Is the following command line correct for banning everything containing the string in question?

-> RewriteCond %{HTTP_USER_AGENT} ^.*WebZIP.*$ [OR]

And with spaces:

-> RewriteCond %{HTTP_USER_AGENT} ^.*Program\ Shareware.*$ [OR]

HUGE thanks if you can reply! :)

3:37 pm on Apr 21, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


> RewriteCond %{HTTP_USER_AGENT} ^.*WebZIP.*$ [OR]

It'll work, but you can shorten it. A start anchor "^" followed by ".*" and an end anchor "$" preceded by ".*" are redundant:

RewriteCond %{HTTP_USER_AGENT} WebZIP [OR]

is entirely equivalent.

Ref: [etext.lib.virginia.edu...]

Jim
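
To put conditions like these in context: a RewriteCond only takes effect when it is followed by a RewriteRule. A minimal sketch of a complete ban block (the agent names here are just the examples from this thread, not a recommended list):

```apache
RewriteEngine On
# Unanchored pattern: matches "WebZIP" anywhere in the User-Agent
RewriteCond %{HTTP_USER_AGENT} WebZIP [OR]
# Anchored pattern: User-Agent must begin with "Program Shareware"
RewriteCond %{HTTP_USER_AGENT} ^Program\ Shareware
# Any matching request gets a 403 Forbidden
RewriteRule .* - [F]
```

Note that the final condition carries no [OR] flag; if every condition ended in [OR], the rule would fire for every request.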

3:53 pm on Apr 21, 2004 (gmt 0)

Full Member

10+ Year Member

joined:May 5, 2003
posts:319
votes: 0


Max66 asked:
Is the following command line correct for banning everything containing the string in question?

-> RewriteCond %{HTTP_USER_AGENT} ^.*WebZIP.*$ [OR]

Max: you put in a lot more than is needed to ban WebZip agents. Here is all you really need to block any agent containing that string, case-insensitive, anywhere in its U-A string:

RewriteCond %{HTTP_USER_AGENT} webzip [NC,OR]

Note that I removed the ^.* and .*$, as they are unnecessary. My example will catch that combination of letters, case-insensitive, anywhere in the user agent string.

And with spaces:

-> RewriteCond %{HTTP_USER_AGENT} ^.*Program\ Shareware.*$ [OR]

Again, you have included more than is needed to block this UA. My version of this rule reads:

RewriteCond %{HTTP_USER_AGENT} ^Program.?Shareware [NC,OR]

but you can also write it as:

RewriteCond %{HTTP_USER_AGENT} ^Program\ Shareware [NC,OR]

The ^ anchors the match to the absolute beginning of the string, while $ anchors it to the absolute end. By leaving these out of the expression you allow for a match anywhere within the User-Agent string. In my experience, however, "Program Shareware" always appears at the beginning of the name, so I anchor the start with ^ but leave off the $, because version numbers may be appended to it.

Notice that I also replaced your escaped space (\ ) with a .? where the space occurs. The reason for this is to allow for creative obfuscation by the users of these programs, who might change the space to a dash or underscore, or even a forward slash, in the hope of breaking our rules. The .? matches zero or one of any character between Program and Shareware, including a space. [NC] means No Case (case-insensitive).

IMHO, Wiz
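
As a sketch of the point above (the agent name is taken from the example, not a verified U-A list):

```apache
# ".?" matches zero or one of any character, so this single condition
# catches "Program Shareware", "Program-Shareware", "Program_Shareware",
# "Program/Shareware" and "ProgramShareware" alike:
RewriteCond %{HTTP_USER_AGENT} ^Program.?Shareware [NC]
RewriteRule .* - [F]
```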

10:17 pm on Apr 21, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 4, 2002
posts:1958
votes: 0


Lots of great ideas here.

Has anyone taken a look at what a 30-line .htaccess does to your server load? Just a thought. I wouldn't put 30 lines of regular expressions into my PHP code for every page on a heavily loaded site, and this seems much the same.

12:49 am on Apr 22, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Yes, it all depends on your site's traffic levels. Putting the code in httpd.conf is best performance-wise, since it gets compiled on server restart. Next-best is .htaccess, with care taken to make your rules selective enough that they don't run for *every* HTTP request (because the code in .htaccess is interpreted for each request). And after that, scripting languages are the third choice, because they are interpreted and are not native to the server software itself.

I've got a couple of sites which have up to 800-line .htaccess files, but because the rules are carefully written, and because the sites get thousands of hits per day instead of tens or hundreds of thousands (or more), they do just fine. The bottom line is that each server and hosted site is different, and you have to test to find out how big is too big for your CPU and your traffic level.

Taking a wider view, the point was made earlier that most sites won't need such a large, comprehensive set of rules, and each Webmaster should use only those rules which provide a real benefit to offset the performance loss they cause.

Jim
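
A sketch of the httpd.conf placement described above (the directory path is hypothetical; adapt it to your own DocumentRoot):

```apache
# In httpd.conf, rules are parsed once at server (re)start, instead of
# being re-read and re-interpreted on every request as .htaccess is.
<Directory "/var/www/example">
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} WebZIP [NC]
    RewriteRule .* - [F]
</Directory>
```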

9:03 am on Apr 22, 2004 (gmt 0)

New User

10+ Year Member

joined:Apr 21, 2004
posts:3
votes: 0


Many thanks for the help!

So using this template

RewriteCond %{HTTP_USER_AGENT} string [NC,OR]

will ensure that every U-A containing "string" anywhere in the U-A will be banned?

Again, thanks a lot!

2:46 pm on Apr 22, 2004 (gmt 0)

Full Member

10+ Year Member

joined:May 5, 2003
posts:319
votes: 0


So using this template

RewriteCond %{HTTP_USER_AGENT} string [NC,OR]

will ensure that every U-A containing "string" anywhere in the U-A will be banned?

Correct-a-mundo, Max

Don't forget that if the User-Agent contains non-alphabetic characters or spaces, you can put .? between the last letter of the first word and the first letter of the second, with no spaces around it. For example, website.?extractor [NC,OR] will catch "Website Extractor 1.09", which would otherwise have to be written longhand as ^Website\ Extractor\ 1\.09$ [OR]. The long method would fail as soon as somebody used version 1.10 instead of 1.09.
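
Side by side, the two approaches look like this (the version numbers are illustrative):

```apache
# Longhand: pins one exact spelling and version, so "Website Extractor 1.10"
# or "Website-Extractor" would slip through:
#   RewriteCond %{HTTP_USER_AGENT} ^Website\ Extractor\ 1\.09$
# Shorthand: catches any version and any single-character separator:
RewriteCond %{HTTP_USER_AGENT} website.?extractor [NC]
RewriteRule .* - [F]
```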

I personally group all common expressions into one long rule, separating each one with a vertical pipe symbol (|). Here is one such grouped condition from my .htaccess (shown wrapped with trailing backslashes for readability; it is a single logical line):

RewriteCond %{HTTP_USER_AGENT} ^(BlackWidow|Crescent|Disco.?|ExtractorPro|HTML.?Works|Franklin.?Locator|\
Green\ Research|Harvest|HLoader|http.?generic|Industry.?Program|IUPUI.?Research.?Bot|Mac.?Finder|NetZIP|\
NICErsPRO|NPBot|PlantyNet_WebRobot|Production.?Bot|Program.?Shareware|Teleport.?Pro|TurnitinBot|TE|\
VoidEYE|WebBandit|WebCopier|Websnatcher|Website\ Extractor|WEP.?Search|Wget|Zeus) [NC,OR]

The list of User-Agents is anchored at the beginning with a ^, because these UAs are known to appear in logs exactly as typed, but there is no ending $ anchor. This allows for other characters after the main name, such as version numbers. I have another group rule that is not anchored at the beginning, to catch strings that may not be at the start of a UA.

These represent my personal choice of which agents to block with a 403 message, and may not apply to other people.

Wiz
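
A shortened sketch of how an anchored and an unanchored group rule can sit together (the agent lists are trimmed to a few entries for illustration):

```apache
# Anchored group: these names are known to start the User-Agent string
RewriteCond %{HTTP_USER_AGENT} ^(BlackWidow|Teleport.?Pro|Wget|Zeus) [NC,OR]
# Unanchored group: these strings may appear anywhere in the User-Agent
RewriteCond %{HTTP_USER_AGENT} (webzip|extractor) [NC]
RewriteRule .* - [F]
```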

[edited by: jdMorgan at 3:08 pm (utc) on April 22, 2004]
[edit reason] Edited long line to fix horizontal scrolling [/edit]

8:27 am on Apr 23, 2004 (gmt 0)

New User

10+ Year Member

joined:Apr 21, 2004
posts:3
votes: 0


Thanks to all for your precious answers!