Apache Web Server Forum

    
A Close to Perfect .htaccess ban list - Revisited
wilderness
msg:3213391
3:33 pm on Jan 9, 2007 (gmt 0)

There's a 2006 renewal of this thread; however, that thread is closed:

[webmasterworld.com...]

SetEnvIfNoCase User-Agent "^Web\ Image\ Collector" bad_bot
SetEnvIfNoCase User-Agent "^Web\ Sucker" bad_bot
SetEnvIfNoCase User-Agent "^WebAuto" bad_bot
SetEnvIfNoCase User-Agent "^WebBandit" bad_bot
SetEnvIfNoCase User-Agent "^Webclipping.com" bad_bot
SetEnvIfNoCase User-Agent "^WebCopier" bad_bot
SetEnvIfNoCase User-Agent "^WebEMailExtrac.*" bad_bot
SetEnvIfNoCase User-Agent "^WebEnhancer" bad_bot
SetEnvIfNoCase User-Agent "^WebFetch" bad_bot
SetEnvIfNoCase User-Agent "^WebGo\ IS" bad_bot
SetEnvIfNoCase User-Agent "^Web.Image.Collector" bad_bot
SetEnvIfNoCase User-Agent "^WebLeacher" bad_bot
SetEnvIfNoCase User-Agent "^WebmasterWorldForumBot" bad_bot
SetEnvIfNoCase User-Agent "^WebReaper" bad_bot
SetEnvIfNoCase User-Agent "^WebSauger" bad_bot
SetEnvIfNoCase User-Agent "^WebSite" bad_bot
SetEnvIfNoCase User-Agent "^Website\ eXtractor" bad_bot
SetEnvIfNoCase User-Agent "^Website\ Quester" bad_bot
SetEnvIfNoCase User-Agent "^Webster" bad_bot
SetEnvIfNoCase User-Agent "^WebStripper" bad_bot
SetEnvIfNoCase User-Agent "^WebWhacker" bad_bot
SetEnvIfNoCase User-Agent "^WebZIP" bad_bot

hybrid,
You may reduce twenty-one lines down to a single line with the following:

SetEnvIfNoCase User-Agent ^Web bad_bot
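On its own, the SetEnvIfNoCase line only tags matching requests with the bad_bot variable. A separate directive still has to refuse those requests. Below is a minimal sketch of a complete .htaccess fragment. It assumes mod_setenvif is loaded and that the host permits these directives in .htaccess. It uses the Apache 2.2 Order/Deny syntax that was current when this thread was written, and shows the Apache 2.4 equivalent commented out:

```apache
# Tag any request whose User-Agent begins with "web" (case-insensitive).
SetEnvIfNoCase User-Agent ^Web bad_bot

# Apache 2.2: allow everyone, then deny the tagged requests.
Order Allow,Deny
Allow from all
Deny from env=bad_bot

# Apache 2.4 equivalent; use this instead of the 2.2 block above.
# <RequireAll>
#     Require all granted
#     Require not env bad_bot
# </RequireAll>
```

Requests tagged with bad_bot receive a 403 Forbidden response. All other requests are served normally.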

Don

 

Quadrille
msg:3214622
1:24 pm on Jan 10, 2007 (gmt 0)

Could you post the entire code required, for us non-tech people?

Thanks!

g1smd
msg:3214639
1:34 pm on Jan 10, 2007 (gmt 0)

>> You may reduce twenty-one lines down to a single line with the following <<

Wouldn't that then block any bot with the word "Web" in the UA string?
That isn't what was intended.

jdMorgan
msg:3214731
3:04 pm on Jan 10, 2007 (gmt 0)

Actually, it likely *is* what was intended. wilderness may be overrun with scrapers, and so may choose to run a very tight ship, blocking all UAs with "Web" in them, and making exceptions only as required.

This may not work for some Webmasters, but every Webmaster should choose what to allow and what to block based on their site demographics and what they see in their logs/stats; there is no one "right answer" for everyone.
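Here is a sketch of that "block broadly, then make exceptions" approach with mod_setenvif. A later SetEnvIf line can remove a variable that an earlier line set, by prefixing the variable name with `!`. The agent name WebExampleBot is a hypothetical placeholder and not a real crawler:

```apache
# Tag every User-Agent that begins with "web" (case-insensitive).
SetEnvIfNoCase User-Agent ^Web bad_bot

# Exception: clear the tag again for one allowed agent.
# "WebExampleBot" is a hypothetical placeholder; substitute a UA from your own logs.
SetEnvIfNoCase User-Agent ^WebExampleBot !bad_bot

# Refuse whatever is still tagged (Apache 2.2 syntax).
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

SetEnvIf lines are processed in the order they appear. For that reason, each exception must come after the broad rule it overrides.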

Jim

g1smd
msg:3214756
3:24 pm on Jan 10, 2007 (gmt 0)

If that were the case, I would block all but Google, Yahoo, MSN, ASK, archive.org, and a select few others. :-)

wilderness
msg:3215236
8:41 pm on Jan 10, 2007 (gmt 0)

>> You may reduce twenty-one lines down to a single line with the following <<

>> Wouldn't that then block any bot with the word "Web" in the UA string?
That isn't what was intended. <<

NO!

It denies access only to UAs that begin with those three letters/characters.
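The sketch below illustrates the difference the leading `^` makes. The UA names in the comments are taken from hybrid's list and are meant only as illustrations. With the anchor, only UAs that start with "web" match. Because SetEnvIfNoCase ignores case, any mix of upper and lower case counts. Without the anchor, "web" anywhere in the string would match. That includes the "AppleWebKit" token found in Safari's User-Agent:

```apache
# Anchored: tags UAs that BEGIN with "web", e.g. ones starting "WebZIP" or "WebCopier".
SetEnvIfNoCase User-Agent ^Web bad_bot

# Unanchored (commented out): would also tag any UA that merely CONTAINS "web",
# such as a browser UA carrying "AppleWebKit/...". Not what is wanted here.
# SetEnvIfNoCase User-Agent Web bad_bot
```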

Also, the quotation marks in all of hybrid's lines are redundant, even though, as fiestagirl pointed out some time ago, Apache Rewrites advises using quotes around UAs.

Fewer than a handful of my UAs are surrounded by quotes, and I use that option sparingly, because a UA confined in quotes implies "exactly as written."
Thus, when using quotes, you are not even required to use the backslash (escape character) before blank spaces.
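Below is a sketch of the two quoting styles, assuming mod_setenvif's documented behavior. In a SetEnvIf line, the quotes group the pattern into a single argument, so a literal space can be written as-is. The quoted text is still a regular expression. As a result, `^` still anchors, and nothing forces a whole-string match unless a closing `$` is added:

```apache
# Quoted pattern with a literal space: tags UAs beginning "Web Sucker".
SetEnvIfNoCase User-Agent "^Web Sucker" bad_bot

# Same match, written as in hybrid's list: "\ " in the regex matches a literal space.
SetEnvIfNoCase User-Agent "^Web\ Sucker" bad_bot

# Prefix match: tags "WebZIP" followed by anything, e.g. a version number.
SetEnvIfNoCase User-Agent "^WebZIP" bad_bot

# Whole-string match: tags only a UA that is exactly "WebZIP" (note the closing "$").
SetEnvIfNoCase User-Agent "^WebZIP$" bad_bot
```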

Jim is much more knowledgeable about these procedures than I am. The majority of what I've learned regarding rewrites has come from long participation in forum 11 (I rarely venture into other WebmasterWorld forums).
I strive to keep my rewrite lines simple and easy to understand.

I recall a good friend telling me privately that he had to include remarks (comments) in his rewrites so that he could later determine precisely what each rewrite accomplished :)
Except for a very few rewrites that the same friend provided to solve what I consider complicated issues, my rewrites are basic and simple.
My friend can jump through hoops that I wouldn't even attempt ;)

Don

wilderness
msg:3215280
9:33 pm on Jan 10, 2007 (gmt 0)

>> Could you post the entire code required, for us non-tech people? <<

A simple beginning
[webmasterworld.com...]

Valid Search Engine?
[webmasterworld.com...]

IIS and Global.asa
[w3schools.com...]

dbm Maps
[webmasterworld.com...]

Reduce harvests
[webmasterworld.com...] Msg#16

Throttle runaways
[webmasterworld.com...]

Block Methods (Scroll past opening Advertisements)
[diveintomark.org...]

Regular Expressions
[etext.lib.virginia.edu...]
[gnosis.cx...]

Close To Perfect I
[webmasterworld.com...]
Close To Perfect II
[webmasterworld.com...]
Close To Perfect III
[webmasterworld.com...]

Concise htaccess
[webmasterworld.com...]

robots.txt on a diet
[webmasterworld.com...]

Search Tools
[webmasterworld.com...]

In addition, somebody in either this forum or forum 11 recently provided a very long explanation of this same issue.
I recall viewing that very nice write-up, but I'm unable to find the URL or recall the user's name.

Quadrille
msg:3215467
12:38 am on Jan 11, 2007 (gmt 0)

Wow, that's amazing. Thanks.

I'd better get reading!

~Q

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved