A Close to Perfect .htaccess ban list - Revisited

     

wilderness

3:33 pm on Jan 9, 2007 (gmt 0)




There's a 2006 renewal of this thread; however, that thread is closed:

[webmasterworld.com...]

SetEnvIfNoCase User-Agent "^Web\ Image\ Collector" bad_bot
SetEnvIfNoCase User-Agent "^Web\ Sucker" bad_bot
SetEnvIfNoCase User-Agent "^WebAuto" bad_bot
SetEnvIfNoCase User-Agent "^WebBandit" bad_bot
SetEnvIfNoCase User-Agent "^Webclipping.com" bad_bot
SetEnvIfNoCase User-Agent "^WebCopier" bad_bot
SetEnvIfNoCase User-Agent "^WebEMailExtrac.*" bad_bot
SetEnvIfNoCase User-Agent "^WebEnhancer" bad_bot
SetEnvIfNoCase User-Agent "^WebFetch" bad_bot
SetEnvIfNoCase User-Agent "^WebGo\ IS" bad_bot
SetEnvIfNoCase User-Agent "^Web.Image.Collector" bad_bot
SetEnvIfNoCase User-Agent "^WebLeacher" bad_bot
SetEnvIfNoCase User-Agent "^WebmasterWorldForumBot" bad_bot
SetEnvIfNoCase User-Agent "^WebReaper" bad_bot
SetEnvIfNoCase User-Agent "^WebSauger" bad_bot
SetEnvIfNoCase User-Agent "^WebSite" bad_bot
SetEnvIfNoCase User-Agent "^Website\ eXtractor" bad_bot
SetEnvIfNoCase User-Agent "^Website\ Quester" bad_bot
SetEnvIfNoCase User-Agent "^Webster" bad_bot
SetEnvIfNoCase User-Agent "^WebStripper" bad_bot
SetEnvIfNoCase User-Agent "^WebWhacker" bad_bot
SetEnvIfNoCase User-Agent "^WebZIP" bad_bot

hybrid,
You may reduce twenty-one lines down to a single line with the following:

SetEnvIfNoCase User-Agent ^Web bad_bot
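
For anyone wondering how that one line is put to work: setting bad_bot only tags the request, and a separate access rule has to act on the tag. A minimal sketch, assuming Apache 2.2-style access control (mod_setenvif plus the Order/Allow/Deny directives) and a server whose AllowOverride setting permits these directives in .htaccess. The commented-out alternative is the Apache 2.4 equivalent.

```apache
# Tag every request whose User-Agent begins with "Web" (case-insensitive).
SetEnvIfNoCase User-Agent ^Web bad_bot

# Apache 2.2 style: let everyone in, then refuse tagged requests with a 403.
Order Allow,Deny
Allow from all
Deny from env=bad_bot

# Apache 2.4 style (use instead of the three lines above):
# <RequireAll>
#     Require all granted
#     Require not env bad_bot
# </RequireAll>
```

With Order Allow,Deny a request that matches both an Allow and a Deny line is refused, which is why the tagged bots are blocked even though everyone is allowed first.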

Don

Quadrille

1:24 pm on Jan 10, 2007 (gmt 0)




Could you post the entire code required, for us non-tech people?

Thanks!

g1smd

1:34 pm on Jan 10, 2007 (gmt 0)




>> You may reduce twenty-one lines down to a single line with the following <<

Wouldn't that then block any bot with the word "Web" in the UA string?
That isn't what was intended.

jdMorgan

3:04 pm on Jan 10, 2007 (gmt 0)




Actually, it likely *is* what was intended. wilderness may be overrun with scrapers, and so may choose to run a very tight ship, blocking all UAs with "Web" in them, and making exceptions only as required.

This may not work for some Webmasters, but every Webmaster should choose what to allow and what to block based on their site demographics and what they see in their logs/stats. There is no one "right answer" for everyone.
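
The "block broadly, then make exceptions" approach can be sketched as follows. SetEnvIf directives are evaluated in the order they appear, and a leading ! on the variable name removes a variable that an earlier line set. The WebFriendlyMonitor name is made up for illustration and is not a real user agent.

```apache
# Broad rule: tag everything whose User-Agent begins with "Web".
SetEnvIfNoCase User-Agent ^Web bad_bot

# Exception (hypothetical UA): clear the tag again for one trusted client.
# This line must come AFTER the broad rule, because later lines win.
SetEnvIfNoCase User-Agent ^WebFriendlyMonitor !bad_bot
```

Keeping the exceptions directly below the broad rule makes it easier to see later why a given UA gets through.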

Jim

g1smd

3:24 pm on Jan 10, 2007 (gmt 0)




If that were the case, I would block all but Google, Yahoo, MSN, ASK, archive.org, and a select few others. :-)

wilderness

8:41 pm on Jan 10, 2007 (gmt 0)




[quote]>> You may reduce twenty-one lines down to a single line with the following <<

Wouldn't that then block any bot with the word "Web" in the UA string?
That isn't what was intended.[/quote]

NO!

It denies access to UAs that begin with those three letters/characters.
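
The difference comes down to the ^ anchor. A short sketch of the two forms side by side; the sample UA strings in the comments are invented for illustration:

```apache
# Anchored: matches only when the UA STARTS with "Web" (any letter case).
#   flags:  "WebZIP/4.0", "webcopier"
#   spares: "Mozilla/4.0 (compatible; SomeWebBot)"
SetEnvIfNoCase User-Agent ^Web bad_bot

# Unanchored: would match "Web" ANYWHERE in the UA, including the
# "SomeWebBot" string above. Left commented out on purpose.
# SetEnvIfNoCase User-Agent Web bad_bot
```

Because the directive is SetEnvIfNoCase, the anchored form also catches UAs that begin with "web" or "WEB".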

Also, the quotation marks in all of hybrid's lines are redundant and not necessary,
even though, as fiestagirl pointed out some time ago, the Apache rewrite documentation advises using quotes around UAs.

I have fewer than a handful of quoted UAs and use that option sparingly, since whatever is confined in quotes is taken exactly as typed, spaces included.
Thus, when using quotes you are not even required to use the backslash/escape character for blank spaces.
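
A small sketch of the quoted forms, using one of hybrid's UAs. Note that the quotes only delimit the argument: the pattern inside is still a regular expression, so it still matches as a prefix unless a closing $ is added. The "Web Sucker 2.0" string is an invented example. Whether an unquoted backslash-escaped space is accepted depends on the directive's own argument parser (mod_rewrite handles it; for SetEnvIf it should be tested before relying on it), so quoting is the safer choice for patterns that contain spaces.

```apache
# Quoted: the space can be typed as-is, no backslash needed.
# Still a prefix match, so it would also flag "Web Sucker 2.0".
SetEnvIfNoCase User-Agent "^Web Sucker" bad_bot

# Whole-string match: add $ if ONLY the exact UA should be flagged.
SetEnvIfNoCase User-Agent "^Web Sucker$" bad_bot

# hybrid's style: inside quotes the backslash before the space is
# harmless, since the regex engine reads "\ " as a literal space.
SetEnvIfNoCase User-Agent "^Web\ Sucker" bad_bot
```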

Jim is much more knowledgeable about these procedures than I am. The majority of what I've learned regarding Rewrites has been through the benefit of long participation in forum 11 (I rarely venture to other forums of WebmasterWorld).
I strive to keep my rewrite lines simple and easy to understand.

I recall a good friend telling me privately that it was necessary for him to include remarks in his rewrites so that he was able to determine later precisely what each rewrite accomplished :)
Except for a very few rewrites that have been provided to me by the same friend to solve what I term as complicated issues, my rewrites are basic and simple.
My friend has the capability to jump through hoops that I wouldn't even survey ;)
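
As an illustration of a basic rewrite with remarks, here is a sketch of the mod_rewrite counterpart to the one-line SetEnvIf rule, assuming mod_rewrite is enabled and usable from .htaccess. Remarks go on their own lines starting with #, since Apache does not allow a comment on the same line as a directive.

```apache
# Turn the rewrite engine on for this directory.
RewriteEngine On

# Condition: User-Agent begins with "Web"; [NC] ignores letter case.
RewriteCond %{HTTP_USER_AGENT} ^Web [NC]

# Action: leave the URL untouched (-) and answer 403 Forbidden ([F]).
RewriteRule ^ - [F]
```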

Don


wilderness

9:33 pm on Jan 10, 2007 (gmt 0)




>> Could you post the entire code required, for us non-tech people? <<

A simple beginning
[webmasterworld.com...]

Valid Search Engine?
[webmasterworld.com...]

IIS and Global.asa
[w3schools.com...]

dbm Maps
[webmasterworld.com...]

Reduce harvests
[webmasterworld.com...] Msg#16

Throttle runaways
[webmasterworld.com...]

Block Methods (Scroll past opening Advertisements)
[diveintomark.org...]

Regular Expressions
[etext.lib.virginia.edu...]
[gnosis.cx...]

Close To Perfect I
[webmasterworld.com...]
Close To Perfect II
[webmasterworld.com...]
Close To Perfect III
[webmasterworld.com...]

Concise htaccess
[webmasterworld.com...]

robots.text on a diet
[webmasterworld.com...]

Search Tools
[webmasterworld.com...]

In addition, somebody either in this forum or forum 11 recently provided a very long explanation of this same issue.
I recall viewing the very nice composition; however, I am unable to find the URL or recall the name of the user.

Quadrille

12:38 am on Jan 11, 2007 (gmt 0)




Wow, that's amazing. Thanks.

I'd better get reading!

~Q

 
