Whatever, it's now in my Big List of Banned Bots.
P.S.
"West Wind Technologies Web Site, home of West Wind Web Connection [west-wind.com]"
-- is the beginning of the string? As in, oh... SetEnvIf User-Agent "^West\ Wind\ Internet\ Protocols" no_way?
If the UA begins with "West", the following is simple and sufficient:
SetEnvIf User-Agent ^West no_way
It matches any UA that begins with West, followed by any number of characters.
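For anyone following along, here is a minimal sketch of how the no_way variable might then be used to refuse the request. It assumes the older Order/Allow/Deny access directives (Apache 2.2 style; on 2.4 they need mod_access_compat), which nobody in this thread has confirmed they use.

```apache
# Flag any request whose User-Agent starts with "West".
SetEnvIf User-Agent ^West no_way

# Assumed access block: allow everyone, then deny flagged requests.
# With "Order Allow,Deny", a request matching both Allow and Deny is denied.
Order Allow,Deny
Allow from all
Deny from env=no_way
```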
The same short method works on:
Web
Get
Down
Grab
Net
and I'm sure there are others.
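If you would rather not keep one line per prefix, the short list above could be folded into a single alternation. This is only a sketch; whether each prefix is safe to block on your site is something to check against your own logs.

```apache
# One anchored pattern covering the prefixes listed above.
# The quotes keep the whole regex as a single argument.
SetEnvIf User-Agent "^(Web|Get|Down|Grab|Net)" no_way
```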
KISS (Keep it simple, stupid)
I routinely use the nifty, one-word SetEnvIf User-Agent ^Example no_way, thanks, but just prefer to move from broad to narrow when first excluding any UA.
I would use a broader brush and drop the "^" from most of them too.
Why?
Because when they mutate, the UA tends to change from:
"ExampleCrawler (http://examplecrawler.com)"
to:
"Mozilla/4.0 (compatible; ExampleCrawler; (http://examplecrawler.com))"
...or some such nonsense, so let the string float: match on the minimum user agent substring and you'll snare all combinations going forward.
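To make the difference concrete, here is a sketch of both forms side by side, using the ExampleCrawler placeholder from the post above.

```apache
# Anchored: matches "ExampleCrawler (http://examplecrawler.com)" only,
# because the name must appear at the very start of the UA.
SetEnvIf User-Agent ^ExampleCrawler no_way

# Unanchored: matches the name anywhere in the UA, so it also catches
# "Mozilla/4.0 (compatible; ExampleCrawler; (http://examplecrawler.com))".
SetEnvIf User-Agent ExampleCrawler no_way
```

Only one of the two lines is needed in practice; the unanchored one covers both UAs.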
For example, I blocked a Boeing UA because I added EI to a SetEnvIfNoCase User-Agent array. Oops. Using ^EI (with plain SetEnvIf, not SetEnvIfNoCase) made sure I narrowed the focus. (Here's [webmasterworld.com] a related thread.) Also, in the case of ^West, blocking sans ^ would mean the visitor from the UK using -- "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; West Kent College; .NET CLR 1.1.4322)" -- would've been more than a bit baffled.
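A sketch of the two false positives described above. The commented-out lines are the over-broad versions, kept only to show what goes wrong.

```apache
# Too broad: case-insensitive "EI" anywhere also matches the "ei" in "Boeing".
# SetEnvIfNoCase User-Agent EI no_way

# Narrower: case-sensitive, and only at the start of the UA.
SetEnvIf User-Agent ^EI no_way

# Too broad: unanchored "West" also matches "...; West Kent College; ...".
# SetEnvIf User-Agent West no_way

# Narrower: only UAs that begin with "West".
SetEnvIf User-Agent ^West no_way
```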
You and Jim and countless others are a heckuva lot more experienced and efficient with all this stuff than I am, and in the context of your belts-and-suspenders programs and bot traps and what-have-yous, you're in great shape. I'm one of those folks still doing things manually so I'm a bit slower and a lot less efficient by default.
But one of these days (famous last words!) I'm going to sit down and revamp my primary .htaccess from the get-go and clean out loads of flooby-dust...
I still like to give more visitors than not the benefit of the doubt.
I give most visitors the benefit of the doubt too, which is why I keep a log of who I bounce, so I can see if anything I've done is overly aggressive.
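One way to keep such a log with stock Apache is a conditional CustomLog, sketched below. This is an assumption about setup, not necessarily how the log above is produced: the file name and the blocked_ua format nickname are made up, and CustomLog belongs in the server or virtual host config, not in .htaccess.

```apache
# Hypothetical log format: client, time, request line, status, User-Agent.
LogFormat "%h %t \"%r\" %>s \"%{User-Agent}i\"" blocked_ua

# Write an entry only for requests where no_way has been set.
CustomLog logs/blocked_agents.log blocked_ua env=no_way
```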
Overall I block about 300-400 bogus sources out of 13K+ visitors/day.
Probably a few innocents snared now and then, but my blocked agent log files look pretty clean to me.
One thing I block is anything with "http://" in the agent, but that's post-filtering, after all the allowed crawlers have been let into the site.
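In pure SetEnvIf terms, something similar could be approximated by flagging first and then clearing the flag for crawlers you trust, since the directives are processed in order. This is a sketch, not the poster's actual filter, and Googlebot is only an example name; a UA string can be spoofed, so a name match alone is a weak allow test.

```apache
# Flag any UA that contains a URL.
SetEnvIf User-Agent "http://" no_way

# Then remove the flag for crawlers you allow (example name only).
# The "!" prefix unsets the variable if it was set earlier.
SetEnvIf User-Agent Googlebot !no_way
```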
BTW, the "West Wind Internet Protocols" looks like a set of programming tools to me; someone used their toolkit to crawl or link check. I run into this with lots of toolkits out there all the time.