Forum Moderators: phranque

Message Too Old, No Replies

A Close to perfect .htaccess ban list - Part 2

         

adriaant

11:46 pm on May 14, 2003 (gmt 0)

10+ Year Member



<modnote>
continued from [webmasterworld.com...]



UGH, bad typo in my original post. Here's the better version (I wasn't able to re-edit the older post?):

I'm trying to ban sites by domain name, since there are recently lots of reference spammers.

I have, for example, the rule:

RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*stuff.*\.com/.*$ [NC]
RewriteRule ^.*$ - [F,L]

which should ban any sites containing the word "stuff"
www.stuff.com
www.whatkindofstuff.com
www.some-other-stuff.com

and so on.

However, it is not working, so I am sure I did not setup a proper pattern match rule. Anyone care to advise?

[edited by: jatar_k at 5:06 am (utc) on May 20, 2003]

jdMorgan

12:19 am on Jun 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



NancyB,

The problem is that you have an [OR] flag on your last RewriteCond. Since [OR]s only apply to RewriteConds, this is blowing up your RewriteRule.

Your syntax is otherwise fine as-is. Do not confuse prefix-matching syntax, as used in Redirect and Deny directives, with the extended regular expressions pattern-matching used by mod_rewrite. Beware of the many examples posted here with incorrect start and/or end anchoring. The "^" and "$" symbols have very specific functions, and adding or removing either of them will change the pattern-matching drastically.

The regular expressions tutorial linked from this Introduction to mod_rewrite [webmasterworld.com] post is quite helpful.

Sorry for any typos and terseness - typing in a hurry.

Jim

<edit>Oh, and make sure you have a space preceding the exclamation point in any RewriteRules or RewriteConds.</edit>

nancyb

4:58 am on Jun 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the quick response, Jim.

you have an [OR] flag on your last RewriteCond.

a space preceding the exclamation point

Both of those were sloppy copy/pasting - the last line actually was a different UA and didn't have the [OR] flag. I've no clue as to where the space before the "!" went as it's in the .htaccess

The RewriteRule is working for most of the conditions because I see them getting the 403, its just not working for those I mentioned.

two hours later .....

Whoo hoo - I just noticed that almost every line in my .htaccess ends with a space after the [OR] flag. I cleaned those up, maybe that's the problem. I'm really sick of looking at this file and now wonder how any of the conditions worked.

jdMorgan

5:29 am on Jun 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



nancyb,

> I've no clue as to where the space before the "!" went as it's in the .htaccess

This forum eats those spaces, but it's never clear when viewing copied code whether the poster had them in there and the forum ate them, or they were missing from the original code. To get the excalmation points to stay spaced when posting on WebmasterWorld, you have to use two spaces.

> The RewriteRule is working for most of the conditions because I see them getting the 403, its just not working for those I mentioned.

For those UAs I block and know about, your code is correct - including pattern anchoring. The pattern in the RewriteCond must be a letter-perfect match for the UA you see in your raw log files in order to work. Exceptions are when using the [NC] flag to make the compare case-insensitive, and of course, the use of regex wild-card characters or strings.

One other thing that messes things up is if an [OR] flag is missing on a RewriteCond line preceding one that appears to be broken.

I hope it was the spaces, 'causee otherwise, I'm stumped.

If you're sick of your .htaccess, you wouldn't want to see mine! - I do up to dozen UA's per RewriteCond. ;)

Jim

nancyb

5:44 am on Jun 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Since you know my code is correct, it must be the spaces. I'll post back here if it was and hopefully keep some other htaccess newbie from making the same mistake.

you wouldn't want to see mine! - I do up to dozen UA's per

eeeek, I'd be blind. Now I fully understand your water closet trick ;)

thanks for the tip about the "!" and spaces

yeah, I discovered what happens if you mess up an [OR] flag - every page went 500 when I dropped the final "]" on one of the conditions.

Oaf357

3:25 am on Jun 17, 2003 (gmt 0)

10+ Year Member



Added a new one:

RewriteCond %{HTTP_USER_AGENT} ^NASA\ Search\ 1\.0$ [NC,OR]

Went straight for the guestbook. It will suck down 403s now.

Does anyone have a "complete", up to date ban list that they could either post, sticky, or link to? I'd like to know what I'm up against. Everyday I add more bots to my list.

wkitty42

4:28 am on Jun 17, 2003 (gmt 0)

10+ Year Member


my bot list is rather large... i don't know how accurate it is, though... i can say that i don't have the problems, today, that i had a while back...

as for posting it, i'm not sure of the best way to make it available... i could link it from my site or i could just post it in a message... i'm sure there are plenty of corrections or optimizations that could be made to it, though... hummm...

ok, take it with the understanding that you have to determine what bots you want to allow access to your site... some of these i have blocked, you may want to allow on... others, you may want to block... i can't say that these are all-inclusive or that i haven't messed something up somewhere along the lines... also note that some of this and the associated comments are by others that have posted here and on other forums... i am thankful for their contributions but, sadly, i don't have any notes as to who they were ;-(

===== snip =====

Options +FollowSymLinks
RewriteEngine on
RewriteBase /

# this ruleset is to "stop" stupid attempts to use MS IIS expolits on us
# NIMDA
RewriteCond %{REQUEST_URI} /(cmd¦root¦shell)\.exe$[NC,OR]
RewriteCond %{REQUEST_URI} /(admin¦httpodbc)\.dll$[NC]
RewriteRule .* /cgi-bin/nonimda.cmd [L,E=HTTP_USER_AGENT:NIMDA_EXPLOIT,T=application/x-httpd-cgi]

# CODERED
RewriteCond %{REQUEST_URI} /default\.(ida¦idq)$[NC,OR]
RewriteCond %{REQUEST_URI} /.*\.printer$[NC]
RewriteRule .* /cgi-bin/nocode-r.cmd [L,E=HTTP_USER_AGENT:CODERED_EXPLOIT,T=application/x-httpd-cgi]

# this ruleset is for formmail script abusers...
RewriteCond %{REQUEST_URI} formmail\.(pl¦cgi)$[NC,OR]
RewriteCond %{REQUEST_URI} mailto\.(exe¦cgi)$[NC]
RewriteRule .* /cgi-bin/nofrmml.cmd [L,E=HTTP_USER_AGENT:FORMMAIL_EXPLOIT,T=application/x-httpd-cgi]

# Cyveillance is a spybot that scours the web for copyright violations and “damaging information” on
# behalf of clients such as the RIAA and MPAA. Their robot spoofs its User-Agent to look like Internet
# Explorer, and it completely ignores robots.txt. I have
# banned it by IP address.
RewriteCond %{REMOTE_ADDR} "^63\.148\.99\.2(2[4-9]¦[3-4][0-9]¦5[0-5])$"
RewriteRule .* - [F]

# There is another email harvester which always claims to be referred from http://www.iaea.org/.
# You may have seen this in your own referrer pages.
# I have banned it by referrer.
RewriteCond %{HTTP_REFERER} iaea\.org[NC]
RewriteRule .* - [F]

# NameProtect peddles their “online brand monitoring” to unsuspecting and gullible companies
# looking for people to sue. Despite the claims on their robot information page, they do not
# respect robots.txt; in fact, they spoof their User-Agent in multiple ways to avoid detection.
# I have banned them by User-Agent and IP address.
RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]¦1[3-9][0-9]¦2[0-4][0-9]¦25[0-5])$ [OR]
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]¦2[0-4][0-9]¦25[0-5])$ [OR]
RewriteCond %{HTTP_USER_AGENT} NPBot[NC]
RewriteRule .* - [F]

# this ruleset is for unwanted useragents... possibly email harvesters
RewriteCond %{HTTP_USER_AGENT} ^[A-Z]+$[NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.Browse\s[NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.Eval[NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.Surf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Harvest [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*HTTrack [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} ^.*libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*LWP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*prospector[NC,OR]
RewriteCond %{HTTP_USER_AGENT} AsiaNetBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ASSORT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} attache [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ATHENS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} autohttp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bew [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bot\ mailto:craftbot@yahoo.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bullseye [NC,OR]
RewriteCond %{HTTP_USER_AGENT} CherryPicker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ChinaClaw[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Crescent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} curl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} devsoft's\ http\ component [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Deweb[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Digimarc [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Digger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} digout4uagent[NC,OR]
RewriteCond %{HTTP_USER_AGENT} DIIbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DISCo[NC,OR]
RewriteCond %{HTTP_USER_AGENT} dloader(NaverRobot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Download\ Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} eCatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ecollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Educate\ Search [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailCollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailWolf[NC,OR]
RewriteCond %{HTTP_USER_AGENT} EO\ Browse [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Express\ WebPictures[NC,OR]
RewriteCond %{HTTP_USER_AGENT} ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EyeNetIE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} fastlwspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FEZhead[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Fetch[NC,OR]
RewriteCond %{HTTP_USER_AGENT} FlashGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Franklin\ Locator[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Full\ Web\ Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Getleft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetRight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetURL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetWebPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Gozilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} go-ahead-got-it [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GrabNet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Grafula [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HMView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTML\ Works [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} IBM_Planetwide [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Image\ Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Image\ Sucker[NC,OR]
RewriteCond %{HTTP_USER_AGENT} IncyWincy[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Industry\ Program[NC,OR]
RewriteCond %{HTTP_USER_AGENT} InterGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Internet\ Explore\ 5\.x [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Internet\ Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} InternetSeer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Irvine [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JOC\ Web\ Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} KWebGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} leech[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mass\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MCspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Microsoft\ URL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MIDown\ tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mirror [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Missauga\ Locator[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Missigua\ Locator[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mister\ PiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Monster [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla.*NEWT[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla\/3\.0\.\+Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla\/3.Mozilla\/2\.01 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla\/4\.0$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozzilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MSIECrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Navroad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NearSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetAnts [NC,OR]
RewriteCond %{HTTP_USER_AGENT} netattache [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetCarta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetSpider[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NICErsPRO[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Octopus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Offline\ Explorer[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Offline\ Navigator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} OpaL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Openfind [NC,OR]
RewriteCond %{HTTP_USER_AGENT} OpenTextSiteCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PackRat [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PageGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Papa\ Foto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} pavuk[NC,OR]
RewriteCond %{HTTP_USER_AGENT} pcBrowser[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Plucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Production\ Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Program\ Shareware [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PushSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} RealDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ReGet[NC,OR]
RewriteCond %{HTTP_USER_AGENT} RepoMonkey [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Rover[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Rsync[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Siphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ScoutAbout [NC,OR]
RewriteCond %{HTTP_USER_AGENT} searchterms\.it [NC,OR]
RewriteCond %{HTTP_USER_AGENT} semanticdiscovery[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Shai [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sitecheck[NC,OR]
RewriteCond %{HTTP_USER_AGENT} SiteSnagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SmartDownload[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Spegla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SpiderBot[NC,OR]
RewriteCond %{HTTP_USER_AGENT} SuperBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SuperHTTP[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SurfWalker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} tAkeOut [NC,OR]
RewriteCond %{HTTP_USER_AGENT} tarspider[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Teleport\ Pro[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Telesoft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Templeton[NC,OR]
RewriteCond %{HTTP_USER_AGENT} UtilMind [NC,OR]
RewriteCond %{HTTP_USER_AGENT} VoidEYE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} w3mir[NC,OR]
RewriteCond %{HTTP_USER_AGENT} web.by.mail [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebBandit[NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier[NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebEMailExtrac [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Image\ Collector[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebAuto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier[NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebMiner [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebReaper[NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebSauger[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Website\ eXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Website\ Quester [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebSnake [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webvac [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webwalk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebZIP [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WhosTalking [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Widow[NC,OR]
RewriteCond %{HTTP_USER_AGENT} WUMPUS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} www\.pl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Xaldon\ WebSpider[NC,OR]
RewriteCond %{HTTP_USER_AGENT} XGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Zeus.*Webster[NC]
#RewriteCond %{HTTP_USER_AGENT} test[NC]
RewriteCond %{REQUEST_URI}!^/badUA\.html [NC]
RewriteRule .* /badUA.html [L,E=HTTP_USER_AGENT:BAD_USER_AGENT]

# this ruleset is to stop blank user agents with blank referrers
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* /cgi-bin/noagent.cmd [L,T=application/x-httpd-cgi]

===== snip =====

there're quite a few in there... watch out for hosing your server... i got mine caught in endless loops several times while adjusting this from site wide (internal to httpd.conf) to per directory (.htaccess)... was glad i run my own server :wink:

a final note... watch for missing spaces... ther should be a space before every [ and the ¦ must be replaced by the verticle pipe on your keyboard... this site strips out extra spaces and tabs and replaces the split verticle pipe by a solid one... you'll have to watch these things...

FWIW: the above is taken directly, with no modification, from one of my main site .htaccess files... this site is live and online at this time with the above...

HTH

berli

6:34 pm on Jun 20, 2003 (gmt 0)

10+ Year Member



Tamsy wrote:

RewriteCond %{HTTP_REFERER} ^-?$ [NC]
RewriteCond %{HTTP_USER_AGENT} ^-?$ [NC]
RewriteRule .* - [F,L]

My question is, can ^$ safely replace ^-?$? I ask because I used cPanel to write part of my .htaccess file. To prevent hotlinking it denies gif, png's etc when the referrer is!^http://myserver,!^http://www.myserver and!^$.

Isn't!^$ the same as "-"? Or am I wrong?

jdMorgan

9:27 pm on Jun 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



berli,

^$ means "empty"

^-?$ means "may contain only a single '-' character, but the '-' character is not required." Or, in other words, "either blank or contains a single '-' character."

In the code posted above, we are looking for someone wishing to bypass a block for empty user-agent string by using a "-" character as their user-agent. In common log format, the log entry for a blank user-agent and a user-agent of "-" would appear identical.

So the code above blocks either blank user-agents, or "fake" blank user-agents.

Ref: [etext.lib.virginia.edu...]

HTH,
Jim

ladymindy

9:43 pm on Jul 4, 2003 (gmt 0)

10+ Year Member



Have any of you seen WebCapture 2.0? It was on my site today and is potentially a capture bot.

I run an htaccess file I developed from this thread. Thank you for such great sharing! It has saved me an amazing amount of bandwidth and headaches:)

jdMorgan

11:35 pm on Jul 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ladymindy,

Welcome to WebmasterWorld [webmasterworld.com]!

We've had a recent spotting of Webcature 3.0 - which may or may not be the same thing - over in the Search Engine Spider Identification forum [webmasterworld.com].

The fact that 3.0 doesn't fetch robots.txt is a bad sign...

Jim

This 122 message thread spans 13 pages: 122