
Perl Server Side CGI Scripting Forum

A Close to perfect .htaccess ban list
toolman
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 3:30 am on Oct 23, 2001 (gmt 0)

Here's the latest rendition of my favorite ongoing artwork... my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids and other undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? Do you guys see any errors or better ways to do this... anybody got a bot to add... before I stick this in every site I manage?

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
# block requests referred by www.iaea.org
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org [NC]
RewriteRule .* - [F]

 

pmkpmk
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 10:04 am on Oct 11, 2002 (gmt 0)

Hi Dave,

no, I have root access on my own server, which physically resides about 7m from where I am right now :-)

Excerpts from httpd.conf:

LoadModule rewrite_module /usr/lib/apache/mod_rewrite.so
AddModule mod_access.c
<VirtualHost a.b.c.d>
Options +FollowSymLinks
</VirtualHost>

Excerpts of .htaccess

XBitHack on
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]

Error messages in Logfile:

error.log:[Wed Sep 11 11:23:27 2002] [error] [client x.y.z.z] Options FollowSymLinks or SymLinksIfOwnerMatch is off which implies that RewriteRule directive is forbidden: /usr/local/httpd/virtual/....

carfac
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 3:16 pm on Oct 11, 2002 (gmt 0)

pmkpmk:

Yep - looks like it is not enabled. Ask your ISP to add "AllowOverride All" for that directory and you should be OK!

dave

Crazy_Fool
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 9:09 am on Oct 12, 2002 (gmt 0)

>>Actually my primary goal is to block adress harvesters. I don't
>>care (yet) for people downloading the whole site. But we really
>>need to get a lid on this SPAM.

Same here. I'm using this file to redirect bad bots and email harvesters to a page listing spammers' email addresses (their real email addresses, not the yahoo or hotmail addresses they send spam from). The harvesters will pick these up and the spammers will end up spamming each other. If enough people do this, then eventually we could stop a lot of spam.
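
A rough sketch of the kind of rule I mean (the agent names and the /spamtrap.html page are only examples, adjust to taste):

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [NC]
# the negated pattern keeps the trap page itself from looping
RewriteRule !^spamtrap\.html$ /spamtrap.html [L]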

andreasfriedrich
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 10:49 pm on Oct 12, 2002 (gmt 0)

If you have root access you might want to check out the alternative approach described in the thread on How to centralize administration of things to block [webmasterworld.com].

Andreas

Superman
5+ Year Member
Msg#: 687 posted 6:10 am on Oct 14, 2002 (gmt 0)

Found a new site downloader tonight:

Irvine/0.4.5a

Japanese offline browser ... multiple versions.

RewriteCond %{HTTP_USER_AGENT} ^Irvine [OR]

That'll take care of it!

-Superman-

pmkpmk
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 7:31 am on Oct 16, 2002 (gmt 0)

carfac:

In this case I *AM* the ISP - where is the "AllowOverride" directive to be placed?

dingman
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 7:49 am on Oct 16, 2002 (gmt 0)

Stick it in the <Directory> or <VirtualHost> (or whatever) that defines the site you're working with. In the case of your posted snippet, try:

<VirtualHost a.b.c.d>
AllowOverride All
Options +FollowSymLinks
</VirtualHost>

andreasfriedrich
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 1:17 pm on Oct 16, 2002 (gmt 0)

You don't need the AllowOverride [httpd.apache.org] directive if you specify Options +FollowSymLinks in the configuration file itself. AllowOverride is only used to specify which settings are allowed to be made in .htaccess files.

This is a different situation from the one in Msg #83 [webmasterworld.com], where FollowSymLinks needed to be enabled in the .htaccess file. For that to work one needs at least AllowOverride Options privileges.

If you have root access I would opt for AllowOverride None to turn .htaccess files off entirely. You can do the configuration in the main configuration file. This saves Apache lots of stat calls to check for .htaccess files. And you won't need the FollowSymLinks option at all, since it is only necessary in the per-directory context.
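
A rough sketch of what I mean in httpd.conf (the directory path and the agent pattern are just examples):

# turn .htaccess files off and do everything in the main config
<Directory /usr/local/httpd/virtual/example>
AllowOverride None
</Directory>

<VirtualHost a.b.c.d>
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC]
RewriteRule .* - [F]
</VirtualHost>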

Andreas

andreasfriedrich
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 1:34 pm on Oct 16, 2002 (gmt 0)

BTW there is something strange going on in the configuration described in Msg #153 [webmasterworld.com]

Given a config like this

<VirtualHost a.b.c.d>
Options +FollowSymLinks
</VirtualHost>

and assuming the requested URI resides on the virtual host a.b.c.d, I find it rather strange that Apache would complain that Options FollowSymLinks is off, since it is clearly enabled.

Could it be that the requested URI is not on this virtual server but somewhere else on your server?

Andreas

58sniper
10+ Year Member
Msg#: 687 posted 3:59 pm on Oct 16, 2002 (gmt 0)
I have a question about the order in which things should appear in .htaccess....

I have:
=====================================================
# Error docs
ErrorDocument 401 /error.php?eid=401
....
ErrorDocument 500 /error.php?eid=500

# RedirectPermanent for the old format to the current format (probably to be removed in favor of the search engine friendly URLs)
RedirectPermanent /divisions/comet http://www.mydomain.com/article.php?aid=25
....
RedirectPermanent /wanted http://www.mydomain.com/section.php?sid=wanted

# stop the image thieves
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain.com.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(dev\.)?mydomain.com.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://localhost/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://12.34.5.(6*|7*)$ [NC]
RewriteRule \.(gif|jpg|zip|pdf)$ http://www.mydomain.com/apology.gif [R,L]

# Search engine friendly URLs
RewriteRule ^articles/([0-9]*) /article.php?aid=$1 [L]
....
RewriteRule ^sheriff /article.php?aid=22 [L]

# RewriteCond for those annoying UAs
RewriteCond %{HTTP_USER_AGENT} almaden [OR]
....
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ /robots.php [L]
=====================================================
I'm curious whether this is the best order.

bull
10+ Year Member
Msg#: 687 posted 7:48 pm on Oct 19, 2002 (gmt 0)

After visits from a s*x-spambot abusing Google
(acb12246.ipt.aol.com - - [11/Oct/2002:14:08:51 +0200] "GET /mypoorpage.htm HTTP/1.1" 200 9075 www.mydomain.net "http://www.google.de/search?q=Guestbook+Jewel&num=100...start=400&sa=N" "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)" "-" )
I inserted the following two lines. There's IMHO no real reason to search for "guestbook", except for spambots.

RewriteCond %{HTTP_REFERER} q=guestbook [NC,OR]
RewriteCond %{HTTP_REFERER} q=g%E4stebuch [NC,OR]

Works for MSN search also.

jdMorgan
WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member
Msg#: 687 posted 8:58 pm on Oct 19, 2002 (gmt 0)

58sniper,

Since your rewrite rules all end with the [L] flag, it doesn't really matter what order you put them in. You can choose to do the "good guys" rewrites first, or the "bad guys" first (depending on how many of each you get) in order to speed up (slightly) the majority of requests.

Only when the output from one rewrite needs to be processed by subsequent rules does the order matter all that much. In that case, you wouldn't be using the [L] flag on each ruleset.
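
A made-up illustration of the difference, reusing the rules from your own list:

# with [L] on each rule, the first match wins and order only affects speed
RewriteRule ^articles/([0-9]+)$ /article.php?aid=$1 [L]
RewriteRule ^sheriff /article.php?aid=22 [L]

# without [L], the output of the first rule is fed to the second,
# so here the order matters
RewriteRule ^old-articles/(.*)$ articles/$1
RewriteRule ^articles/([0-9]+)$ /article.php?aid=$1 [L]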

Bull,

That's an interesting exploit - I haven't seen that one yet, but I'll keep an eye out!

Jim

58sniper
10+ Year Member
Msg#: 687 posted 5:58 pm on Oct 24, 2002 (gmt 0)

I have an issue....

Seems the site flipdog.com has been snatching my content. Additionally, they've been doing a pretty crappy job of displaying it (and the fact that many things on my site have changed since they archived it only makes things worse). I'd like to prevent any requests from flipdog.com. This includes requests for files from the content they've already archived, as well as any attempts to get new content.

Can someone tell me if this would work:

RewriteCond %{HTTP_REFERER} ^http://(www\.)?flipdog.com/*$ [NC]
RewriteRule ^.* /robots.php [R,L]

I already have a RewriteRule in place to protect images and other files, and that's working fine. But I'd like to prevent them from getting anything in the future. I don't see what UA they are using, so I tend to believe that they are masking that.

Should I be looking at something else to block them as well?

jdMorgan
WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member
Msg#: 687 posted 6:55 pm on Oct 24, 2002 (gmt 0)

58sniper,

The "/*$" at the end of your RewriteCond pattern isn't quite right. Just leave the "$" end anchor off to match anything that starts with the pattern. Escape the 2nd period with "\" too.

I don't see any point in redirecting to your robots.txt. How about just a 403 response? Also, neither anchor is needed in the RewriteRule pattern - just ".*" will do.

RewriteCond %{HTTP_REFERER} ^http://(www\.)?flipdog\.com [NC]
RewriteRule .* - [F,L]

Returns a 403-Forbidden response and no content.

HTH,
Jim

58sniper
10+ Year Member
Msg#: 687 posted 7:28 pm on Oct 24, 2002 (gmt 0)

Actually, I'm not redirecting to my robots.txt file, but to a file robots.php, which has some content on it.

I'll try your suggestions...

Thanks!

58sniper
10+ Year Member
Msg#: 687 posted 2:01 pm on Oct 25, 2002 (gmt 0)
Okay, I've determined that the UA for FlipDog is

"Mozilla/4.7 (compatible; FlipDog; http://www.whizbang.com/crawler)"

in case anyone else wants to block this as well.

RewriteCond %{HTTP_USER_AGENT} ^FlipDog [OR]

should work......

121focus
10+ Year Member
Msg#: 687 posted 11:52 pm on Oct 25, 2002 (gmt 0)

58sniper,

The "^" at the beginning of your RewriteCond pattern isn't quite right. You might try removing the "^"

RewriteCond %{HTTP_USER_AGENT} FlipDog [OR]

You can test any new RewriteCond using WannaBrowser - see Message #63 [webmasterworld.com].

dhdweb
10+ Year Member
Msg#: 687 posted 9:29 pm on Oct 26, 2002 (gmt 0)

Here is the UA list from my site; who do I need to worry about here?

Explorer¦5
Explorer¦6
sitecheck.internetseer.com (For more info see: http:¦x
Googlebot¦2
Netscape¦6
-
Netscape¦4
FAST WebCrawler¦3
Netscape¦3
Netscape¦2
(unknown)
ia_archiver
Explorer¦4
Openfind data gatherer, Openbot¦3
bumblebee¦1
Mercator 2.0
Libby_1.1¦x
Scooter 3.2.FNR
Explorer¦x
Robozilla¦1
Scooter 3.2.EX
Scooter¦3
PingALink Monitoring Services 1.0 (http:¦x
libwww perl¦5
Lachesis
Mozilla
TurnitinBot¦1
ah ha.com crawler (crawler@ah ha.com)
NationalDirectory WebSpider¦1
ScoutAbout
Pompos¦1
Gigabot¦1
NG¦1
Szukacz¦1
Opera¦6
TulipChain¦5
Scrubby¦2
AaronCarter¦1
curl¦7 4
oBot 4
Internet Explore 5.x
Microsoft URL Control 6.00.8862
Scooter 3.2
FreeFind.com SiteSearchEngine¦1
Java1.4.0
OneStop Webmaster; http:¦x
SlySearch¦1
appie 1.1 (www.walhello.com)
our agentlibwww perl¦5
Scooter ARS 1.1
Snoopy v0.1
Wget¦1
NetResearchServer¦2
Xenu Link Sleuth 1.2a
(Teradex Mapper; mapper@teradex.com; http:¦x
Microsoft URL Control 6.00.8169
Jonzilla¦6
Teleport Pro¦1
Java1.3.0
IE 5.5 Compatible Browser
Generic
Scooter 3.2.QA
metacarta (crawler@metacarta.com)
Scooter 3.2.SF0
psbot¦0
Python urllib¦1
ASPSeek¦1
Steeler¦1
Java1.3.1
W3C_Validator¦1
b2w¦0
LinkWalker
HitboxDoctor
rico¦0
Java1.3.1_02
Mewsoft Search Engine wwWebmasterWorldsoft.com¦4
asterias¦2
pavuk¦0
ColdFusion
Xenu_s Link Sleuth 1.1c
minibot
Rex Swain_s HTTP Viewer (http:¦x
Gulliver¦1
Unknown
lwp request¦2
moget¦2
Snoopy v0.94
Scooter 3.2.SB
Scooter 3.2.BT
COAST WebMaster (Windows NT)
DISCo Pump 3.2
Zeus 2895 Webster Pro V2.9 Win32
WebZIP¦5
dloader(NaverRobot)¦1
AbachoBOT (Mozilla compatible)
IP*Works! V5 HTTP¦x
MFC_Tear_Sample
NetMechanic
A WinHTTP Example Program¦1
HttpApp¦1
antibot V1.1.9¦x
http:¦x
PHP¦4
NetMechanic Page Primer
Robot: NutchCrawler, Owner: wdavies@acm.org
Sqworm¦2
Vagabondo¦2
rabaz (rabaz at gigabaz dot com)
EyeNetIE
Spinne¦2
OrangeBot
Linkbot 3.0
Net Probe

Listed here in order of hits.

garwk
10+ Year Member
Msg#: 687 posted 4:31 am on Nov 13, 2002 (gmt 0)

hello,

I doubt that rewrite rules are the adequate solution for keeping robots away from your files. They consume a lot of CPU power on the server (especially with these large lists) and still don't do the job very well. I have downloaded entire sites myself for various reasons and know how easy it is to fake the UA, or to avoid per-user traffic and max-connection limits by using a couple of proxies.

Why don't you use JavaScript for this? Most sites require it anyway. This way you can keep all the bots away from your files if you use a JavaScript function to generate the URLs, instead of using static URLs for links or images.

In fact I've been using this approach on my site for some time now and it works perfectly. Since then I haven't found a single download-bot access pattern in my logs. Apparently none of these tools are able to process JavaScript.
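
A minimal sketch of what I mean (the function and file names are just examples):

<script type="text/javascript">
// build the real URL at runtime so it never appears as a static link in the HTML source
function imgUrl(name) {
return '/images/' + name + '.jpg';
}
document.write('<img src="' + imgUrl('photo42') + '" alt="">');
</script>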

andreasfriedrich
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 5:01 am on Nov 13, 2002 (gmt 0)

Welcome to WebmasterWorld [webmasterworld.com], garwk.

Well, I guess it's a tradeoff between security on the one hand and usability on the other.

If you use a JavaScript function to generate URLs on the client side you will shut out all users with JavaScript turned off. And you would still need a way to make your site accessible to the SE spiders. Unless you use IP-based cloaking for that, a user would still be able to pose as an SE spider.

While the .htaccess approach has its shortcomings, I'm still not convinced that a client-side JavaScript solution would be any better.

I guess that, just like you can't prevent people from copying a book, you cannot prevent them from copying your website. The only thing you can do is make it harder for them.

Andreas

pmkpmk
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 11:35 am on Nov 15, 2002 (gmt 0)

Unfortunately I still haven't got it working (see earlier posts), but I haven't had the time yet to look into it further (the fate of a part-time webmaster).

But there's a new issue: anybody ever heard of a user-agent calling itself "GraphicBrain.com"?

This special agent seems to download the whole site (which - in theory - I don't mind) but it produces such long logfile entries that my logfile analyzer crashes :-(

Example of ONE(!) logfile line:
212.113.xx.yy - - [01/Aug/2002:06:13:40 +0200] "GET / HTTP/1.0" 200 7931 "-" "GraphicBrain.com" "visitid=3D48ADCB000031EB604E6FEE; KeyWordCookie=GIFTS%2CFLOWERS%2CTRAVEL; ASPSESSIONIDGGGGQRGV=GAOLJNCCJGCBKNPENHBGALON; ASPSESSIONIDGGGQGGDP=HKPPOOGDHBLFPPCILOFEAOHK; ASPSESSIONIDGQGQGNFK=ENFBNHFAEGFHDBNAAAINIKPO; ASPSESSIONIDGQQQGVUY=OJDHAPPBPLCDABCNFPGBJNAL; ASPSESSIONIDGGGQGVUY=HBDOOEECBEBPBADJPMFACJLD; ARPT=IQKKVWSINT3CKMYJ; ASPSESSIONIDQGGGQHOQ=NLAMCPEADENDOOMECNBCAPDO; CFGLOBALS=HITCOUNT%3D1%23LASTVISIT%3D%7Bts+%272002%2D07%2D31+23%3A53%3A13%27%7D%23 TIMECREATED%3D%7Bts+%272002%2D07%2D31+23%3A53%3A13%27%7D%23; CFID=426530; CFTOKEN=39863655; ASPSESSIONIDQGQGQMGG=MIHOHNGDKMOANCPDMCJNKKKE; ASPSESSIONIDQGQGGLCG=HEJLGBDCEGMHCKMEEIECDOBB; ASPSESSIONIDQGQGGWUC=EKDLICBAJCHGOCJIADFCHDLP; RQFW={9762A7AC-D44A-4B43-AA6D-6688B4D7C48B}; ASPSESSIONIDGGQQQMTK=DFNEFLLBHKAPIBACMJOKOCBH; ASPSESSIONIDQGGQGOBG=GFFNPGMCMGBKDOJICDCGMEMF; WEBTRENDS_ID=212.113.82.197-2086124832.29505808; EGSOFT_ID=212.113.82.197-591707536.29505809; SappiUserID=471577; ASPSESSIONIDQQQQQJCO=MBPICOPCHEHEAFOJHMPBLFBE; ASPSESSIONIDGGQQGOOY=OCPLHOPCPHDLHOGHKCAJLECE; ASPSESSIONIDQQQGQGAB=HGMMAHPBONOHNEGIJICNGLCL"

[edited by: jatar_k at 5:07 pm (utc) on Nov. 15, 2002]
[edit reason] fixed side scroll [/edit]

pmkpmk
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 11:51 am on Nov 15, 2002 (gmt 0)

Hi Andreas,

in reply to message number 161 [webmasterworld.com...]

Well, as soon as I have the .htaccess in the subdirectory of the virtual server, Apache won't reload the config or restart - it exits with an error.

The requested URI should be on the same virtual server. Config-wise I've taken the default Apache config, and most of my modifications were in the virtual hosts section.

I'm a bit reluctant to post uncensored config files and logfile excerpts here in this public space, but to the best of my knowledge (which may not be much) I think I got it right.

I guess it's only one little configuration detail that is faulty or missing.

Since I have full root privileges, I'm not limited to .htaccess but can make changes to other parts of the config as well. As I mentioned in another post, I'm only trying to block email harvesters.

So what would be your recommendation?

[edited by: jatar_k at 5:10 pm (utc) on Nov. 15, 2002]
[edit reason] fixed link and sidescroll [/edit]

andreasfriedrich
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 1:46 pm on Nov 15, 2002 (gmt 0)

Hi pmkpmk,

I'm in a hurry right now and need to catch a train in half an hour. I'll get back to you tomorrow unless somebody else has already helped you solve your problem.

Andreas

pmkpmk
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 10:22 am on Nov 19, 2002 (gmt 0)

Thanks to Andreas Friedrich, we solved my mysterious problem!

Andreas found out that I had a directive:

<Files index.html>
Options -FollowSymLinks +Includes
</Files>

in my httpd.conf. Even though, according to the documentation, the "Options" line should be ignored, it actually isn't.

After removing the "-FollowSymLinks" from the statement, everything works as it's supposed to.
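
For reference, after the change the directive simply reads:

<Files index.html>
Options +Includes
</Files>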

pmkpmk
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 11:34 am on Nov 19, 2002 (gmt 0)

Aren't you harassed by the "e-mail spyder" (www.emailspyder.com)?

It has the user-agent "Microsoft URL Control" and - well - spiders for email addresses.

RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL\ Control [NC,OR]

pmkpmk
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 1:15 pm on Nov 19, 2002 (gmt 0)

A second line of defense (slightly off topic, but worth mentioning):

If you're lucky enough to run your own mailserver under your own control, you can add a second line of defense: the use of realtime blacklists (sometimes also called realtime blocklists, or RBLs) in your mailserver allows you to block potential spam when the spammer tries to deliver it to you. On EACH incoming email, the mail server checks at least one of these RBLs. If the sender's IP address tests positive on such a list, email delivery is instantly cancelled even BEFORE the mail data is transferred to your server. There's a multitude of RBLs out there. Our server checks EACH incoming message against 5 different RBLs. Some of our users - including myself - post-check their messages against other RBLs. I, for example, have all messages coming from Russia/China/Korea/Malaysia etc. tagged with the prefix "**SPAM**". This second (and third) line of defense makes life a lot easier!
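
For those running Postfix, a rough sketch of how such a check is wired up (the list names are examples only, use whichever RBLs you trust):

# main.cf - reject clients listed on a DNS blacklist at SMTP time,
# before the message data is ever transferred
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    reject_rbl_client zen.spamhaus.org,
    reject_rbl_client bl.spamcop.net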

pmkpmk
WebmasterWorld Senior Member 10+ Year Member
Msg#: 687 posted 11:52 am on Nov 25, 2002 (gmt 0)

As mentioned above, thanks to Andreas Friedrich everything works fine now, and I "catch" typically 1-2 mail harvesters per day that way. I downloaded a few of those bugs myself to get a feeling for how they work.

And now the $1,000,000 prize question: what stops the programmer of one of these bugs from "stealing" the user-agent string of - say - IE 5.0?

Am I right in thinking that a bot camouflaging itself as IE 5.0 would be COMPLETELY invisible to .htaccess rewrite rules?

Andy_White
10+ Year Member
Msg#: 687 posted 4:42 pm on Nov 28, 2002 (gmt 0)

Hi,

I've been reading through this thread, and having used .htaccess to secure areas of other websites I thought I'd test out the concepts on a dormant website on my server.

But when I add the file, which contains:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]

I find I can't get into any of the pages.

The site is a virtual site on a server I have root access to and I've checked the httpd.conf to see that rewriteengine is on in each of the virtual sites.

Can anybody suggest what I'm doing wrong?

Andy

Later update

I've checked my server logs and I'm getting the message:
RewriteEngine not allowed here

So now I'm really confused.

Later still update

Solved it - I needed to amend access.conf to allow override of FileInfo.
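
For anyone else who hits this, the change was along these lines (the directory path is just an example):

# access.conf - mod_rewrite directives in .htaccess are controlled by the FileInfo override
<Directory /home/sites/dormant-site/htdocs>
AllowOverride FileInfo
</Directory>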

okidata
10+ Year Member
Msg#: 687 posted 6:06 pm on Nov 29, 2002 (gmt 0)

Hi All,

I'm very impressed with the knowledge shown in this thread! I've read it at least once but I still have a question.

What if you want to ban certain countries using RewriteCond? How do I do that?

Right now I'm using:

deny from .at
deny from .bg
etc...

The problem with that is that it even denies my error pages, so I'd like to switch over to RewriteCond instead so that I can give them a page with a reason why they can't reach my site.
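
Something like this is what I have in mind (untested sketch - %{REMOTE_HOST} only contains the TLD if hostname lookups are enabled, and /blocked.html is a made-up page):

RewriteEngine on
RewriteCond %{REMOTE_HOST} \.at$ [NC,OR]
RewriteCond %{REMOTE_HOST} \.bg$ [NC]
# the negated pattern lets the explanation page itself through
RewriteRule !^blocked\.html$ /blocked.html [L]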

Another question... does anyone know how I can test whether the country ban is working correctly? wannabrowser.com works great for referrers, but it has no provisions for testing from offshore or from a specific IP location.

Thanks for the help.

Cheers,
Dennis

upside
10+ Year Member
Msg#: 687 posted 10:34 pm on Dec 3, 2002 (gmt 0)

okidata, I've been wondering something similar. My ban list takes the form of:

SetEnvIf Remote_Addr ^12\.40\.85\. getout
SetEnvIfNoCase User-Agent ^Microsoft.URL getout

<Limit GET POST>
order allow,deny
allow from all
deny from env=getout
</Limit>

This is working fine, but how can I show a custom error message without implementing all of this using mod_rewrite? Also, how can I do a redirect if getout is set? Thanks.
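
One idea I've been toying with (untested, and /banned.html is a made-up page) is leaving the deny as it is and just pointing the 403 at a custom page:

# serve a custom page whenever the deny above triggers a 403
ErrorDocument 403 /banned.html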

SomeCallMeTim
10+ Year Member
Msg#: 687 posted 12:56 pm on Dec 4, 2002 (gmt 0)

Is there a way to use upside's method of:

SetEnvIf Remote_Addr ^12\.40\.85\. getout
SetEnvIfNoCase User-Agent ^Microsoft.URL getout

<Limit GET POST>
order allow,deny
allow from all
deny from env=getout
</Limit>

but return something more ambiguous than a 403, so that the person trying to grab the site is confused... say a 304 Not Modified, for example?

Is upside's method more expensive than using rewrite?

Thanks
