A Close to perfect .htaccess ban list - Part 3

More tips and tricks for banning those pesky "problem bots!"


txbakers

7:38 pm on Oct 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Continued from A close to perfect .htaccess ban list - Part 2 [webmasterworld.com]

Whee - what a great discussion.

[edited by: Marcia at 11:23 pm (utc) on Oct. 13, 2003]

[edited by: jdMorgan at 12:24 am (utc) on Nov. 19, 2003]
[edit reason] Corrected URL [/edit]

Wizcrafts

7:37 pm on Nov 20, 2003 (gmt 0)

10+ Year Member



Anytime I see "unknown" "unknown" in the Referer and UA fields in my hit-counter stats, I check the actual raw web logs and always find a dash in those fields. That means that the referer and user agent are either blank, blocked, or literal dashes.

Wiz

Synthetic

4:07 am on Nov 21, 2003 (gmt 0)

10+ Year Member



I have only a vague idea of this issue, but I still want to make an attempt at further 'securing' my site, so please let me know whether everything will work correctly if I insert the following code into a .htaccess file in my root directory.

--

RewriteEngine On

# Forbid requests for exploits & annoyances
# Bad requests
RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST)$ [NC,OR]
# CodeRed
RewriteCond %{REQUEST_URI} ^/default\.(ida|idq) [NC,OR]
RewriteCond %{REQUEST_URI} ^/.*\.printer$ [NC,OR]
# Email
RewriteCond %{REQUEST_URI} (mail.?form|form|form.?mail|mail|mailto)\.(cgi|exe|pl)$ [NC,OR]
# MSOffice
RewriteCond %{REQUEST_URI} ^/(MSOffice|_vti) [NC,OR]
# Nimda
RewriteCond %{REQUEST_URI} /(admin|cmd|httpodbc|nsiislog|root|shell)\.(dll|exe) [NC,OR]
# Various
RewriteCond %{REQUEST_URI} ^/(bin/|cgi/|cgi\-local/|sumthin) [NC,OR]
RewriteCond %{THE_REQUEST} ^GET\ http [NC,OR]
RewriteCond %{REQUEST_URI} /sensepost\.exe [NC]
RewriteRule .* - [F]

# Forbid if blank (or "-") Referer *and* UA
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]

# Banning BOTS below
# Address harvesters
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(autoemailspider|ExtractorPro) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^E?Mail.?(Collect|Harvest|Magnet|Reaper|Siphon|Sweeper|Wolf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (DTS.?Agent|Email.?Extrac) [NC,OR]
RewriteCond %{HTTP_REFERER} iaea\.org [NC,OR]
# Download managers
RewriteCond %{HTTP_USER_AGENT} ^(Alligator|DA.?[0-9]|DC\-Sakura|Download.?(Demon|Express|Master|Wonder)|FileHound) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Flash|Leech)Get [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Fresh|Lightning|Mass|Real|Smart|Speed|Star).?Download(er)? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Gamespy|Go!Zilla|iGetter|JetCar|Net(Ants|Pumper)|SiteSnagger|Teleport.?Pro|WebReaper) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(My)?GetRight [NC,OR]
# Image-grabbers
RewriteCond %{HTTP_USER_AGENT} ^(AcoiRobot|FlickBot|webcollage) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Express|Mister|Web).?(Web|Pix|Image).?(Pictures|Collector)? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image.?(fetch|Stripper|Sucker) [NC,OR]
# "Gray-hats"
RewriteCond %{HTTP_USER_AGENT} ^(Atomz|BlackWidow|BlogBot|EasyDL|Marketwave|Sqworm|SurveyBot|Webclipping\.com) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (girafa\.com|gossamer\-threads\.com|grub\-client|Netcraft|Nutch) [NC,OR]
# Site-grabbers
RewriteCond %{HTTP_USER_AGENT} ^(eCatch|(Get|Super)Bot|Kapere|HTTrack|JOC|Offline|UtilMind|Xaldon) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.?(Auto|Cop|dup|Fetch|Filter|Gather|Go|Leach|Mine|Mirror|Pix|QL|RACE|Sauger) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.?(site.?(eXtractor|Quester)|Snake|ster|Strip|Suck|vac|walk|Whacker|ZIP) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCapture [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
# Tools
RewriteCond %{HTTP_USER_AGENT} ^(curl|Dart.?Communications|Enfish|htdig|Java|larbin) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (FrontPage|Indy.?Library|RPT\-HTTPClient) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(libwww|lwp|PHP|Python|www\.thatrobotsite\.com|webbandit|Wget|Zeus) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Microsoft|MFC).(Data|Internet|URL|WebDAV|Foundation).(Access|Explorer|Control|MiniRedir|Class) [NC,OR]
# Unknown
RewriteCond %{HTTP_USER_AGENT} ^(Crawl_Application|Lachesis|Nutscrape) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[CDEFPRS](Browse|Eval|Surf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Demo|Full.?Web|Lite|Production|Franklin|Missauga|Missigua).?(Bot|Locat) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (efp@gmx\.net|hhjhj@yahoo\.com|lerly\.net|mapfeatures\.net|metacarta\.com) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Industry|Internet|IUFW|Lincoln|Missouri|Program).?(Program|Explore|Web|State|College|Shareware) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Mac|Ram|Educate|WEP).?(Finder|Search) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Moz+illa|MSIE).?[0-9]?.?[0-9]?[0-9]?$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9][0-9]?.\(compatible[\)\ ] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NaverRobot [NC]
RewriteRule .* - [F]

#!/usr/bin/perl -w

$remreq = $ENV{REQUEST_URI};
$remaddr = $ENV{REMOTE_ADDR};
$usragnt = $ENV{HTTP_USER_AGENT} || "The UA is blank";
$referer = $ENV{'HTTP_REFERER'} || "there is no referer";
$date = scalar localtime(time);
$remmeth = $ENV{REQUEST_METHOD};
$remhost = $ENV{'HTTP_HOST'};

open(MAIL, "|/usr/sbin/sendmail -t") || die "Content-type: text/plain\n\nCan't open /usr/sbin/sendmail!";
print MAIL "To: ****\@yyy.zzz\n";
print MAIL "From: xxx\@yyy.zzz\n";
print MAIL "Subject: You caught another one!\n\n";
print MAIL "The following 'intruder' was caught by the \"Bot Trap\" and has been added to the ban list in .htaccess:\n\n";
print MAIL "The IP address $remaddr was listed on $date\n";
print MAIL "The file requested was: $remreq\n";
print MAIL "The method used was: $remmeth\n";
print MAIL "The intruder's user agent was: $usragnt\n";
print MAIL "The document was referred by: $referer\n";
print MAIL "The host server was $remhost\n";
close(MAIL);

# Send a minimal response to the trapped client so the request doesn't end in a 500 error
print "Content-type: text/html\n\n";
print "<html><body><h1>Access Denied</h1></body></html>\n";
exit;

--

Wizcrafts

5:26 am on Nov 21, 2003 (gmt 0)

10+ Year Member



Synthetic asked:
I have only a vague idea of this issue, but I still want to make an attempt at further 'securing' my site, so please let me know whether everything will work correctly if I insert the following code into a .htaccess file in my root directory.

Inserting all of the code you presented in your example won't secure your website, but it will almost certainly disable it. Bad commands or syntax in a root .htaccess file can cause fatal server errors and make your website go dark!

I see that you have included Perl scripting in your presentation. Perl script does not go into an .htaccess file! It goes into a .pl or .cgi script, usually placed in a cgi-bin directory, and its correct operation depends on absolutely correct paths to Perl and Sendmail. The commands in the .htaccess file must be tailored to your own server environment, as dictated by your host's server configuration files. These are not universal settings; they vary among web hosting companies. It is even possible that you will not be permitted to use any of the commands listed in the RewriteRule section, if your host forbids mod_rewrite overrides. Furthermore, watch out for the broken vertical pipes (¦) that forum software often substitutes for solid pipes (|) when code is posted; copied verbatim, they are incorrect code and will usually cause fatal server errors, possibly denying access to everyone.

I also see that you quoted only the first line of what is often a two-line preamble for using rewrite rules: RewriteEngine On. The other command that is often required is Options +FollowSymLinks. It all depends on how your web host has configured the Apache directives for customers and security concerns.
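For reference, a minimal preamble - assuming your host permits these overrides (check with them first!) - would be:

Options +FollowSymLinks
RewriteEngine On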

Some of the rules in the various examples presented over the course of this thread address specific threats that individual webmasters were dealing with, and many do not automatically apply to everybody else. Some User Agents that are blocked in these examples by one person are allowed by others. Others are not serious enough problems to justify blocking access without a thorough investigation of the circumstances of the visit in question (the FrontPage Extensions references, for example, mean nothing if you don't have a FrontPage-enabled site).

It is better to read your web logs on a daily basis and see which IP addresses are looking for unusual pages, or pages that trigger red flags in the general security community. If you see what looks like a suspicious User Agent, check these forums by searching for that UA in the site search engine listed at the top of every forum page on WebmasterWorld. I would also urge you to read the entire thread that started this discussion, at [webmasterworld.com...] .

On the other hand, any User Agent that contains the words Email, Siphon, Extractor, or other names that imply email extraction is definitely an unwanted, hostile agent and should be banned. This assumes that you have email addresses listed on your website that you want to protect from harvesters.

I ban only the most obvious hostile User Agents and read my logs every day. If I see a log record that reveals hostile intent I will deny access to that IP address. Since IP addresses can be dynamic, and innocent surfers can obtain the same IP used by a Phisher, I often have to remove IP bans after a period of inactivity from that address. On the other hand, since many harvesters come from certain countries and fall within a block of IPs, I sometimes block an entire country or ISP, if their members regularly harass my server. This is a judgement call on my part. If you do business with people in APNIC or RIPE network countries these country blocks are definitely not for you!

I hope this helps.

Wiz

Synthetic

7:06 am on Nov 21, 2003 (gmt 0)

10+ Year Member



Yes, the information you provided was of great help. Thank you very much for sharing your knowledge, Wizcrafts. I really do appreciate it.

I will make sure to review this topic more thoroughly so that I get a better grasp on how .htaccess files work. Another thing I'll have to look into is what exactly my web host does and does not support.

jackson

4:41 am on Nov 24, 2003 (gmt 0)

10+ Year Member



Wow ... a man goes out after lunch, comes back for breakfast only to find that the whole menu's changed ...

Was looking over htaccess things back in February this year. 9 months and so many pages later (not to mention all the side branches) and we're almost looking at a different animal.

Quick question - where do error pages now fit into the htaccess scheme of things?

I'm putting in the finishing touches to a project - as in building on one of these CMS things. Thus far their htaccess file consists of the following lines:

ErrorDocument 400 /error.php?400
ErrorDocument 401 /error.php?401
ErrorDocument 403 /error.php?403
ErrorDocument 404 /error.php?404
ErrorDocument 500 /error.php?500

From all of the foregoing I should know where to put in most of the code. What I would like to find out is: should the above lines appear at the beginning or at the end?

Wizcrafts

5:17 am on Nov 24, 2003 (gmt 0)

10+ Year Member



Jackson asked:
"Quick question - where do error pages now fit into the htaccess scheme of things?"

I personally have my error document redirects placed in the top section, before my deny from or Rewrite conditions or rules. I doubt that this matters to the interpreter, but it makes logical eyeball sense to me to see it first.

On the other hand, RewriteCond directives and their associated RewriteRules should be placed in descending order of priority, so that the worst offenders are blocked or redirected as fast as possible, without the server having to parse the entire file to match a User Agent, Referer, or IP address. I accomplish this by placing all of my fixed-IP deny from rules before the RewriteCond rules. The next section contains the rewrite conditions, with the most serious threats dealt with at the top of the list and the broad IP ranges and search-query restrictions at the bottom of that group.
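As a rough sketch of that ordering (the IP address and User Agent below are placeholders only, not recommendations):

# 1. Error documents
ErrorDocument 403 /error.php?403
# 2. Fixed-IP bans
order allow,deny
allow from all
deny from 192.0.2.
# 3. Rewrite conditions and rules, worst threats first
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC]
RewriteRule .* - [F]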

Wiz

jackson

4:01 am on Nov 25, 2003 (gmt 0)

10+ Year Member



wiz,

thanks for the follow-up. I was thinking the same thing. I had it that way before, but noticing that the "landscape" had changed somewhat and this wasn't mentioned or made obvious, I ended up wondering what people were doing now.

Put the said file into action and it was doing its work right away. I will leave it serving up 403s for a while to get a feel for what is happening out there before making changes and going to the next step - as in putting in traps and the like. Thanks again.

Wizcrafts

4:34 am on Nov 25, 2003 (gmt 0)

10+ Year Member



Jackson and Synthetic;

In case you haven't been welcomed yet, welcome to WebmasterWorld!

I'm happy to hear that our collective advice is helping you fight off the Borg.

Your logs will help you formulate the placement order of the rules. It is possible to have multiple RewriteRules, each ending in [L]. This means that if the conditions match, the rule is applied and processing halts there. That's why we try to move the worst offenders to the top of the list, or create special-case rules for the likes of the FormMail spammers.
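For example (the patterns below are placeholders): each block is evaluated in turn, and since [F] implies [L], a match stops processing immediately:

# Special case: FormMail probes, dealt with first
RewriteCond %{REQUEST_URI} form.?mail [NC]
RewriteRule .* - [F]
# Lesser offenders afterward
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC]
RewriteRule .* - [F]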

Another thing to watch is how many 403s you are serving. If the number becomes very high, and the custom 403 page is 2 or 3 kB, you might want to consider writing a smaller (100-200 bytes) main 403 file that just says "Access Denied" and provides a link to another 403(b) page that explains your policies and restrictions. I have two 403 pages like that. Sometimes I end up 403-ing visitors who have inherited a dirty IP, and I offer them an explanation as to why they were denied access.

Wiz

jackson

2:49 am on Nov 26, 2003 (gmt 0)

10+ Year Member



Wiz, I'll take a look at the 403 number in due course. At the moment I'm getting a "feel for things" on this new site. Been an interesting exercise.

Here is what seems to be a new one for the books. Well, a variation on a theme at least. Got this on my log today:

162.33.101.4 - - [25/Nov/2003:15:58:34 -0600] "POST /_vti_bin/_vti_aut/fp30reg.dll HTTP/1.1" 401 - "-" "-"

Needless to say, this is a FrontPage thing. I have never used it and don't intend to. The web host provides FP extensions and I have left them in - to assess their merits and the "unwanted attention" they may receive.

The question that arises here is: if I put in that FP mod_rewrite rule, will it stop this particular type of intrusion?

As an aside, I didn't find anything here on fp30reg.dll. However, a Google search turns up reams of material on security exploits relating to the use of this particular file.

Wizcrafts

3:28 am on Nov 26, 2003 (gmt 0)

10+ Year Member



Jackson;
It looks like some script kiddie has just discovered a 2001 FrontPage buffer-overrun flaw and is testing to see if he can find a vulnerable version. Finding one is highly unlikely, as Microsoft pushed out patches for it in the early summer of 2001, with a lot of publicity.

The entry you quoted does not contain the needed 259+ byte data string to overflow the buffer. I guess the S.K. is first testing for the presence of FrontPage 2000; then, if it exists, he will test the return value of the .dll file to see if it is the unpatched version (unlikely), and then send a 259+ byte attack to try to bring down the stupidly unpatched server.

If you are worried about this attempted test for an exploit, and another one I just saw, just use this code:


RewriteCond %{REQUEST_URI} (MSOffice|_vti|sumthin) [NC]
RewriteRule .* - [F]

Happy hunting
Wiz

jackson

3:40 am on Nov 26, 2003 (gmt 0)

10+ Year Member



Wiz,

Thanks for getting back on this one so quickly.

Strange - have that item in as:

RewriteCond %{REQUEST_URI} ^/(MSOffice|_vti) [NC,OR]

Do you think removing ^/ would have any effect?

On another matter, what's the verdict on LinkWalker? In the early sections of this thread it was included, but it seems to have disappeared off the "hotlist". Got hit by that as well.

jdMorgan

3:46 am on Nov 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Regardless of any additional .htaccess code, you probably won't see a change in the server response. That's because, according to the posted log entry, the skiddie is getting a 401-Authorization Required response. Even with an additional - [F] rule on that URI, the 401 is going to take precedence over 403-Forbidden.

Jim

Wizcrafts

3:59 am on Nov 26, 2003 (gmt 0)

10+ Year Member



Jackson asked:

Do you think removing ^/ would have any effect?

Yes, I do think it will have an effect. It will allow the condition to actually catch the request you posted. The ^/ means that you are requiring the / to precede the MSOffice or _vti, and to be the first character in the expression. However, the request line in your log is "POST /_vti_bin/_vti_aut/fp30reg.dll HTTP/1.1". The first character is not /, but the P of POST; then there is a space, then the forward slash, then the _vti. By leaving off the ^/ you allow the rule to catch these words and characters wherever they appear in the requested URI, not just at the beginning of the line.

As JD says, the point is moot because your server is requiring authorization to POST to that file. Since no credentials were presented, the request failed with a 401 error. There isn't much sense worrying about sending them to a 403 page if they are already denied access by the 401.

Wiz

jackson

4:03 am on Nov 27, 2003 (gmt 0)

10+ Year Member



Jim, Wiz, thanks for the above.

As an extension of much of the above, this has just come up. First there's this item:

[Wed Nov 26 16:16:40 2003] [error] [client 66.31.231.141] Options FollowSymLinks or SymLinksIfOwnerMatch is off which implies that RewriteRule directive is forbidden: /www/mydomain/_vti_bin/..%5c..

then, in the next instant, I get "flooded" with this:

h00e07d969866.ne.client2.attbi.com - - [26/Nov/2003:16:16:39 -0600] "GET /scripts/root.exe?/c+dir HTTP/1.0" 403 - "-" "-"
[26/Nov/2003:16:16:40 -0600] "GET /MSADC/root.exe?/c+dir HTTP/1.0" 403 - "-" "-"
[26/Nov/2003:16:16:40 -0600] "GET /c/winnt/system32/cmd.exe?/c+dir HTTP/1.0" 403 - "-" "-"
[26/Nov/2003:16:16:40 -0600] "GET /d/winnt/system32/cmd.exe?/c+dir HTTP/1.0" 403 - "-" "-"
[26/Nov/2003:16:16:40 -0600] "GET /scripts/..%255c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 403 - "-" "-"
[26/Nov/2003:16:16:40 -0600] "GET /_vti_bin/..%255c../..%255c../..%255c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 403 - "-" "-"
[26/Nov/2003:16:16:40 -0600] "GET /_mem_bin/..%255c../..%255c../..%255c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 403 - "-" "-"
[26/Nov/2003:16:16:40 -0600] "GET /msadc/..%255c../..%255c../..%255c/..%c1%1c../..%c1%1c../..%c1%1c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 403 - "-" "-"
[26/Nov/2003:16:16:41 -0600] "GET /scripts/..%c1%1c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 403 - "-" "-"
[26/Nov/2003:16:16:41 -0600] "GET /scripts/..%c0%2f../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 - "-" "-"
[26/Nov/2003:16:16:41 -0600] "GET /scripts/..%c0%af../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 403 - "-" "-"
[26/Nov/2003:16:16:41 -0600] "GET /scripts/..%c1%9c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 403 - "-" "-"
[26/Nov/2003:16:16:41 -0600] "GET /scripts/..%%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 215 "-" "-"
[26/Nov/2003:16:16:41 -0600] "GET /scripts/..%%35c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 215 "-" "-"
[26/Nov/2003:16:16:41 -0600] "GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 403 - "-" "-"
[26/Nov/2003:16:16:41 -0600] "GET /scripts/..%252f../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 403 - "-" "-"

A couple of things to note - in the last quote I removed the subsequent UAs, which are all the same. The IP address in the first item and the UA in the second are one and the same party. The other point is how this attack kept probing through all the variations on that "cmd.exe" file.

Granted, I haven't set "Options FollowSymLinks", but the rest of the script is there and seems to be doing its work for the most part. I cannot believe how blatant this "attack" has been - a soft left jab followed by a solid right hook. Something I can't get my head around is the timing. We're talking seconds, I know, but the time difference between one request and the next is more or less the same. Now I'm beginning to understand the need to drop these guys into an endless loop. Just thought I would share this one ...

jdMorgan

4:27 am on Nov 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




[Wed Nov 26 16:16:40 2003] [error] [client 66.31.231.141] Options FollowSymLinks or SymLinksIfOwnerMatch is off which implies that RewriteRule directive is forbidden: /www/mydomain/_vti_bin/..%5c..

See the third paragraph of message#33 above, where Wizcrafts discusses this.

The accesses you logged are a typical NIMDA attack, and you can see that all requests got either a 400-Bad Request, a 403-Forbidden or a 404-Not Found response. There's really not much else you can do with these, unless you (or your hosting provider) want to 'black-hole' the IP address at the router. Oh, one more: If you are running FreeBSD, it also has a built-in firewall called IPFW that you may be able to use -- ask your hosting provider.

Regarding your 'endless loop' comment: rather than try to 'get tricky' with these requests and serve up something special, it's really more productive to ignore them and work on something else. Otherwise, you end up putting more of a load on your server, and it doesn't really accomplish anything. The best approach here is to minimize the size of your 400, 403, and 404 custom error pages in order to reduce the bandwidth wasted by these requests. One technique that works well is to make an almost completely blank error page with an <h1> heading naming the error (e.g., 'Access Denied') and a single text link that an unlucky human might click to get more information about the error. Bad 'bots will never click the link, so you get a very small error page, but you have the option of providing a second page with more info to any innocents you might catch accidentally (and it does happen, unless your .htaccess coding skills are superhuman). For this reason, my secondary error-info pages are professional but somewhat apologetic, just in case. :)
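A bare-bones sketch of such a page, with a hypothetical /403-info.html standing in for the secondary info page:

<html>
<head><title>403 Forbidden</title></head>
<body>
<h1>Access Denied</h1>
<a href="/403-info.html">More information</a>
</body>
</html>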

Jim

jackson

6:18 am on Nov 27, 2003 (gmt 0)

10+ Year Member



Jim, appreciate the follow-up. Yes, a matter of adapting.

On the "Options FollowSymLinks" item. Had that in and end up collecting 500's. But this may have been due to some other items in the script which have since been corrected.

Regarding error pages: your prognosis makes sense and I will make moves in this regard.

As an aside - methinks it may be time for a book on this matter. There's so much info to be had here that it's becoming quite taxing just lining up and joining all the points.

Thanks again.

marke

9:48 am on Nov 30, 2003 (gmt 0)

10+ Year Member



Hi all

What a great resource for ideas and information.
I have been playing with .htaccess to try to block email spiders.
The info posted here has been of great help, but it only blocks those that advertise their presence.
I have downloaded a couple of email spiders (the first that came up on a Google search), and both of them still work through my site quite happily. They show up in the logs as Explorer 6. I guess there is no way around this other than to have no email addresses on the website?

Best regards,
Mark Empson
<snip>

[edited by: jdMorgan at 4:56 pm (utc) on Nov. 30, 2003]
[edit reason] No sigs or URLs, please [/edit]

jdMorgan

5:01 pm on Nov 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Mark,

Welcome to WebmasterWorld [webmasterworld.com]!

Take a look at this thread [webmasterworld.com] for an additional technique to stop site exploits. User-agent screening has the advantage of efficiency, in that you catch many intruders with one test. However, the ones you describe must be blocked by IP address, and further, a few must be blocked by forwarded IP address if they come through proxies.
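A sketch of both kinds of IP test, using the placeholder range 192.0.2.* (substitute the actual offending addresses):

# Match the direct IP, or the same range forwarded through a proxy
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\. [OR]
RewriteCond %{HTTP:X-Forwarded-For} 192\.0\.2\.
RewriteRule .* - [F]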

Jim

jackson

4:27 am on Dec 5, 2003 (gmt 0)

10+ Year Member



Just wondering if it's a good idea to continue with this thread or start a new one ...

Picked this up today:

61.173.105.6 - - [04/Dec/2003:09:08:05 -0600] "GET /phpinfo.php HTTP/1.1" 200 37 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:0.9.4) Gecko/20011128 Netscape6/6.2.1"

The first thing that came to mind was: what's this person doing requesting phpinfo.php? The next thing that came to mind was to block access. But in doing that, I would also block myself whenever I want to check my own details. This is for a hosted website.

The question here is what to do with something like this?

jdMorgan

5:07 am on Dec 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Block it, then use mod_rewrite to give phpinfo an "alias" URL so you can still get to it.
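For instance (the alias name here is made up - pick your own and keep it to yourself). Testing %{THE_REQUEST}, which holds the original request line and is not altered by internal rewrites, lets the alias through while direct requests are forbidden:

# Forbid direct requests for phpinfo.php
RewriteCond %{THE_REQUEST} /phpinfo\.php [NC]
RewriteRule .* - [F]
# Internally map a private alias to it
RewriteRule ^my-private-info\.php$ /phpinfo.php [L]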

Your post is on-topic as a potential addition to the list of troublesome user-agents. Any further discussion of *how* to deal with this specific problem probably does need its own thread, though.

Jim

jackson

12:39 am on Dec 6, 2003 (gmt 0)

10+ Year Member



Jim,

Thanks for the follow up and the suggestion.

We'll chug on in what seems to be an ever-changing landscape.

Wizcrafts

8:01 pm on Dec 27, 2003 (gmt 0)

10+ Year Member



Happy Holidays everyone!

I have a question concerning a RedirectMatch issue in my .htaccess file.

I have a hidden link to a non-existent file, which we will call example.html, embedded in a section containing all of the site links. Here is what the link resembles:

<a href="example.html" onclick="return false"> </a>
. The onclick false action is a safety net for visual readers so they don't accidently trigger the banning redirect. Because there is no text for the link it is invisible on the displayed web page.

The link is redirected by .htaccess to my banning script, which we will call ban.pl for this example. The URL-cleansed code appears below:


RedirectMatch example\.html /cgi-bin/ban.pl

Now, whenever a scooper-bot or HTML-only downloader visits and scrapes for links, it follows the link to example.html and gets a 302 redirect, according to my web log, but it does not hit the Perl script! When I tested this in Wannabrowser I was sent to the Perl script and banned, as designed. Here is my latest log entry for this mis-event:

"GET /example.html HTTP/1.0" 302 219 "-" "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)"

The ban script is definitely larger than 219 bytes! This leech also took many more HTML-only pages before leaving my server. Thus, he was not self-banned, and never triggered the script to which he was supposed to be redirected.

I'd appreciate any help in getting this right.
TIA, Wiz

jdMorgan

10:02 pm on Dec 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Wiz,

The problem is that bad-bots don't follow 302 external redirects! 301 and 302 redirects require the browser (or user-agent) to reissue the request using the URL supplied in the 30x server response. Thus, the user-agent has to actively cooperate in order to fetch the destination file specified by the 30x response.

Whatchawanna do is to force-feed it a completely-server-internal file substitution, not a redirect:


RewriteRule ^example\.html$ /cgi-bin/ban.pl [L]

This instructs the server to immediately substitute the ban.pl file whenever example.html is requested.

If you don't have mod_rewrite capability, about the best you can do is to set up a unix symlink called example.html and point it to ban.pl.
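Something along these lines (paths hypothetical, and the server must still be configured to execute the target as a CGI):

ln -s /home/site/cgi-bin/ban.pl /home/site/public_html/example.html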

Jim

Wizcrafts

10:09 pm on Dec 27, 2003 (gmt 0)

10+ Year Member



Thanks for the explanation Jim, and Happy Holidays. I will do the Rewrite Rule.

Wiz

decdim

8:29 pm on Jan 15, 2004 (gmt 0)



Some new "visitors" from my "Last 300 Visitors Page"

Some of these may be repeats...please excuse...

>>Triple edit<<
---

### What's up with the referer?! :)
Referer: file:///C:/leads17.html
Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT; SEARCHALOT.COM IE5)

Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt; DTS Agent)

### MSIE 7.01?
Agent: Mozilla/4.0 (compatible; MSIE 7.01; Windows 98)

### To look like "apple"?!
Agent: appie 1.1 (www.walhello.com)

Agent: Microsoft URL Control - 6.00.8862

Agent: Program Shareware 1.0.3

Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request

BohrMe

8:05 pm on Jan 20, 2004 (gmt 0)

10+ Year Member



Along these same lines, I have copied the trap.cgi bad-bot trap found on this board to formmail.pl and formmail.cgi in an effort to auto-ban those trying to exploit FormMail, which I do not use.

The problem with this is that every person who has tried to access formmail directly has their HTTP_USER_AGENT set to "-", BUT I have the following in my .htaccess:


RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule ^.*$ noID.php [L]

This causes noID.php to be executed instead of my trap.

Does anyone know of a conditional statement that I can use in my .htaccess to test the URL first and, if it matches a case-insensitive formmail, not test for a "-" HTTP_USER_AGENT?

Would it be as simple as placing another RewriteCond condition before this one to check the URL?

jdMorgan

9:51 pm on Jan 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



BohrMe,

> Would it be as simple as placing another RewriteCond condition before this one to check the URL?

Yes.


RewriteCond %{REQUEST_URI} !form.?mail [NC]

inserted ahead of your existing RewriteCond would stop the usual formmail requests from being redirected by that Rule.
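Combined with your existing directives, the complete block would read:

RewriteCond %{REQUEST_URI} !form.?mail [NC]
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule ^.*$ noID.php [L]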

Jim

BohrMe

10:05 pm on Jan 20, 2004 (gmt 0)

10+ Year Member



That did the trick! Thank you much!

johnlim

7:32 am on Feb 6, 2004 (gmt 0)

10+ Year Member



This is a wonderful thread on the near-perfect .htaccess. Could somebody draw some conclusions:

1) What is the best .htaccess - what should be put inside?

2) What part should be put into httpd.conf, and what should remain in .htaccess?

Thanks.

jdMorgan

4:15 pm on Feb 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



johnlim,

> 1) What is the best .htaccess - what should be put inside?

That depends entirely on your site -- See message number 3 [webmasterworld.com] of this thread.

> 2) What part should be put into httpd.conf, and what should remain in .htaccess?

mod_rewrite code placed into httpd.conf is compiled at server startup. Code placed into .htaccess is interpreted on each HTTP request. Therefore, code execution in httpd.conf is much more efficient.

However, placing code in .htaccess is better for development because the server does not need to be restarted in order to activate the new code. So, "develop in .htaccess and deploy in httpd.conf" may be a good plan for many Webmasters. However, note that there can be subtle syntax differences between the two contexts. For example, in the absence of a RewriteBase directive, patterns in .htaccess RewriteRules should not contain a leading slash, because the leading slash won't be "visible" to RewriteRule.
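A minimal illustration of that difference (old.html and new.html are placeholders):

# In httpd.conf (server context), the URL-path keeps its leading slash:
RewriteRule ^/old\.html$ /new.html [R=301,L]
# In .htaccess (per-directory context), the prefix is stripped, so no leading slash:
RewriteRule ^old\.html$ /new.html [R=301,L]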

As claus points out in msg#3, it is not a good idea to simply copy code from threads like this one without fully understanding the code and all its implications for *your* site on *your* server.

Jim
