Condensed htaccess not working

My version of near perfect htaccess doesn't seem to work...

         

easygoin

5:01 pm on Jan 15, 2007 (gmt 0)

10+ Year Member



Hi all,

I have resisted posting and tried to find the error myself, spending hours going over the file and related posts and threads, but without success when I test it. I am now wondering whether those user-agent rules ever worked or whether I just got a bit lucky all these months. Anyway, could you kindly have a look at the following and flag any problems or errors you can see, especially in the sequence of rules? Thank you. Great forum, by the way, and I hope I haven't broken any forum rules. (NOTE: the IPs at the top are appended by a "trap.pl" script when a visitor doesn't follow the old robots.txt file. Thanks again, guys, for that one.)

SetEnvIf Remote_Addr ^129\.187\.148\.244$ ban
SetEnvIf Remote_Addr ^210\.188\.207\.177$ ban
SetEnvIf Remote_Addr ^85\.255\.118\.118$ ban
SetEnvIf Remote_Addr ^195\.248\.98\.191$ ban
SetEnvIf Remote_Addr ^72\.232\.34\.18$ ban
SetEnvIf Remote_Addr ^61\.132\.139\.250$ ban
SetEnvIf Remote_Addr ^83\.216\.204\.134$ ban
SetEnvIf Remote_Addr ^64\.92\.199\.47$ ban
SetEnvIf Remote_Addr ^207\.115\.69\.194$ ban
SetEnvIf Remote_Addr ^210\.87\.251\.107$ ban
SetEnvIf Remote_Addr ^67\.186\.134\.202$ ban

SetEnvIf User-Agent (.){150} ban
SetEnvIf User-Agent ^$ ban
SetEnvIf User-Agent ^([A-Z]+)$ ban
SetEnvIfNoCase Request_URI (.){150} ban
SetEnvIfNoCase Request_URI \.ht(access¦passwd)$ ban
SetEnvIfNoCase Request_URI ^/[a-z]/winnt ban
SetEnvIfNoCase Request_URI ^/_mem_bin ban
SetEnvIfNoCase Request_URI ^/_vti_bin ban
SetEnvIfNoCase Request_URI ^/default\.ida ban
SetEnvIfNoCase Request_URI ^/exchange ban
SetEnvIfNoCase Request_URI ^/msadc ban
SetEnvIfNoCase Request_URI ^/msoffice ban
SetEnvIfNoCase Request_URI ^/null\. ban
SetEnvIfNoCase Request_URI ^/script ban
SetEnvIfNoCase Request_URI formmail ban

<Files ~ "^.*$">
order allow,deny
allow from all
deny from env=ban
# Ban country TLD's
deny from .al¦.ao¦.ba¦.bg¦.cd¦.cf¦.cg¦.cn¦.cz¦.dz¦.et¦.gh¦.gm¦.hr¦.id¦.in¦.jp¦.ke¦.kh¦.kp¦.kr¦.la¦.lk¦ -
.lt¦.lv¦.ly¦.md¦.me¦.mk¦.mw¦.mz¦.na¦.ne¦.ng¦.np¦.nz¦.ph¦.pk¦.pl¦.rs¦.ro¦.ru¦.rw¦.sd¦.si¦.sk¦.so¦.sz¦ -
.tg¦.th¦.to¦.tr¦.tz¦.ua¦.ug¦.yu¦.za¦.zm¦.zw
# Ban IPs
deny from 61.4.64.0/20
deny from 63.148.99.224/27
deny from 65.118.41.192/27
deny from 210.192.96.0/17
deny from 217.78.
deny from 211.161.24.128/26
deny from 218.15.
deny from 218.64.
deny from 218.65.0.0/17
deny from 219.147.128.0/17
deny from 219.147.174.0/24
deny from netvigator.com
deny from mail.whitepine-ventures.com
deny from boxpaper.com
# Blocks Some Performance Systems International Inc - Washington USA
deny from 38.90.
# Blocks Googles Web Accelerator GWA
deny from 72.14.192.
# Blocks mediaWays Hostmaster on Telefonica UK
deny from 62.53.
deny from 192.55.214.54
deny from 38.98.120.70
</Files>

<Files ~ "^robots\.txt$¦^favicon\.ico$">
order allow,deny
allow from all
</Files>

<Files google_sitemap.xml>
ForceType application/x-httpd-php
</Files>

RewriteEngine on
Options +SymlinksIfOwnerMatch
RewriteRule ^(.*).php/(.*) /$1.php?$2
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^(1Click¦1KeyTools¦Aaron¦Aha¦AmiPic¦Anarchie¦Advanced\ Pic\ Hunter¦attach¦BackWeb¦Baiduspider¦Bandit¦BatchFTP¦BlackWidow¦Bookmark\ Explorer¦Bot\ mailto:craftbot@yahoo.com¦Buddy¦cfetch¦ChinaClaw¦Custo¦Collector¦Copier¦Crescent¦CherryPicker¦DA¦Dart¦ -
DISCo\ Pump¦DittoSpyder¦DIIbot¦dloader(NaverRobot)¦Download¦Downloader¦Drip¦E-conomiser¦eCatch¦ -
EirGrabber¦e?mail.?¦Exabot¦Express\ WebPictures¦ExtractorPro¦Exposed¦EyeNetIE¦FAST\-WebCrawler¦ -
(Fetch\ API\ Request)¦FileHound¦FlashGet¦FreshDownload¦FrontPage¦FunWebProducts¦GetRight¦GetWeb!¦ -
GetSmart¦Gigabot¦Go!Zilla¦Go-Ahead-Got-It¦GornKer¦Grab-a-Site¦gotit¦Grabber¦GrabNet¦Grafula¦gsa-crawler¦ -
hloader¦HMView¦HTTrack¦Ia_archiver¦IconSurf¦iGette¦InfoPath¦Image\ Stripper¦Image\ Sucker¦Indy\ Library¦ -
InterGET¦Internet\ Ninja¦InternetSeer¦Iria¦Irvine¦iSoloWeb¦Java¦JetCar¦JoBo¦JOC\ Web\ Spider¦JustView¦ -
Kazaa¦KybIE¦lachesis¦larbin¦Leech¦libwww-perl¦LinkWalker¦LIKSE\ HTML\ Viewer¦LightningDownload¦ -
Link\ Valet¦link¦lwp-trivial¦lftp¦LmCrawler¦likse¦Magnet¦Mag-Net¦Mass\ Downloader¦Memo¦ -
MetaProducts\ Download\ Express¦MIDown\ tool¦Missigua¦Mister\ PiX¦miixpc¦MM3-WebAssistant¦MJ12bot¦ -
Mozilla.*NEWT¦Mozilla.*Indy¦Mozilla.*DnloadMage¦Mozilla.*WebCapture¦Mozilla.*DreamPassport¦ -
Mozilla.*DnloadMage¦Mozilla.*AspTear¦MSFrontPage¦(Microsoft\ Scheduled\ Cache\ Content\ Download\ Service) -
¦Microsoft.URL¦MIDown\ tool¦Mirror¦Mister\ PiX¦MSIECrawler¦Navroad¦NearSite¦NetAnts¦NetSpider¦Net\ Vampire¦ -
NetZip¦NICErsPRO¦Nitro\ Downloader¦NPBot¦Ninja¦Octopus¦Oe\ Pro¦Offline\ Explorer¦Offline\ Navigator¦ -
omniexplorer_bot¦PageGrabber¦Papa\ Foto¦pavuk¦pcBrowser¦Pheromone¦Pic¦Power\ Siphon¦Pockey¦psbot¦Pump¦ -
prowebwalker¦QRVA¦Real¦Rip¦Robofox¦RealDownload¦Reaper¦Recorder¦ReGet¦Save¦Scooter¦Shareaza¦SearchExpress¦ -
Seekbot¦sherlock¦ShopWiki¦Siphon¦sitecheck.internetseer.com¦SiteSnagger¦SmartDownload¦Snake¦SpaceBison¦ -
Spider¦SQ\ Webscanner¦Stamina¦Star\ Downloader¦steeler¦Stripper¦Sucker¦SuperBot¦SuperHTTP¦Surfbot¦szukacz¦ -
tAkeOut¦Teleport¦TMCrawler¦Twiceler¦TFTP\ Server¦titan¦turingos¦TurnitinBot¦TV33_Mercator¦Ultra-Downloader¦ -
UtilMind\ HTTPGet¦Vacuum¦VoidEYE¦web¦Website\ eXtractor¦Website\ Quester¦Wget¦Whacker¦Widow¦WWWOFFLE¦ -
Xaldon\ WebSpider¦xenu\ link\ sleuth¦Zeus¦ZyBorg) [NC]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*stuff.*\.com [NC]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?iaea\.org [NC]
RewriteRule .* - [F]

# Safer Anti HOTLINKING code
RewriteCond %{REQUEST_FILENAME} .*jpg$¦.*gif$¦.*png$¦.*bmp$ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !google\. [NC]
RewriteCond %{HTTP_REFERER} !googlebot\. [NC]
RewriteCond %{HTTP_REFERER} !froogle\.google\. [NC]
RewriteCond %{HTTP_REFERER} !froogle\. [NC]
RewriteCond %{HTTP_REFERER} !yahoo\. [NC]
RewriteCond %{HTTP_REFERER} !aol\. [NC]
RewriteCond %{HTTP_REFERER} !ask\. [NC]
RewriteCond %{HTTP_REFERER} !uk\.ask\. [NC]
RewriteCond %{HTTP_REFERER} !msn\. [NC]
RewriteCond %{HTTP_REFERER} !pangora\. [NC]
RewriteCond %{HTTP_REFERER} !search\?q=cache [NC]
RewriteRule (.*) /htaccess/showpic.php?pic=$1 [L]

redirect 301 /links [somedomain.com...]

# <IfModule mod_php4.c>
# php_value auto_prepend_file "/somepath/runawaycrawlers.php"
# </IfModule>

[edited by: jdMorgan at 5:23 pm (utc) on Jan. 15, 2007]
[edit reason] Fixed line-wrapping [/edit]

jdMorgan

5:31 pm on Jan 15, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This syntax is invalid: deny from .al¦.ao¦.ba¦.bg¦.cd¦.cf¦.cg¦.cn¦.cz¦.dz¦.et¦.gh...

Your user-agent/referer-blocking RewriteRule needs an [OR] on all but the final RewriteCond.
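
A trimmed-down sketch of what that looks like, using three user-agents from the list above and the iaea.org referer condition. Consecutive RewriteConds are ANDed by default, so without [OR] the rule fires only when the user-agent and every referer pattern match at the same time, which in practice is never. The pipes below are solid pipes.

```apache
RewriteEngine on
# Every condition except the last carries [OR]; without it the conditions are ANDed
RewriteCond %{HTTP_USER_AGENT} ^(HTTrack|Wget|Zeus) [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?iaea\.org [NC]
RewriteRule .* - [F]
```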

Hotlinking code: Delete first RewriteCond, move pattern to rule, tidy-up. Showing first and last lines only:


RewriteCond %{HTTP_REFERER} !^$
...
RewriteRule ^([^.]+\.(jpg¦gif¦png¦bmp))$ /htaccess/showpic.php?pic=$1 [NC,L]
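
Filled in, the tidied hotlinking block might look like the sketch below. The middle conditions are assumptions: example.com is a placeholder for your own domain (not named in this thread), and google is kept from the original list only as one example of an allowed external referer. Here the conditions are deliberately left ANDed, since the image is redirected only when none of the allowed referers match.

```apache
# Blank referers are let through (some proxies and browsers strip the header)
RewriteCond %{HTTP_REFERER} !^$
# Hypothetical: your own site; replace example\.com with the real hostname
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com [NC]
# Kept from the original list as an example of an allowed external referer
RewriteCond %{HTTP_REFERER} !google\. [NC]
# The image-extension test now lives in the rule pattern, not in a RewriteCond
RewriteRule ^([^.]+\.(jpg|gif|png|bmp))$ /htaccess/showpic.php?pic=$1 [NC,L]
```

Note that [^.]+ will not match a path containing a dot before the extension, so image files with extra dots in their names would need a looser pattern.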

I strongly suggest that when making changes like this, you break them down into smaller easy steps, changing one thing at a time. That way, when something goes wrong, you'll at least have some idea of where to look.

Also, consult the documentation of each directive you intend to use. You cannot use "made-up" syntax on "Deny from" and expect it to work. The syntax of different Apache modules varies due to age, the skills and preferences of the author, and other factors. The notation that works with one won't necessarily work with another.
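
For reference, a sketch of forms that Deny from does accept, going by the mod_access / mod_authz_host documentation for Apache 1.3 through 2.2. The server version is an assumption, and Apache 2.4 moves these directives to mod_access_compat in favour of Require. Several hosts may share one line if separated by spaces, but regex alternation and brackets are not part of the syntax.

```apache
Order Allow,Deny
Allow from all
# Partial domain names: each of these forces a reverse-DNS lookup
Deny from .cn .ru .ua
# Partial IP address and CIDR block: no DNS lookup needed
Deny from 218.15. 61.4.64.0/20
# Environment variable set earlier by SetEnvIf
Deny from env=ban
```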

Finally, how many of those "bad user agents" actually visit your site? Some haven't been seen in years. Remember when adding code like that, or when doing reverse-DNS lookups on country-code TLDs, that this file will be processed for each and every HTTP request to your server.

In the case of the user-agents, that means the requests will have to be checked for each and every user-agent you list. And in the case of checking country codes, your server will have to send a Reverse-DNS lookup request to the DNS system --and wait for a reply-- for each and every page, image, script, etc. requested from your server. If your site gets popular, this will bring it to a crawl. Consider making those RDNS lookups conditional, for example, enclosing them in a <Files> container so that only requests for "pages" are RDNS-checked, and not every image request, etc.

Jim

easygoin

5:46 pm on Jan 15, 2007 (gmt 0)

10+ Year Member



Jim, thank you. I have taken note of the advice. I did try to understand what was required and tried a few versions of the domain bans (I enclosed the pipe-delimited domains in brackets, if this is the correct way?), but I would appreciate your kind help in determining the correct format for listing the domains. Is it with brackets?

deny from (.al¦.ao¦.ba¦.bg¦.cd¦.cf¦.cg)

I also added [NC,OR] to the user agents and to the first referer (the *stuff.* one), and left the last "iaea" referer with just [NC].

As regards the RDNS, I understand, and will trim the domain and bad-agent lists down as much as possible. If there is a current user-agent list that has the "main baddies", could you kindly point me to it? I will compare it with my main site logs and limit the agents listed, for load reasons.

Sorry Jim, I am being thrown out of the office; back tomorrow to see the thread. Thank you, by the way. I have read almost all the near perfect htaccess threads 1, 2, 3 and the last one. Phew, it's extensive :)

Kind thanks for all your help and advice, it is invaluable for n00bs like myself.

jdMorgan

6:26 pm on Jan 15, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Deny from [httpd.apache.org]

The best source for your User-agent deny list is your own server access logs -- Some of those user-agents are no longer active, and there's no use wasting time looking for a 'baddie' that never bothers your sites.

Most scrapers have wised-up and are now using a valid Mozilla compatible browser User-agent.

Jim

easygoin

7:02 pm on Jan 15, 2007 (gmt 0)

10+ Year Member



Thanks Jim. I had a good trawl through the Deny link and can't see any possibility of "grouping", so it's back to listing one per line as far as I can tell.

I have amended the htaccess according to your advice and placed it below. Could you have a look and kindly advise whether the syntax and function are good, and whether I have made any further "errors"?

Lastly, I sort of understand the idea of enclosing the lookups in <Files>, but I am not sure how I would do this in this file (can you guess what I am going to ask next? :) ). Any advice is very much appreciated. Yes, essentially I am begging for the best version of the htaccess file below: like everything, the most efficient way (in size and function) of achieving the below. And of course, if some of these rules just aren't relevant any more, then please advise. I have taken the most common and pertinent settings from the old near perfect thread and from my own logs (now trimmed right down to those I think, from memory, are relevant; I have to check my logs again over the next few days and quickly get the "baddies" again). It's interesting that many of those agents are just not seen any more, as you say, but that's the nature of the web, ho-hum.

SetEnvIf Remote_Addr ^129\.187\.148\.244$ ban
SetEnvIf Remote_Addr ^210\.188\.207\.177$ ban
SetEnvIf Remote_Addr ^85\.255\.118\.118$ ban
SetEnvIf Remote_Addr ^195\.248\.98\.191$ ban
SetEnvIf Remote_Addr ^72\.232\.34\.18$ ban
SetEnvIf Remote_Addr ^61\.132\.139\.250$ ban
SetEnvIf Remote_Addr ^83\.216\.204\.134$ ban
SetEnvIf Remote_Addr ^64\.92\.199\.47$ ban
SetEnvIf Remote_Addr ^207\.115\.69\.194$ ban
SetEnvIf Remote_Addr ^210\.87\.251\.107$ ban
SetEnvIf Remote_Addr ^67\.186\.134\.202$ ban

SetEnvIf User-Agent (.){150} ban
SetEnvIf User-Agent ^$ ban
SetEnvIf User-Agent ^([A-Z]+)$ ban
SetEnvIfNoCase Request_URI (.){150} ban
SetEnvIfNoCase Request_URI \.ht(access¦passwd)$ ban
SetEnvIfNoCase Request_URI ^/[a-z]/winnt ban
SetEnvIfNoCase Request_URI ^/_mem_bin ban
SetEnvIfNoCase Request_URI ^/_vti_bin ban
SetEnvIfNoCase Request_URI ^/default\.ida ban
SetEnvIfNoCase Request_URI ^/exchange ban
SetEnvIfNoCase Request_URI ^/msadc ban
SetEnvIfNoCase Request_URI ^/msoffice ban
SetEnvIfNoCase Request_URI ^/null\. ban
SetEnvIfNoCase Request_URI ^/script ban
SetEnvIfNoCase Request_URI formmail ban

<Files ~ "^.*$">
order allow,deny
allow from all
deny from env=ban
# Ban country TLD's
deny from .cn
deny from .cz
deny from .ng
deny from .pl
deny from .rs
deny from .ro
deny from .ru
deny from .ua
# Ban IPs
deny from 61.4.64.0/20
deny from 63.148.99.224/27
deny from 65.118.41.192/27
deny from 210.192.96.0/17
deny from 217.78.
deny from 211.161.24.128/26
deny from 218.15.
deny from 218.64.
deny from 218.65.0.0/17
deny from 219.147.128.0/17
deny from 219.147.174.0/24
# Blocks Googles Web Accelerator GWA
deny from 72.14.192.
# Blocks mediaWays Hostmaster on Telefonica UK
deny from 62.53.
deny from 192.55.214.54
deny from 38.98.120.70
</Files>

<Files ~ "^robots\.txt$¦^favicon\.ico$">
order allow,deny
allow from all
</Files>

<Files google_sitemap.xml>
ForceType application/x-httpd-php
</Files>

RewriteEngine on
Options +SymlinksIfOwnerMatch
### SEF rewrite rule - got to laugh ###
RewriteRule ^(.*).php/(.*) /$1.php?$2
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^(Baiduspider¦Bandit¦BatchFTP¦BlackWidow¦Bookmark\ Explorer¦Bot\ mailto:craftbot@yahoo.com¦CherryPicker¦Download¦Downloader¦e?mail.?¦Exabot¦Express\ WebPictures¦Extractor¦Exposed¦FrontPage¦FunWebProducts¦GetRight¦GetWeb!¦GetSmart¦Gigabot¦Go!Zilla¦Grabber¦HTTrack¦Ia_archiver¦Image\ Stripper¦Image\ Sucker¦InternetSeer¦Leech¦libwww-perl¦Magnet¦MetaProducts\ Download\ Express¦MJ12bot¦Mozilla.*NEWT¦Mozilla.*Indy¦Mozilla.*DnloadMage¦Mozilla.*WebCapture¦Mozilla.*DreamPassport¦Mozilla.*DnloadMage¦Mozilla.*AspTear¦MSFrontPage¦(Microsoft\ Scheduled\ Cache\ Content\ Download\ Service)¦Microsoft.URL¦MIDown\ tool¦MSIECrawler¦NetSpider¦Net\ Vampire¦Oe\ Pro¦Offline\ Explorer¦Offline\ Navigator¦Seekbot¦ShopWiki¦Siphon¦sitecheck.internetseer.com¦SQ\ Webscanner¦Sucker¦Surfbot¦Website\ eXtractor¦Website\ Quester¦Wget¦Zeus¦ZyBorg) [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*stuff.*\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?iaea\.org [NC]
RewriteRule .* - [F]

# Safer Anti HOTLINKING code
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !somedomain\.co\.uk [NC]
RewriteCond %{HTTP_REFERER} !someotherdomain\. [NC]
RewriteCond %{HTTP_REFERER} !search\?q=cache [NC]
RewriteRule ^([^.]+\.(jpg¦gif¦png¦bmp))$ /htaccess/showpic.php?pic=$1 [NC,L]

redirect 301 /links [somedomain.com...]

# <IfModule mod_php4.c>
# php_value auto_prepend_file "/somepath/runawaycrawlers.php"
# </IfModule>

jdMorgan

10:31 pm on Jan 15, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To reduce the number of reverse-DNS checks, enclose them in a container that limits their execution to requests for "pages" only, and not all of the images, external CSS files and JavaScript files that those pages might include. For example, if all of your pages are .html, .htm, .shtml, .shtm, .php, and .php2 through .php5, you might use:

<FilesMatch "\.(s?html?¦php[2-5]?)$">
deny from .cn
deny from .cz
...
deny from .ru
deny from .ua
</FilesMatch>

The idea being, if they can't scrape the page to begin with, then they can't ever discover the URLs pointing to all of those included objects.
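
Put together with the env=ban rules from the earlier post, the arrangement could be sketched as below. This is a hedged sketch rather than a drop-in file. The Order and env=ban lines are repeated inside the page-only container on the assumption that a request matching several <Files>-type sections may take its access rules from the last matching section rather than from all of them combined, which is the behaviour the robots.txt exception in the posted file appears to rely on. Repeating them costs nothing if that assumption turns out to be wrong. The IP range and TLDs are a small sample from the lists above, and the pipes are solid pipes.

```apache
# All files: cheap checks only (environment variable and IP ranges)
<Files ~ "^.*$">
Order Allow,Deny
Allow from all
Deny from env=ban
Deny from 61.4.64.0/20
</Files>

# Pages only: the same checks plus the reverse-DNS (domain name) checks
<FilesMatch "\.(s?html?|php[2-5]?)$">
Order Allow,Deny
Allow from all
Deny from env=ban
Deny from 61.4.64.0/20
Deny from .cn .ru .ua
</FilesMatch>

# Always reachable, even for banned clients
<Files ~ "^(robots\.txt|favicon\.ico)$">
Order Allow,Deny
Allow from all
</Files>
```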

As far as the "best" or "most efficient" .htaccess code, or the "best" User-agent deny list, I'm afraid there is simply no one right answer -- By their nature, .htaccess files are closely-tied to the site's URL-structure, contents, and probability of abuse, and these factors can only be accurately evaluated by the site's Webmaster.

Note that posting on this forum modifies the pipe character; change all broken pipe "¦" characters to solid pipe "|" characters before trying to use any code you find posted here.

Jim