htaccess & Modrewrite working but needing advice

It works, but it seems confusing

         

Luminoria

8:01 pm on Aug 11, 2009 (gmt 0)

10+ Year Member



Hello, everybody :)
After lots of bandwidth attacks, lost money, and shutdowns, I decided to study a little and found this forum :)
I am currently with a new web host, and even after lots of research and tutorials in this and many other forums, I confess I still find all this stuff very hard!
I started adapting some of the code available here to my site's needs, always trying to respect the pipes rule and all the other advice.
But as I am a newbie, everything turns into doubts... I asked the people at my web host for their opinion and they said that the .htaccess was OK...
But I still have some doubts about the .htaccess I'm currently using, especially the IP banning part... it seems so confusing!
Maybe I am writing wrong things, and I apologize in advance for any inconvenience.
I am really new to this. I have had my site for some years, but had never heard about mod_rewrite before (although now I am a little bit informed via the mod_rewrite forums).
Guys, could you review my .htaccess and point out any fixes or optimizations to make it smaller?
I have been using it, and it seems to work well :)
I can see that in the stats and logs. But I don't know if it is normal for the IP banning to be such a mess...
I see some .htaccess files that are so organized... I don't know, I'm a little confused.

Thank you all in advance for providing such valuable help. I appreciate any help :)
Kind Regards,
Lumi

P.S. I know that it's not OK to post a full .htaccess, and it is big, so I took out some IPs (but I have lots of them).

Another thing I'd like help with is having my own custom 403/404 error pages, and also avoiding the confusion of my custom 403 page being requested by banned people.
Note: as I don't know how to work with PHP, I only have HTML files on my site, so my error pages will always be HTML.
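For readers with the same question, here is a minimal sketch of how custom HTML error pages are commonly declared in .htaccess. It assumes Apache 2.2-style access directives (as used in the code below) and two pages named /403.html and /404.html in the site root; both file names are hypothetical. The <Files> block lets denied IPs still fetch the 403 page, and the extra RewriteCond keeps a blocking rule from forbidding the error page itself. The GetRight condition is only a stand-in for whatever blocking conditions the site really uses.

```apache
# Hypothetical file names: /403.html and /404.html in the document root
ErrorDocument 403 /403.html
ErrorDocument 404 /404.html

# Let everyone, including IPs listed in "deny from", fetch the 403 page
<Files 403.html>
Order Allow,Deny
Allow from all
</Files>

RewriteEngine On
# Exempt the error page so a forbidden client can still be shown it
RewriteCond %{REQUEST_URI} !^/403\.html$
# Stand-in blocking condition, for illustration only
RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC]
RewriteRule .* - [F]
```

On Apache 2.4 and later, the Order/Allow lines would normally be replaced by `Require all granted`.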

Code (sorry it is so big):
RewriteEngine on
# -FrontPage-
IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*
<Limit GET POST>
#The next line modified by DenyIP
order allow,deny
#The next line modified by DenyIP
#deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>
AuthName simcredibledesigns.com
AuthUserFile /home/simc/public_html/_vti_pvt/service.pwd
AuthGroupFile /home/simc/public_html/_vti_pvt/service.grp

<Files 403.shtml>
order allow,deny
allow from all
</Files>

deny from 82.128.214.104
deny from 210.153.217.137
Options All -Indexes
deny from 81.214.175.0
deny from 32.179.10.128
(more and more banned IPs here...)

Options +FollowSymLinks
RewriteCond %{HTTP_REFERER} !^http://([-a-z0-9]+\.)?simcredibledesigns\.com(/.*)?$ [NC]
RewriteRule ^(.*)\.(gif|jpe?g|png|ico|rar|zip)$ /file.jpg?$1.$2 [NC]

# Forbid if blank (or "-") Referer *and* UA
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]

# Address harvesters
RewriteCond %{HTTP_USER_AGENT} ^(autoemailspider|ExtractorPro) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^E?Mail.?(Collect|Harvest|Magnet|Reaper|Siphon|Sweeper|Wolf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (DTS.?Agent|Email.?Extrac) [NC,OR]
RewriteCond %{HTTP_REFERER} iaea\.org [NC,OR]
# Download managers
RewriteCond %{HTTP_USER_AGENT} ^(Alligator|DA.?[0-9]|DC\-Sakura|Download.?(Demon|Express|Master|Wonder)|FileHound) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Flash|Leech)Get [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Fresh|Lightning|Mass|Real|Smart|Speed|Star).?Download(er)? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Gamespy|Go!Zilla|iGetter|JetCar|Net(Ants|Pumper)|SiteSnagger|Teleport.?Pro|WebReaper) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(My)?GetRight [NC,OR]
# Image-grabbers
RewriteCond %{HTTP_USER_AGENT} ^(AcoiRobot|FlickBot|webcollage) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Express|Mister|Web).?(Web|Pix|Image).?(Pictures|Collector)? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image.?(fetch|Stripper|Sucker) [NC,OR]
# "Gray-hats"
RewriteCond %{HTTP_USER_AGENT} ^(Atomz|BlackWidow|BlogBot|EasyDL|Marketwave|Sqworm|SurveyBot|Webclipping\.com) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (girafa\.com|gossamer\-threads\.com|grub\-client|Netcraft|Nutch) [NC,OR]
# Site-grabbers
RewriteCond %{HTTP_USER_AGENT} ^(eCatch|(Get|Super)Bot|Kapere|HTTrack|JOC|Offline|UtilMind|Xaldon) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.?(Auto|Cop|dup|Fetch|Filter|Gather|Go|Leach|Mine|Mirror|Pix|QL|RACE|Sauger) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.?(site.?(eXtractor|Quester)|Snake|ster|Strip|Suck|vac|walk|Whacker|ZIP) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCapture [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Twiceler* [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NewsGatorOnline* [OR]
RewriteCond %{HTTP_USER_AGENT} Ask.Jeeves [OR]
RewriteCond %{HTTP_USER_AGENT} ^FAST-WebCrawl [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia\_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*wget* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Hatena\ Antenna [OR]
RewriteCond %{HTTP_USER_AGENT} InfoSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teoma [OR]
RewriteCond %{HTTP_USER_AGENT} VoilaBot [OR]
RewriteCond %{HTTP_USER_AGENT} Inktomi [NC,OR]
RewriteCond %{HTTP_USER_AGENT} !zyborg [NC,OR]
RewriteCond %{HTTP_USER_AGENT} !webcrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} !^Gigabot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} !scrubby [NC,OR]
RewriteCond %{HTTP_USER_AGENT} !ImageScape [NC,OR]
RewriteCond %{HTTP_USER_AGENT} !^Mozilla\ 3\.01 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} !CydralSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} !^Mozilla\ 3\.01 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*foxtorrent* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MEGAUPLOAD [OR]
RewriteCond %{HTTP_USER_AGENT} Msnbot|Slurp [NC,OR]
# Tools
RewriteCond %{HTTP_USER_AGENT} ^(curl|Dart.?Communications|Enfish|htdig|Java|larbin) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (FrontPage|Indy.?Library|RPT\-HTTPClient) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(libwww|lwp|PHP|Python|www\.thatrobotsite\.com|webbandit|Wget|Zeus) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Microsoft|MFC).(Data|Internet|URL|WebDAV|Foundation).(Access|Explorer|Control|MiniRedir|Class) [NC,OR]
# Unknown
RewriteCond %{HTTP_USER_AGENT} ^(Crawl_Application|Lachesis|Nutscrape) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[CDEFPRS](Browse|Eval|Surf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Demo|Full.?Web|Lite|Production|Franklin|Missauga|Missigua).?(Bot|Locat) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (efp@gmx\.net|hhjhj@yahoo\.com|lerly\.net|mapfeatures\.net|metacarta\.com) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Industry|Internet|IUFW|Lincoln|Missouri|Program).?(Program|Explore|Web|State|College|Shareware) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Mac|Ram|Educate|WEP).?(Finder|Search) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Moz+illa|MSIE).?[0-9]?.?[0-9]?[0-9]?$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9][0-9]?.\(compatible[\)\ ] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NaverRobot [NC]
RewriteCond %{REQUEST_URI} !(/$|^$)
RewriteRule .* - [F,L]

wilderness

8:28 pm on Aug 11, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Welcome to Webmaster World.

IMO, and considering your newfound understanding of htaccess, you're simply attempting to accomplish too much too soon.

Copying and pasting multiple excerpts from various examples is not always good practice. Many of the UAs you're using are not even applicable today.

In addition, hopefully you understand the basic difference between "begins with", "contains", and "ends with" in your RewriteCond lines?

I would also encourage you NOT to focus on single, exact IP addresses:
EX:
#yours merely denies that one exact address
deny from 81.214.175.0

#ask yourself whether traffic from Turkey is beneficial to your site and whether the entire provider range may be denied (a bad bot may simply return on 81.214.175.1 or another address).
#the provider's entire range
deny from 81.214.0.0/16
deny from 81.214.128.0/17
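As a rough illustration of how the width of a ban changes with the CIDR suffix, here is a sketch assuming the Apache 2.2-style Order/Deny directives used elsewhere in this thread. The /24 line is added only for comparison and is not from the post. Note that a /16 already contains the /17 listed above, so only one of those two lines is needed.

```apache
Order Allow,Deny
Allow from all
# Exactly one address
Deny from 81.214.175.0
# A /24: 81.214.175.0 through 81.214.175.255 (256 addresses)
Deny from 81.214.175.0/24
# A /16: 81.214.0.0 through 81.214.255.255 (65,536 addresses)
Deny from 81.214.0.0/16
```

With `Order Allow,Deny`, the Deny lines are evaluated last, so a matching Deny wins over `Allow from all`.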

I would encourage you to make small additions to your htaccess (verifying your website's functionality after each change), and to proceed over weeks and months as you grasp this new understanding, not only of what others are doing, but of what is beneficial or detrimental to your own site.

Luminoria

9:18 pm on Aug 11, 2009 (gmt 0)

10+ Year Member



Thank you for the welcome and also for the reply :)
About the Turkey IP, yes... if it is not a bot, it still interests my site. I offer The Sims downloads, 90% free and the rest paid. I would not want to ban a country; in fact my real worry is to ban abuses like download managers, offline site copiers, download grabbers, and direct links to my pages, content, and pictures...
I had many attacks in the past, with people taking my stuff without even loading my site, and they also used to copy the same files hundreds of times, especially the biggest ones.
The attacks happened most of the time via Megaupload, GetRight, FDM, and other automated programs. They always eat a lot of my images and especially my rar/zip files.

quote:
"In addition, hopefully you understand the basic difference and/or applied theory of: "begins with"; "contains"; "ends with"?
In your RewriteCond lines"

---*shame*... I think I don't understand it very well...
I read the mod_rewrite forum, all the intro part, but I was still confused. I use translators most of the time and sometimes it gets messy. I have been trying to read the instructions carefully when copying this code, because I really don't understand it correctly.
Sometimes I think I may ban innocent users, but I am so tired of paying extra for bandwidth; I have been abused like this for so long... At least it seems that those automated downloads have stopped. I used to have 13 GB in a day, and things are calmer now.
All I dream of is to allow people to navigate and get downloads only from within my site and not with automated abusive programs. If a visitor uses such programs, he/she doesn't interest my site anymore. That's not elegant, I confess... but it is the truth. I feel so tired of paying for being abused...

Is there anything I can do to fix my outdated code parts?
Thank you for your patience :)

jdMorgan

9:35 pm on Aug 11, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> is there anything i can do to fix my outdated code parts

Take each user-agent string (or sub-string) specified in those RewriteConds, and search your monthly raw server access log file for that user-agent. If that user-agent is not abusing your site, then remove the RewriteCond from your .htaccess file.

Basically, get rid of any lines that are not benefitting your site.

Jim

wilderness

10:12 pm on Aug 11, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> and about the turkey ip, yes...if it is not a bot, it still interests my site.

The IP range I provided will not ban the entire country of Turkey from your site.

Instead, it was meant to offer you the possibility of expanding the ranges you deny.
In the long run, this kind of action results in less maintenance and updating, especially given that all the harvester needs to do is disconnect and reconnect on a new address, then resume his harvesting/downloading.

Jim's suggestion that you compare your past month's visitor User Agents against this outdated list (modifying it in the process) is where I suggest you begin your process of learning.

Review the effect and make adjustments as you go along.

Currently you're just overwhelmed with attempting to accomplish too much too soon.

Luminoria

11:59 pm on Aug 11, 2009 (gmt 0)

10+ Year Member



Oh, thank you both for your directions. I took out the lines.
Now I will be reviewing things. The only thing I didn't have the courage to take out was the line about GetRight and similar programs; I don't know why, but it gives me a little feeling of safety. That's strange, but...
Well, I will be collecting my own info, reviewing most of the past abuses, and studying more. At least I pray things go easier from now on, and yes, you are all so right: for the moment it is too much info for my little understanding.
And thanks also for shedding light on the IP banning subject! It helps a lot ^^
Thank you for all the advice... even at my beginning with all this code, I have been able to get a lot of valuable info from this forum :)
Cheers,
Lumi

Luminoria

12:04 am on Aug 12, 2009 (gmt 0)

10+ Year Member



Oh, sorry for making a second post... Is it possible to ask for some lines at the beginning of the code in my first message to be deleted, please? (Or the whole code, if suitable...)
I tried, but the edit time had run out.
If it is not possible to delete them, no problem. I just wanted to take out those IPs and also some folders I have.
Thank you and be well :)

jdMorgan

12:28 am on Aug 12, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think what both wilderness and I are telling you is the same: Do not add code or delete code if you do not know and understand the reason for the change. Learn first, then modify the code. It is impossible to "guess" the correct solution.

For the list of "bad user agents" like getright and the others, delete the ones that never visit your site, and keep the ones that you always see trying to waste your bandwidth. Some of those user agents are very old and are never used any more. It is not that the bad people are gone now, though. It is just that they have improved their scripts to use "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1..." In other words the old bad user agents now identify themselves as normal browsers to bypass old scripts like the ones you copied. :(

When dealing with user-agent strings, you have several options. You can specify complete (exact) user agent strings or partial user-agent strings. When specifying partial user-agent strings, you can specify that you want to match only the beginning, only the end, or neither.

So:
. ^bad-bot$ matches only a user agent string that is exactly "bad-bot"
. ^bad-bot matches any user-agent string that starts with "bad-bot"
. bad-bot$ matches any user-agent string that ends with "bad-bot"
and
. bad-bot matches any user-agent string that contains "bad-bot"
In this way, you can make your patterns more precise (when needed) to avoid blocking 'good' user-agents, and you can make them less precise in order to block more than one bad-bot with the same line of code.
For example,

RewriteCond %{HTTP_USER_AGENT} download [NC,OR]
RewriteCond %{HTTP_USER_AGENT} e-?mail [NC,OR]

can be used to block any user-agent string that contains either "download" or "e-mail" or "Email". Note that the [NC] flag makes the pattern-matching case-insensitive.
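To tie the anchoring rules above to a complete rule, here is a minimal sketch. The name "bad-bot" is the placeholder from the list above and "GetRight" is taken from earlier in the thread; the choice of which agents to block is purely illustrative. It shows an exact match, a "starts with" match, and a "contains" match chained with [OR], followed by the rule that returns 403 Forbidden.

```apache
RewriteEngine On
# Exact match: the whole user-agent string is "bad-bot"
RewriteCond %{HTTP_USER_AGENT} ^bad-bot$ [OR]
# Starts with "GetRight", in any letter case
RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
# Contains "download" anywhere, in any letter case
RewriteCond %{HTTP_USER_AGENT} download [NC]
# The last condition carries no [OR]; forbid if any condition above matched
RewriteRule .* - [F]
```

One caution when building such a chain: a negated pattern such as `!zyborg` is true for every user agent that does not contain that word, so placing it in an [OR] chain makes the whole chain match almost every visitor.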

For more information on regular expressions pattern matching, see the tutorial cited in our Apache Forum Charter.

Jim

wilderness

12:51 am on Aug 12, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



And just to "yank Jim's chain" (Luminoria, please disregard this):

There was (and possibly still is) a method of using "exactly as" with either the "begins", "ends" or "contains" options.

I used the procedure successfully for years, however with the various Apache updates, it works on some servers while NOT working on other servers.

By enclosing the phrase in quotes, it was unnecessary to escape either spaces or characters.
EX:
"the red house 1.0"
(which we would normally express as:
the\ red\ house\ 1\.0)

What makes this controversial (besides working or not depending on the server) is that Apache tutorials suggest that all Rewrite lines should be, or must be, enclosed in "quotes", and nothing could be further from the truth.
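For comparison, the two spellings can be sketched side by side. This is only an illustration using the example phrase from the post; as noted above, the quoted form was reported to behave differently from server to server, and it has not been verified here against any particular Apache version. Quoting only affects how the line is split into arguments, so the dot is still escaped in both forms to match a literal period.

```apache
RewriteEngine On
# Form 1: escape each space so the pattern stays a single argument
RewriteCond %{HTTP_USER_AGENT} ^the\ red\ house\ 1\.0 [OR]
# Form 2: quote the pattern instead of escaping the spaces
RewriteCond %{HTTP_USER_AGENT} "^the red house 1\.0"
RewriteRule .* - [F]
```

Both conditions are meant to match the same user agents, so a real file would keep only one of them.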