For the last few weeks I have had huge traffic coming from a huge range of IPs, which belong to porn sites (I think).
I reached my bandwidth limit very fast, and all my hosting provider said was that I have to upgrade to a dedicated server, because they couldn't do anything to stop these attacks. Since I'm afraid that even if I upgrade I'll soon face the same problem, I started searching the internet for ways to protect my site.
I found out some things and finally uploaded an .htaccess file, which fortunately cut the traffic in half, but unfortunately the problem is still there and I probably need to do more.
So I am posting my .htaccess file here and asking whether you see mistakes in it, or things to change or add to make it more powerful. Also, apart from this file, is there anything else I should do to solve the problem?
Please answer in a way that a newbie and non-native English speaker can understand.
Thank you in advance for any help... and here is my .htaccess file:
Edit: the forum system said that the post is too long, so I'll post it in 2-3 posts.
#These lines block agents commonly used to harvest URLs and email addresses.
#One of the uses of such agents is to gather URLs for subsequent referral spamming
#by a large number of hosts. Thus, preventing their access may, by itself, decrease
#the amount of referral spam you receive.
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL\ Control.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ .*Win\ 9x\ 4\.90.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Indy\ Library.*$ [NC,OR]
#These lines block bots that use your bandwidth for their own commercial reasons.
RewriteCond %{HTTP_USER_AGENT} ^abot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^aipbot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Linkwalker$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*nameprotect.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*TurnitinBot.*$ [NC,OR]
#These rewrite conditions might be more conservative than some people want to be.
#They deny referrers with a domain name structure with hyphens, such as
#word-word-whatever.com or word-word.word.com. If you think you might get legitimate
#hits from domains with that name structure, just delete these conditions.
RewriteCond %{HTTP_REFERER} ^(http://)?(www\.)?[a-z]+\-[a-z]+\-.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^(http://)?(www\.)?[a-z]+\-[a-z]\.[a-z].*$ [NC,OR]
Welcome to WebmasterWorld!
Few people here have the time to review large code dumps like this. If you would like to ask a few specific questions, then you'd be more likely to get an answer.
For now, all I can suggest is that RewriteConds like
RewriteCond %{HTTP_REFERER} ^(http://)?(www\.)?.*(-|.)texas(-|.).*$ [NC,OR]
can be shortened to
RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*[-.]texas[-.] [NC,OR]
Unfortunately, I have no specific questions, because I do not understand anything that is written in this file. I just found it on the net, saw that it contains words related to the specific attack on my site, and applied it immediately.
Oh, maybe one question: what would be most effective to write in the "RewriteRule"?
Perhaps some time spent studying the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com] would be helpful (and wise).
Jim
At the beginning I got some errors, but now it seems to work fine. Also, one reason I asked for help in this forum is that I'm not familiar with all these things.
Thanks for your advice.
As Jim mentioned, that "is" quite a LONG list for the .htaccess file.
For my own site (and a few other domains/sites/accounts that I manage for other people), I also use the .htaccess file to control bad bots. I use my own .htaccess file for each of the other accounts too, since I also manage those accounts; that includes visually scanning the raw log files once or twice a month for bad stuff (unknown bots, possible harvesters, sites that I don't want linking to mine or the other sites), etc.
The .htaccess file that I currently use is about 900 lines long and about 45 KB in size.
If you would like some help "cleaning up" your own .htaccess file, I could possibly assist in my spare time over the next short while. One thing I did notice was that a lot of the referrers were written in quite a lengthy form. One example of a cleanup (with fewer entries): instead of having lines such as
RewriteCond %{HTTP_REFERER} ^(http://)?(www\.)?transexual.*$ [NC,OR]
you could have one like
RewriteCond %{HTTP_REFERER} (sex|drugs|pills|meds|love) [NC,OR]
.... which matches if any of those words is found within the referrer;
thus sex or drugs or pills.... etc. could be anywhere in the line.
The vertical bar "|" means "or", the ^ means "starting with", and the $
anchors the match to the end of the string.
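As a rough sketch of how that compact form fits into a complete rule set: the keywords are just the examples from this post, example.invalid is a made-up host name, and answering with 403 Forbidden is an assumption, since the thread itself does not settle on what the RewriteRule should do.

```apache
# Sketch only: keywords and host name are placeholders, not a vetted block list.
RewriteEngine On

# Unanchored: matches if any of the words appears ANYWHERE in the referrer.
# [NC] ignores case; [OR] chains this condition with the next one.
RewriteCond %{HTTP_REFERER} (sex|drugs|pills|meds|love) [NC,OR]

# Anchored with ^: the referrer must START with this (hypothetical) host name.
# The last condition in the chain carries no [OR].
RewriteCond %{HTTP_REFERER} ^http://(www\.)?example\.invalid [NC]

# If either condition matched, refuse the request.
# "-" means "leave the URL alone"; [F] sends 403 Forbidden.
RewriteRule .* - [F]
```

A 403 sends back only a few hundred bytes, which may be preferable to a redirect when bandwidth is the concern.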
If you would like some help in cleaning up your .htaccess file where I would
re-create one that would match most everything that you currently have in
yours but it would be more compact and possibly more efficient, then check your Sticky Mail (go to the top of this page, click upon Control Panel, then click upon Sticky Mail). I will send you my Yahoo email address. You could then send me your .htaccess file, renamed as htaccess.txt, as an email attachment. I will trust that you do not sell email addresses or send spam to them (part of why I have a Yahoo email account). Additionally, I do not send spam, nor sell any email lists, etc.
Since you do have use of your own .htaccess file, I presume that you also have access to your own access log file. (Once or twice a month I check my own log file and those of two other accounts that I manage, and I have gotten pretty good at visually spotting "unusual" entries in the log files, just by using WordPad, a simple editor.)
Wish you luck, and check your Sticky Mail through the Control Panel of this forum if you'd like some more help and possibly what explanations that I might be able to give to you that might be helpful... I "try" to write explanations etc in "ordinary" words that most anyone might be able to understand.
Regards,
George
I mean wipe your public html folder of all but an index page with all links and nav dead ..
don't worry it's only temporary ..:)
( otherwise if I read you right your credit card is gonna die )
therefore other "rogue" requests will "404"
..but at least you can breathe ..you could "zero" your robots.txt like Brett just did but it is not advised unless you have the kind of permanent incomings that this place does ..
then read what has been suggested above ..
and as it may be urgent ..
what is your first/ best language ..?
maybe some member can explain understand / advise /translate / better for you in whatever language that is..via sticky mail ..
if it is french ..sticky me
if other we can find someone who can interface ..( I hope ) with you to help you out ..
The reason that these people send such vast traffic to you is to get their website address published on your site. That then gets picked up by Google, etc and gives them Inbound-Links (IBLs), which improves their Google ratings. That can only happen if you have a page on your site which allows anyone to view your website stats.
If so, remove it. That will not be an overnight fix but, eventually, they will realise that hits on your site do them no good.
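If removing the stats page outright is not practical, one possible alternative is to close it to everyone but yourself. This is an assumption on my part rather than something proposed in this thread. The sketch below uses the Apache 2.2-style access directives; the directory and the address 192.0.2.10 are placeholders.

```apache
# Hypothetical .htaccess placed INSIDE the stats directory itself.
# With "Order deny,allow" the Deny lines are evaluated first and
# any matching Allow line then overrides them.
Order deny,allow
Deny from all

# Replace with your own address; 192.0.2.10 is a documentation placeholder.
Allow from 192.0.2.10

# On Apache 2.4 without mod_access_compat, the equivalent would be:
#   Require ip 192.0.2.10
```

Once outsiders and search engine spiders can no longer fetch the page, the referrer links on it stop being of any value to the spammers.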
(Other areas are forum, blog and wiki spamming).
Just to answer your questions:
wsmeyer, the attackers try to spider my site.
Leosghost, I'm from Greece (and the French language looks like... Chinese to me).
AlexK, what you said is exactly what happened to my site. The problem is that I only found that out after the attackers "came". I've disabled the specific stats page, but they still come.
Stat page spam will usually drop off. They tend to come back frequently to see if the page is still accessible to others and whether Google is still indexing it; when the page is no longer available they give up and go elsewhere. It's a sort of bootstrap spam hack for low-performing sites to do this to you ..it can't last. There are bots available whose mission is to search for stat pages to hit, and to tag-team with another bot to see if the page is still up ..a bit like mass email spam, it can be bought by the million bot moves. The bots in question tend to be set to cycle in periods of less than 7 days ..a 14-day "no food" cycle will usually make them go bother someone else. Your 404 pages will suffer for a while ..how long depends on the search engines ( not within your control, that one ..the bandwidth is minimal )..
Hoping you caught it fast enough to stay intact ..and BTW belatedly welcome to WebmasterWorld
Does THIS:
RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*[-.]texas[-.] [NC,OR]
Block ANY domain with 'texas' as part of the domain name, regardless of where 'texas' is located, such as:
[mynameis.texas.yes.com...]
[mynameis.com...]
[texas.com...]
http you get the idea...
ALL get blocked?
If so, then I just found the solution to my problem.
RewriteCond %{HTTP_REFERER} (keyword1|keyword2|keyword3|etcetc) [NC,OR]
YES!
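For comparison, here is a sketch of how the two patterns in this exchange behave, worked out by reading the regular expressions rather than by testing on a live server. The sample referrers in the comments are made up for illustration, and the 403 response is an assumption.

```apache
RewriteEngine On

# Pattern A (from the question): "texas" needs "-" or "." on BOTH sides.
#   http://mynameis.texas.yes.com/   -> matches
#   http://www.texas.com/            -> matches (".*" takes "www", then ".")
#   http://my-texas-site.com/        -> matches
#   http://texas.com/                -> no match (only "//" precedes "texas")
#   http://bigtexasdeals.com/        -> no match
RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*[-.]texas[-.] [NC,OR]

# Pattern B (plain keyword): matches "texas" anywhere in the referrer,
# including the path or query string of an otherwise legitimate site.
# On its own it already covers everything Pattern A matches; both are
# listed here only so the difference is visible side by side.
RewriteCond %{HTTP_REFERER} texas [NC]

# Assumption: refuse matching requests with 403 Forbidden.
RewriteRule .* - [F]
```

So Pattern A does not block every domain containing the word, while the keyword form blocks more than just domains. Note also that the ".*" in Pattern A can run past the host name, so it matches a hyphenated or dotted "texas" in the path as well.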
###STOP URL SPAMMERS
# Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://(www\.)?frangipanicottage.com.*$ [OR]
RewriteCond %{HTTP_REFERER} (xcites-0-cost|in-|-pharmacy|the-|viagra) [NC,OR]
RewriteCond %{HTTP_REFERER} (phentermine|hydrocodone|vicodine|zolev) [NC,OR]
RewriteCond %{HTTP_REFERER} (adipex|consillieri|credit|tramadol) [NC,OR]
RewriteCond %{HTTP_REFERER} (lolita|milf|myspace|money-plans) [NC,OR]
RewriteCond %{HTTP_REFERER} (propecia|shemales|pussy|latinas) [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?donnasrentacar.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?ejproperties.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?easternvalleyfire.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?tranny-surprise.org.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?cdc-brewing.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?8th-street-latina.org.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?hq2o.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?impalalinear.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?mega-cock-cravers.ws.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?nanditaexports.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?live-together.ws.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?spectralkyd.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?photoko.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?giantsquidcenter.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?gunkan.org.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?ari-panama.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?adult-web-hosting-x.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?akeforestfunds.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?artofarkansas.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?amateurs-index.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?allaviationgear.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?adultfotos.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?adultsexsurf.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?jaja-jak-globusy.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?onlineearningcenter.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?ahcigars.com.*$ [NC]
RewriteRule \.*$ [mydomain.com...] [R,L]
This is George (privacyman). Just wanted to let you know that when I sent my original message to dodoni offering my help it was not to leave anyone out in the cold as for any cures of his problems that I recognized and thought that I knew how to solve. His problem(s) appeared to be caused by harvesters and by what I believe is called "log spamming" which I think has been discussed on this forum.
My immediate solution for him was to add to his .htaccess file (with his approval) a list of IPs to deny (small groups and some larger ones) that I suspected were a source of his problems, based on my own experience and my own .htaccess file.
Additionally I added my list of bad user agents, harvesters, snoops, etc.
Once he was able to send me a copy of his log file (near the end of the month it was about a 2 MB GZ file, and the actual log was 23 MB), I was able to confirm the bad sources with my independent log analysis program. I had missed a few (from the originals that I added to his .htaccess), so I then added the additional IPs in CIDR range spec.
Almost immediately he noticed a significant change.
Later, in the middle of the new month, when his log file should be much smaller, I will re-analyze his log and add any final IPs, UAs and referers. Then it should be a relatively easy task to keep on top of future months' work.
Considering that I felt that I had both some time and the knowledge to help, I also figured that I could give him some tips, a small amount of "training by email" and some links to additional "educational" info such as the apache.org site on "regex" and on modrewrite etc. And over the next while I can again give him some help from time to time if he needs until he learns all the ropes.
Again, hope that no one felt left out. Most of the help sources on issues like this are answered in various postings here at webmasterworld.com (I just didn't know which one to point him to, and it would have been many links probably too). I do have to say that this is a most wonderful and helpful forum, webmasterworld is where I gained most of my knowledge from about use of htaccess and how to control bad issues.
My apologies if anyone thought I was leaving others out by helping dodoni directly by email. That was not my intent. If my "two cents" can help the public then I post it for all to see.
Regards,
George
Seeing your trouble too, mentioned just before my other post, where you seem to be hit by the log spammers (those referers may or may not be valid).
Your example of using (I won't put it all here)
(word1|word2|word3|word4) as "anywhere in the referer line": you "do" also have to be careful with that structure that you don't include any words that might inadvertently block good/valid sites or visitors.
That example would match word1 OR word2 OR word3 OR word4 that would occur ANYWHERE in the referer, so "do" use that format with some care and discretion for the words you pick. You're probably already aware of that.
Might I suggest, if you have access to a "web stats" program (online) at your server for your site, or if you have a standalone log stats analysis program, that you look for the IP numbers that are hitting your site the hardest. If you have no log analysis program on or off line, the other recourse would be to download your log file from your server to your computer, un-gzip it if necessary, then view the log file in a word processor and search for some of the bad referers;
one might be (I'll make one up here).... www.freefunandpills.biz then look for the next (probable) fake referer.
You should be able to match up bad IP's with fake referers.
Then use a search engine to find "whois cidr" and some similar words to find online utilities that can do an IP Whois and help give you CIDR ranges (whether the range is shown with the whois lookup or not). A CIDR range spec can specify the "owned range" of individual IP numbers, and a CIDR tool can give you the "spec" for other ranges that include that IP, handy for when you want to block a company or provider's exact range for one IP, or to make the range larger or smaller.
Once you figure out which IP numbers are giving you those fake referers or other problems, then try blocking by CIDR range. That would look like
deny from 123.4.56.0/20 or similar; alternatively you can block just by an
individual IP 111.22.33.444 or by a partial which might look like 111.22.33.
Since the fake referers are often non-working sites, and the names keep changing to different variations, I have often found it more effective to block or deny by IP range or CIDR-spec'd IP.
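A minimal sketch of what such deny lines might look like in an .htaccess file, using the Apache 2.2-style directives seen in this thread. All addresses below are reserved documentation placeholders, not real offenders. As a side note on the arithmetic, /20 blocks start on multiples of 16 in the third octet, so the /20 that contains 123.4.56.0 runs from 123.4.48.0 to 123.4.63.255.

```apache
# Sketch only: replace the placeholder addresses with ranges found in your logs.
# With "Order allow,deny" the Allow lines are evaluated first and
# any matching Deny line then overrides them.
Order allow,deny
Allow from all

# A whole CIDR range: 198.51.100.0 through 198.51.100.255
Deny from 198.51.100.0/24

# A single address
Deny from 203.0.113.45

# A partial address: every address whose first three octets are 192.0.2
Deny from 192.0.2

# On Apache 2.4 without mod_access_compat, the equivalent would be:
#   <RequireAll>
#     Require all granted
#     Require not ip 198.51.100.0/24
#   </RequireAll>
```

Adding a comment with the date next to each Deny line makes it easier to review and lift old bans later.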
Hope this is of help to you.
George
Thank you for your assistance. I do have statistical logs, which are easier to view than downloading server-generated logs, but I believe my problem has grown so far out of hand that this type of solution will not work.
No offense to you. It is possible there is an error in my .htaccess file, but more and more they either slip through the cracks or otherwise gain access to the logs. Every time, I find myself downloading a month's worth of entries just to clean them out one by one, a mildly frustrating ordeal on a good day.
You are correct that with keyword-based denials one has to be careful about legitimate referrers. I do have a few pharmaceutical sites (1 or 2) and also some adult sites that are not part of the problem, and I certainly do not want to deny them access.
At this point I might mention that IP-based denial also needs to be done carefully, due to the fluid nature of the Web: IP addresses and their owners change over time. I have found it is a good idea to block/ban IPs or their ranges for about 6 months at a time, then lift the block for a short while to see if the problem has gone away. Sometimes it has, though a ban lasting a year is not unusual.
In the meantime, I find the least headache on my end comes from disabling the 'referer' section of my public statistics, and I feel fortunate that I can do so without further detriment to what I find interesting content.
Personally I wish spammers would realize what a waste of time this is (even from their end) and cease and desist, though this is likely wishful thinking. :)
Peace out.