.htaccess file for blocking Scraper/Spam Bots


sandman22

2:21 am on Sep 21, 2011 (gmt 0)

10+ Year Member



Hi, I don't have a blog or forum, but rather a plain-Jane static .html site, which is really having problems with malicious bots (spammers, scrapers, and other no-gooders) hitting my site and stealing content. Does anyone have a tested and proven .htaccess file that they wouldn't mind sharing? I need something that already has a large list of bad IPs and user agents if possible. I run a US-based site and all my sales are in the USA and Canada, so I don't care much about other countries.


Also, what about blocking all "proxy" servers? Is that possible? It seems like a lot of these offenders use them, and virtually none of my actual sales come through them.


Any help on a proven .htaccess anyone? If you feel more comfortable sending it to me privately, just let me know.
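For anyone landing on this thread later, here is the general shape of what's being asked for — a minimal sketch, not a vetted blocklist. The user-agent strings are common scraper tools, but the IP ranges shown are documentation placeholders (TEST-NET addresses) that you would replace with addresses from your own logs. The syntax is for Apache 2.2-era mod_rewrite and mod_authz_host:

```apache
# Sketch only: placeholder UAs and IPs, tailor to your own logs.

# Refuse requests whose User-Agent matches known scraper/downloader tools
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (HTTrack|WebCopier|libwww-perl) [NC]
RewriteRule .* - [F]

# Deny individual IPs or CIDR ranges (Apache 2.2 Order/Deny syntax;
# the ranges below are reserved documentation addresses, not real bots)
Order Allow,Deny
Allow from all
Deny from 203.0.113.0/24
Deny from 198.51.100.17
```

The `[F]` flag returns 403 Forbidden. Any copy-pasted "mega blocklist" goes stale quickly, which is why rolling your own from your access logs tends to work better.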

Pfui

4:50 am on Sep 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The best .htaccess will be the one you roll yourself.

This forum's Library documents are great places to learn how to tailor-make your access controls to your needs and server capabilities, including:

Stopping scrapers from the get-go [webmasterworld.com...]

dstiles

9:11 pm on Sep 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Proxies are a bit difficult. Blocking "real IP" proxies is often fair enough, though some translators and similar services run through proxies. "Local IP" proxies (127.nnn, 192.168.nnn, 10.nnn etc) are often used by savvy visitors to help block unwanted input such as trojans from bad web sites.

I sometimes see bad behaviour on local proxies (eg scraping) but a lot of them are genuine users practising self-defence.

Blocking "real IP" proxies is difficult as well. Some are genuine and again used in self-defence (depending on the product being bought, for example), whilst others are noxious servers (which should not be accessing web sites anyway) running through botnet proxies (which are compromised machines to begin with).
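For the .htaccess side of this, one common (if imperfect) approach to "real IP" proxies is to refuse requests that announce themselves via proxy headers. This is only a sketch: the `Via` and `X-Forwarded-For` headers are standard, but many anonymizing proxies strip them, and some legitimate corporate caches and ISPs set them, so expect both misses and false positives:

```apache
# Sketch: deny requests carrying common proxy-announcement headers.
# Caution: transparent ISP/corporate caches also set these headers,
# while "anonymous" proxies strip them, so this catches only the
# polite proxies and may block some genuine visitors.
RewriteEngine On
RewriteCond %{HTTP:Via} !^$ [OR]
RewriteCond %{HTTP:X-Forwarded-For} !^$
RewriteRule .* - [F]
```

Given dstiles' point that plenty of proxy users are legitimate buyers, a softer variant is to log matches rather than forbid them, and only block specific offenders from the log.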

Sorry, can't help with .htaccess as I don't use that system.

sandman22

9:54 pm on Sep 21, 2011 (gmt 0)

10+ Year Member



Dstiles, what system do you use? Why does someone need to use a proxy for defense?

dstiles

7:59 pm on Sep 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I run a home-grown system - I've been working on it for around 10 years now, with a major update a couple of years ago to use MySQL as the IP repository.

Some people are very wary of being online, and the more paranoid ones use a third-party proxy server to hide behind when purchasing certain things (eg plain-wrapper goods) or posting to forums etc where they fear retribution if traced ("freedom fighters" and similar but also vulnerable people, amongst many others). Some proxies of this nature are genuine but quite a few are used mostly or only for dodgy purposes. All, as far as I've found, are run from "server farms" or ISPs' servers - some anti-spam companies operate them for their customers' protection.

Using local proxies usually means running your web/ftp/etc tools through a proxy based on your own desktop/portable machine. These can be set up to act as firewalls, to a certain extent, or may even be firewalls showing themselves to the outside world as proxies. Usually the "real" IP is traceable to a dynamic or static broadband vendor. If it's traceable to a server vendor then hit it with a big stick (this, of course, applies to all server IPs).