I am finally getting back into the game and need to limit access to my site to 4 or 5 requests from any given IP within a 5 or 10 minute timespan.
For example:
a site siphon latches on to my www and automatically downloads my entire site for "offline viewing" (as it were).
I need to prevent this.
I could do it with CGI or PHP.
Any suggestions would be really appreciated.
Many thanks!
OK, a bit more on the subject matter.
I run a trade association of florists.
We have actually figured out how to get quite favorable SE rankings.
We are going to dedicate a www.*.us domain to create an optimised entry page for each florist. From there surfers will ALWAYS be taken away to another domain, except for the searchable database of flower shop listings (some 8,000 to 9,000 of them).
This list of florists is what we hope SE spiders will follow, but we need to block the web site stealers of the world.
[edited by: NFFC at 10:18 pm (utc) on July 24, 2002]
[edited by: iggy99 at 10:23 pm (utc) on July 24, 2002]
I do not count accesses, or do anything else that would require actively tracking
site accesses, but I do use mod_rewrite in .htaccess to block certain site
abusers, so here's an example:
The following takes *any* access from bad_domain.com, from IP address
192.168.0.1, by User-agents larbin and Indy Library (and variants), or
referred from iaea.org (a common ruse), rewrites it to "no file" ("-", meaning
no substitution) with a server response code of 403-Forbidden, and then stops
URL rewriting.
Note that all RewriteConditions are "OR'ed" - if any one condition is satisfied,
the RewriteRule is applied. All conditions EXCEPT the last one therefore need
to have the [OR] at the end.
Other flag and Regex translations:
"[NC]" makes the pattern-matching case-insensitive. "\" is used to "escape"
spaces, periods, and other special characters to mean "look for the following
literal character."
"." means "any character", "?" means "the preceding character occurring 0 or 1
time," and "*" means "the preceding character 0 or more times."
The "^" and "$" are text anchors - note that I didn't use both in all cases.
"^" means the pattern must match at the beginning of the string, and "$" means
"the pattern must match at the end of the string". Using both means "the
pattern must match this string exactly." Using neither means "match anywhere
in the string."
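To get a feel for the anchors and quantifiers before touching .htaccess, here is a quick sketch in Python. Python's `re` module is only a stand-in for Apache's regex engine (an assumption; the basic operators used here behave the same way), and the helper name `matches` is made up for illustration.

```python
import re

def matches(pattern, text, ignore_case=False):
    """Return True if the pattern matches somewhere in text (like RewriteCond)."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, text, flags) is not None

# "^" and "$" together: the whole string must match exactly.
print(matches(r"^192\.168\.0\.1$", "192.168.0.1"))    # True
print(matches(r"^192\.168\.0\.1$", "192.168.0.10"))   # False, extra digit

# An unescaped "." matches any character, so this pattern is too loose.
print(matches(r"^192.168.0.1$", "192x168y0z1"))       # True

# "^" only: the string must start with the text; anything may follow.
print(matches(r"^larbin", "LARBIN_2.6.2 (crawler)", ignore_case=True))  # True

# ".?" allows zero or one character of any kind between the two words.
print(matches(r"^Indy.?Library", "Indy Library"))     # True
print(matches(r"^Indy.?Library", "IndyLibrary"))      # True

# "*" allows zero or more of the preceding character.
print(matches(r"^ab*c$", "ac"))                       # True, zero b's

# "$" only: the string must end with the text.
print(matches(r"iaea\.org$", "http://www.iaea.org"))             # True
print(matches(r"iaea\.org$", "http://www.iaea.org/index.html"))  # False
```

The last line is worth noticing: a pattern anchored with "$" will not match a referrer that has a path after the domain name.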
Example:
RewriteEngine On
RewriteCond %{REMOTE_HOST} ^bad_domain\.com$ [OR]
RewriteCond %{REMOTE_ADDR} ^192\.168\.0\.1$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy.?Library [NC,OR]
RewriteCond %{HTTP_REFERER} iaea\.org$
RewriteRule .* - [F,L]
For more details, see the authoritative source at
[httpd.apache.org...]
Please review the above example very carefully before trying to use it on your
site - I cannot and will not promise that it's 100% correct!
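One way to review the rule before it goes live is to try the same patterns against a few sample requests offline. The sketch below does that in Python. It only imitates the OR'ed conditions; it is not how Apache evaluates them internally, and the function name `is_forbidden` and the dictionary keys are assumptions made up for this illustration.

```python
import re

# Each entry mirrors one RewriteCond from the example:
# (request field, pattern, case-insensitive like [NC]?)
CONDITIONS = [
    ("remote_host", r"^bad_domain\.com$", False),
    ("remote_addr", r"^192\.168\.0\.1$", False),
    ("user_agent", r"^larbin", True),
    ("user_agent", r"^Indy.?Library", True),
    ("referer", r"iaea\.org$", False),
]

def is_forbidden(request):
    """Return True if ANY condition matches, imitating the [OR] flags."""
    for field, pattern, nocase in CONDITIONS:
        value = request.get(field, "")
        flags = re.IGNORECASE if nocase else 0
        if re.search(pattern, value, flags):
            return True
    return False

# Hypothetical sample requests to check the patterns against.
samples = [
    {"remote_addr": "192.168.0.1", "user_agent": "Mozilla/4.0"},
    {"remote_addr": "10.0.0.7", "user_agent": "Larbin/2.6"},
    {"remote_addr": "10.0.0.8", "user_agent": "Mozilla/4.0"},
]
for sample in samples:
    print(sample, "->", "403" if is_forbidden(sample) else "allowed")
```

Feeding it a handful of real User-agent and referrer strings from your logs is a cheap way to catch a pattern that blocks more (or less) than intended.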
Review your logs and see which IPs and User-agents are really problems. You
can also look around right here on WebmasterWorld (using site search) to find
lists of known bad guys and alternative ways of blocking access. Be very
careful with mod_rewrite and the other methods, though - you can easily get
carried away or make a small typographical error and block legitimate users or
even block everyone.
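As a starting point for reviewing logs, here is a rough Python sketch that counts requests per IP and per User-agent. It assumes the Apache "combined" log format; the sample lines are invented, and a request line containing an escaped quote would be skipped by this simple pattern.

```python
import re
from collections import Counter

# Assumed "combined" format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def top_clients(lines, limit=10):
    """Return the busiest IPs and User-agents as (value, count) lists."""
    ips, agents = Counter(), Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m:  # lines that don't fit the assumed format are ignored
            ips[m.group("ip")] += 1
            agents[m.group("agent")] += 1
    return ips.most_common(limit), agents.most_common(limit)

# Invented sample data; in practice, pass the open log file instead.
SAMPLE = [
    '10.0.0.5 - - [24/Jul/2002:22:18:00 +0000] "GET /a.html HTTP/1.0" 200 512 "-" "larbin_2.6.2"',
    '10.0.0.5 - - [24/Jul/2002:22:18:01 +0000] "GET /b.html HTTP/1.0" 200 734 "-" "larbin_2.6.2"',
    '10.0.0.9 - - [24/Jul/2002:22:19:10 +0000] "GET / HTTP/1.0" 200 1024 "-" "Mozilla/4.0"',
]

top_ips, top_agents = top_clients(SAMPLE)
print(top_ips)
print(top_agents)
```

A client that shows up far above everyone else in either list is a candidate for a closer look, not automatically a bad guy.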
The Search Engine Spider Identification forum here often contains threads about
new and unidentified User-agents - often the first sign of trouble from a new
site-scraper or server-pounder.
However, at some point you'll likely decide that putting up with some minor
abuse is better than trying to keep up with ALL of the bad guys. Go for the
well-known ones and the ones that really pound your server and let the little
ones go - otherwise you risk your sanity (I have experience with this). ;)
Hope this helps,
Jim
No, it's not a fantasy, but it is difficult to do, and to do efficiently. I'm not a power-scripter,
but replied to give you an example of using mod_rewrite in .htaccess only after the others here
brought it up, and Knowles expressed interest in the subject as well.
If you come up with a good answer to your original question, I'd be very interested myself!
Thanks,
Jim
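For the original question (4 or 5 requests per IP within 5 or 10 minutes), one possible approach is a sliding-window counter. The Python sketch below keeps its state in memory, which is an assumption made to keep it short: a CGI or PHP script starts fresh on every request and would need to store the timestamps in a file or database instead. The class name `RateLimiter` and its parameters are made up for illustration.

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Allow at most max_hits requests per IP within window_seconds."""

    def __init__(self, max_hits=5, window_seconds=300):
        self.max_hits = max_hits
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        """Record the request and return True, or return False if over the limit."""
        now = time.time() if now is None else now
        recent = self.hits[ip]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False
        recent.append(now)
        return True

limiter = RateLimiter(max_hits=5, window_seconds=300)
for second in range(7):
    # Seven requests, one second apart, from the same hypothetical address.
    print(second, limiter.allow("10.0.0.5", now=second))
```

Keep in mind that a limit this tight counts every request, so it would also throttle the search engine spiders you want to let through; known spiders would need to be exempted before the check.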
Additionally, I use a trap script that automatically bans visitors who try to download my entire site [webmasterworld.com...].
After I implemented the above ban script, my page views dropped 15% but unique visitors remained the same. I am amazed at the number of folks who want to download my site, collect emails, etc.
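The linked script isn't reproduced in this thread, so the following Python sketch is only a guess at the general idea, not the actual script: a URL that no human or well-behaved spider should ever request (here a hypothetical `/trap/` path, assumed to be disallowed in robots.txt and not visibly linked) adds the requester's IP to the .htaccess deny list. The function name and path are assumptions.

```python
import ipaddress

TRAP_PATH = "/trap/"  # hypothetical URL that only a site ripper would fetch

def ban_if_trapped(request_path, remote_addr, htaccess_text):
    """Return the .htaccess text, with a deny line added if the trap was hit."""
    if not request_path.startswith(TRAP_PATH):
        return htaccess_text
    # Validate the address so junk input never reaches the config file;
    # ip_address() raises ValueError for anything that isn't an IP.
    ip = str(ipaddress.ip_address(remote_addr))
    line = f"deny from {ip}"
    if line in htaccess_text.splitlines():
        return htaccess_text  # already banned
    if htaccess_text and not htaccess_text.endswith("\n"):
        htaccess_text += "\n"
    return htaccess_text + line + "\n"

# Assumes the file already has "order allow,deny" and "allow from all".
current = "order allow,deny\nallow from all\n"
print(ban_if_trapped("/trap/index.html", "10.0.0.5", current))
```

A real script would also need to lock the file while writing, and it is worth checking the ban list now and then so a legitimate visitor who stumbled into the trap doesn't stay blocked forever.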
Good luck!
[edited by: Air at 4:16 pm (utc) on July 29, 2002]