Forum Moderators: coopster & phranque


protect content - bounce after 5 from given ip

we need to put the brakes on site siphons

         

iggy99

9:10 pm on Jul 24, 2002 (gmt 0)

10+ Year Member



hello everybody!

i am finally getting back into the game and need to limit access to my site to 4 or 5 requests from any given ip within a 5 or 10 minute timespan ---

IE:

a site siphon latches on to my www and automatically downloads my entire site for "off line viewing" (as it were)

i need to prevent this

i could do it with cgi or php ---

any suggestions would be really appreciated

many thanks!

Knowles

9:12 pm on Jul 24, 2002 (gmt 0)

10+ Year Member



If they are using a program to grab it, you can block that program's user-agent (UA). If it's the same IPs over and over, just block the IPs. I think the easiest way (at least that's what they say) is mod_rewrite. I would tell you how, but I still haven't figured it out myself.

jdMorgan

9:32 pm on Jul 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



iggy99,

What server are you using, e.g. Apache or IIS, or...? The methods you can use differ depending
on your platform.

Jim

iggy99

9:47 pm on Jul 24, 2002 (gmt 0)

10+ Year Member



WOW! what a quick response --

we are unix - a tweaked raq 4 with mod_rewrite enabled ---

thoughts?

my idea in a nutshell - track page loads, then bounce

Knowles

9:50 pm on Jul 24, 2002 (gmt 0)

10+ Year Member



I would shy away from setting it as low as 4 or 5 unless you have a very small site. If you apply it to everyone, anyone who views too much gets blocked and you lose traffic, unless you time it to, say, 4 or 5 requests in 4 or 5 seconds. I still say check whether it's a specific UA or IP and go from there.

jatar_k

10:02 pm on Jul 24, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I'm with Knowles

4 or 5 attempts in 10 mins seems like it would ban a fair amount of legitimate traffic. A limit of so many requests per 4 or 5 seconds makes a little more sense. If you are having a specific problem with an identifiable IP or UA, why not just ban it outright?
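For the "N requests in a few seconds" idea, here is a minimal Python sketch of a per-IP sliding-window limiter. It assumes a long-running process, because it keeps its counts in memory; a plain CGI script that starts fresh on every request could not use it as-is. The defaults follow the 4-or-5-requests-in-4-or-5-seconds figure discussed here; the class and method names are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within window_seconds (illustrative)."""

    def __init__(self, max_requests=5, window_seconds=5.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Per-IP queue of accepted request timestamps, oldest first.
        self.hits = defaultdict(deque)

    def allow(self, ip, now=None):
        """Record a request from ip; return False if it exceeds the limit."""
        if now is None:
            now = time.monotonic()
        q = self.hits[ip]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window_seconds:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # over the limit: bounce this request
        q.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=5, window_seconds=5.0)
print([limiter.allow("10.0.0.1", now=t) for t in (0, 0.5, 1, 1.5, 2, 2.5)])
```

Rejected requests are not recorded, so a client that is cut off regains access as soon as its earlier accepted requests age out of the window.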

iggy99

10:16 pm on Jul 24, 2002 (gmt 0)

10+ Year Member



yes a few seconds makes more sense...

ok - a bit more on the subject matter --

i run a trade association of florists

we have actually figured out how to get quite favorable se rankings

we are going to dedicate a www.*.us domain to creating an optimised entry page for each florist - from there surfers will ALWAYS be taken away to another domain, except for the searchable database of flower shop listings (some 8-9000 of them) --

this access to the list of florists is what we hope se spiders will follow - but we need to block the web site stealers in the world

[edited by: NFFC at 10:18 pm (utc) on July 24, 2002]

[edited by: iggy99 at 10:23 pm (utc) on July 24, 2002]

jatar_k

10:23 pm on Jul 24, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



maybe you should ban via .htaccess and look for a good list of bad bots. I know there are sites around with lists of IPs, UAs, etc. for the site thieves; I just can't think of one off the top of my head.

jdMorgan

10:49 pm on Jul 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



iggy99 and Knowles,

I do not count accesses, or do anything else that would require actively tracking
site accesses, but I do use mod_rewrite in .htaccess to block certain site
abusers, so here's an example:

The following takes *any* accesses from bad_domain.com, from IP address
192.168.0.1, by the User-agents larbin and Indy Library (and variants), or
referred from iaea.org (a common ruse), and answers them with a server response
code of 403-Forbidden instead of the requested page (the "-" means "no
substitution" and the [F] flag sends the 403), and then stops URL rewriting.

Note that the RewriteConds here are "OR'ed" - if any one condition is satisfied,
the RewriteRule is applied. Conditions are AND'ed by default, so all conditions
EXCEPT the last one need to have the [OR] flag at the end.
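Because leaving [OR] on the last condition (or forgetting it on an earlier one) is an easy typo to make, a long block-list can be generated rather than typed by hand. A hypothetical Python helper, using the larbin and Indy Library patterns mentioned above; it only builds text, so check the output by eye before pasting it into .htaccess:

```python
def build_ua_block(patterns):
    """Return .htaccess lines that send a 403 to any listed User-agent.

    Every RewriteCond except the last gets [NC,OR]; the last gets only [NC],
    so the conditions are OR'ed together instead of AND'ed (the default).
    """
    if not patterns:
        return []
    lines = ["RewriteEngine On"]
    for i, pattern in enumerate(patterns):
        flags = "[NC]" if i == len(patterns) - 1 else "[NC,OR]"
        lines.append("RewriteCond %{HTTP_USER_AGENT} " + pattern + " " + flags)
    lines.append("RewriteRule .* - [F,L]")
    return lines


print("\n".join(build_ua_block(["^larbin", "^Indy.?Library"])))
```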

Other flag and regex translations:
"[NC]" makes the pattern-matching case-insensitive. "\" is used to "escape"
spaces, periods, and other special characters to mean "look for the following
literal character."

"." means "any character", "?" means "the preceding character occurring 0 or 1
time," and "*" means "the preceding character 0 or more times" ("+" is the one
that means "1 or more times").
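These quantifiers can be tried out quickly in Python's re module, which treats ".", "?", "*" and "+" the same way Apache does for simple patterns like these (an assumption worth confirming on your own server). The function name below is illustrative:

```python
import re

# "." is any single character and "?" makes it optional (0 or 1 times),
# so exactly one character (space, hyphen, ...) or none may sit between the words.
INDY = re.compile(r"^Indy.?Library", re.IGNORECASE)


def matches_indy(user_agent):
    """True if the User-agent looks like a variant of Indy Library."""
    return INDY.search(user_agent) is not None


for ua in ["Indy Library", "IndyLibrary", "indy-library", "Indy  Library"]:
    print(repr(ua), matches_indy(ua))

# "*" allows zero repetitions of the preceding character; "+" needs at least one.
print(re.search(r"^ab*c$", "ac") is not None)  # True
print(re.search(r"^ab+c$", "ac") is not None)  # False
```

The two-space variant is not caught, because ".?" allows at most one character between the words.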

The "^" and "$" are text anchors - note that I didn't use both in all cases.
"^" means the pattern must match at the beginning of the string, and "$" means
"the pattern must match at the end of the string". Using both means "the
pattern must match this string exactly." Using neither means "match anywhere
in the string."
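A short Python check of how anchors change what gets caught; like Python's re.search, a RewriteCond pattern matches anywhere in the string unless it is anchored. The helper name is hypothetical. It also shows that the referer pattern iaea\.org$ only catches Referer values that end in "iaea.org", so a referer carrying a path after the domain would slip past it.

```python
import re


def matches(pattern, value, nocase=False):
    """Unanchored search, like a RewriteCond pattern test."""
    flags = re.IGNORECASE if nocase else 0
    return re.search(pattern, value, flags) is not None


# Both anchors: the address must match exactly.
print(matches(r"^192\.168\.0\.1$", "192.168.0.1"))   # True
print(matches(r"^192\.168\.0\.1$", "192.168.0.10"))  # False
# Only "^": any address that merely starts with 192.168.0.1 is caught too.
print(matches(r"^192\.168\.0\.1", "192.168.0.10"))   # True
# No anchors: "larbin" is found anywhere in the User-agent string.
print(matches(r"larbin", "Mozilla/4.0 (compatible; larbin)", nocase=True))   # True
print(matches(r"^larbin", "Mozilla/4.0 (compatible; larbin)", nocase=True))  # False
# "$" on the referer: a trailing path defeats the match.
print(matches(r"iaea\.org$", "http://www.iaea.org"))             # True
print(matches(r"iaea\.org$", "http://www.iaea.org/index.html"))  # False
```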

Example:
RewriteEngine On
RewriteCond %{REMOTE_HOST} ^bad_domain\.com$ [OR]
RewriteCond %{REMOTE_ADDR} ^192\.168\.0\.1$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy.?Library [NC,OR]
RewriteCond %{HTTP_REFERER} iaea\.org$
RewriteRule .* - [F,L]
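Before uploading a block like this, it can help to run a few sample requests through the same patterns offline. The Python sketch below mirrors the five conditions and their [OR] chaining. It is only an approximation of mod_rewrite: it assumes Python's regex behaves like Apache's for these simple patterns and treats a missing variable as an empty string, and the request dictionaries are made-up examples.

```python
import re

# (server variable, pattern, [NC] flag) mirroring each RewriteCond line.
CONDITIONS = [
    ("REMOTE_HOST", r"^bad_domain\.com$", False),
    ("REMOTE_ADDR", r"^192\.168\.0\.1$", False),
    ("HTTP_USER_AGENT", r"^larbin", True),
    ("HTTP_USER_AGENT", r"^Indy.?Library", True),
    ("HTTP_REFERER", r"iaea\.org$", False),
]


def is_forbidden(request):
    """True if any condition matches, i.e. the OR-chained rule would send a 403."""
    for var, pattern, nocase in CONDITIONS:
        flags = re.IGNORECASE if nocase else 0
        if re.search(pattern, request.get(var, ""), flags):
            return True  # one match is enough because the conditions are OR'ed
    return False


print(is_forbidden({"REMOTE_ADDR": "10.1.2.3", "HTTP_USER_AGENT": "Larbin_2.6.2"}))  # True
print(is_forbidden({"REMOTE_ADDR": "10.1.2.3", "HTTP_USER_AGENT": "Mozilla/4.0"}))   # False
```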

For more details, see the authoritative source at
[httpd.apache.org...]

Please review the above example very carefully before trying to use it on your
site - I can not and will not promise that it's 100% correct!

Review your logs and see which IPs and User-agents are really problems. You
can also look around right here on WebmasterWorld (using site search) to find
lists of known bad guys and alternative ways of blocking access. Be very
careful with mod_rewrite and the other methods, though - you can easily get
carried away or make a small typographical error and block legitimate users or
even block everyone.
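For the "review your logs" step, a rough Python sketch that tallies requests per client IP and per User-agent. It assumes Apache's "combined" log format; lines in any other format (or with escaped quotes inside the request) are simply skipped. The function name is hypothetical.

```python
import re
from collections import Counter

# Combined log format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def top_talkers(lines, n=10):
    """Return the n most frequent client IPs and User-agents in the log lines."""
    ips, agents = Counter(), Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # not a combined-format line
        ips[m.group("ip")] += 1
        agents[m.group("agent")] += 1
    return ips.most_common(n), agents.most_common(n)
```

Usage, with a hypothetical log file name: `with open("access_log") as f: ips, agents = top_talkers(f, n=20)`. The IPs and User-agents at the top of both lists are the first candidates to look into.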

The Search Engine Spider Identification forum here often contains threads about
new and unidentified User-agents - often the first sign of trouble from a new
site-scraper or server-pounder.

However, at some point you'll likely decide that putting up with some minor
abuse is better than trying to keep up with ALL of the bad guys. Go for the
well-known ones and the ones that really pound your server and let the little
ones go - otherwise you risk your sanity (I have experience with this). ;)

Hope this helps,
Jim

iggy99

11:14 pm on Jul 24, 2002 (gmt 0)

10+ Year Member



jim, a great many thanks for the info

i will take it into consideration

it is a lot to absorb!

maybe it is a fantasy to think we could count attempts to download by ip and bounce based on a value (number of attempts) in any given period of time -

a great many thanks

jdMorgan

12:02 am on Jul 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



iggy99,

No, it's not a fantasy, but it is difficult to do, and harder still to do efficiently. I'm
not a power-scripter; I only replied with an example of using mod_rewrite in .htaccess
because the others here brought it up and Knowles expressed interest in the subject as well.
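One way around the "efficiently" problem in a CGI setting, where every request starts a fresh process and nothing stays in memory between requests, is to keep recent hits in a small database. This is a minimal Python sketch using the standard sqlite3 module; the hits.db file name, the table layout and the 5-requests-in-5-seconds defaults are assumptions for illustration, not something anyone in the thread built.

```python
import sqlite3
import time


def open_store(path="hits.db"):
    """Open (creating if needed) a hypothetical hit-log database."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS hits (ip TEXT, ts REAL)")
    conn.execute("CREATE INDEX IF NOT EXISTS hits_ip_ts ON hits (ip, ts)")
    return conn


def over_limit(conn, ip, max_requests=5, window_seconds=5.0, now=None):
    """Record a hit from ip and report whether it exceeds the limit."""
    if now is None:
        now = time.time()
    with conn:  # commit the cleanup, insert and count as one transaction
        # Forget hits that have left the window so the table stays small.
        conn.execute("DELETE FROM hits WHERE ts <= ?", (now - window_seconds,))
        conn.execute("INSERT INTO hits (ip, ts) VALUES (?, ?)", (ip, now))
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM hits WHERE ip = ?", (ip,)
        ).fetchone()
    return count > max_requests
```

Unlike an in-memory counter, this version also records rejected requests, so a siphon that keeps hammering stays blocked until it pauses for a full window. The price is a database write on every page load, which is exactly the efficiency cost Jim mentions.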

If you come up with a good answer to your original question, I'd be very interested myself!

Thanks,
Jim

Knowles

5:17 pm on Jul 25, 2002 (gmt 0)

10+ Year Member



iggy, I agree with Jim; I am sure what you want is entirely possible. In my opinion it just seemed simpler to go with mod_rewrite if you knew exactly who was doing it. If you are more worried about future attacks, it might be best to do what you are thinking, with a much shorter time window, combined with mod_rewrite rules blocking common UAs and IPs that are known to be bad.

iggy99

8:13 pm on Jul 25, 2002 (gmt 0)

10+ Year Member



we are indeed thinking preemptive ---

i will post any potential solution i come up with here...

many thanks

Edge

1:28 pm on Jul 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would block undesirable user-agents via .htaccess [webmasterworld.com...]

Additionally, I use a trap script that automatically bans visitors who try to download my entire site [webmasterworld.com...].
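Edge's script is only linked, so what follows is not it; it is a generic Python sketch of the usual trap technique. It assumes a hidden link to a trap URL (the /trap/ path here is hypothetical) that robots.txt disallows: spiders that obey robots.txt and human visitors never request it, so any client that does is treated as a siphon and banned. The ban list lives in memory for clarity; a real script would persist it, for example by writing an Apache "Deny from" line for the offending IP.

```python
TRAP_PATH = "/trap/"  # hypothetical: linked invisibly and disallowed in robots.txt


class TrapBanner:
    """Ban any client that requests the trap URL (illustrative only)."""

    def __init__(self):
        self.banned = set()  # in-memory ban list; persist it in practice

    def handle(self, ip, path):
        """Return the HTTP status code to send for this request."""
        if ip in self.banned:
            return 403  # already caught in the trap
        if path.startswith(TRAP_PATH):
            self.banned.add(ip)  # only a robot ignoring robots.txt gets here
            return 403
        return 200
```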

After I implemented the above ban script, my page views dropped 15% but unique visitors stayed the same. I am amazed at the number of folks who want to download my site, harvest email addresses, etc.

Good luck!

(edit: fixed url)

[edited by: Air at 4:16 pm (utc) on July 29, 2002]