Forum Moderators: phranque


Can't block certain IP address in htaccess

Visitor also changing user agent

         

grandma genie

4:20 pm on Aug 29, 2011 (gmt 0)

10+ Year Member



Hello,
I have a certain visitor to my website who appears to be a scraper. I have attempted to block their IP in htaccess, and for some strange reason it won't work for this one IP (69-112-200-nnn). So I blocked using a portion of their user agent:

RewriteCond %{HTTP_USER_AGENT} Creative\ AutoUpdate [NC,OR]

That worked, but now they are using a different user agent:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)

My question is, why would this particular IP not block in htaccess? I have quite a few IPs that I block. I have also contacted my hosting company and asked them to block it. They have not replied yet, but I am still not clear on why my usual methods are not working for this one. Any ideas?
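[Editor's note: a RewriteCond on its own does nothing; it only takes effect when it is followed by a RewriteRule. A minimal sketch of the complete block implied above, assuming the goal is to return a 403:]

```apache
RewriteEngine On
# Match the scraper's user agent (case-insensitive); the pattern
# below is the one quoted in this thread.
RewriteCond %{HTTP_USER_AGENT} Creative\ AutoUpdate [NC]
# "-" means no substitution; [F] sends 403 Forbidden.
RewriteRule .* - [F]
```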

Grandma_genie

wilderness

4:39 pm on Aug 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



gg,
A syntax error anywhere in your htaccess can sit there for months (or longer) without the file doing what you originally intended, and the error may only become apparent when a new, correctly formatted addition fails to work.

I've had this occur untold times.

If your htaccess runs to many lines, going through them one by one and inspecting every character for missing or incorrect syntax is quite time consuming (also hard on the eyes and the aspirin bottle), but it becomes necessary.

Don

wilderness

4:47 pm on Aug 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The most common errors are a missing "[OR]", an extra or missing "[" or "]", or extra or missing opening and closing parentheses.

Even a missing escape "\", or an extra one.
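[Editor's note: to illustrate the "[OR]" pitfall, a sketch with placeholder user-agent patterns. When alternatives are intended, every condition except the last needs the [OR] flag; omitting one silently ANDs the conditions together, so neither pattern alone triggers the rule:]

```apache
RewriteEngine On
# Correct: the two patterns are alternatives, joined with [OR].
RewriteCond %{HTTP_USER_AGENT} badbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} scraper [NC]
RewriteRule .* - [F]
```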

grandma genie

6:10 pm on Aug 29, 2011 (gmt 0)

10+ Year Member



I thought that, too, but my htaccess file is divided up into sections; the first one is:

order allow,deny
deny from 2.89
deny from 69.112.200
deny from 93.105.149
allow from all

Then the following sections block by user agent or by referer, plus some other blocks based on certain hack attempts. I would assume that if there are problems within the allow/deny section, only problems within that section would affect those IPs. Is that correct? Or is it possible that an error within the other sections could cause one of the IPs in the allow/deny portion to not work? If all the other IPs are being blocked correctly, why would only one of them not work?

I'd hate to spend all that time combing the other sections when I really only have to devote my time to the one allow/deny section.

What say you?

wilderness

6:39 pm on Aug 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would suggest adding a trailing period (I do realize present Apache standards say that's not necessary); it's worth a shot.

deny from 69.112.200.

If this fails, then your only alternative is to start checking syntax.

"deny from" IP lines are pretty straightforward, and apart from the aforementioned trailing period nothing else appears wrong.

FWIW, denial of an IP range will not prevent the "access request" and/or subsequent appearance of the 403 in your logs.

2nd FWIW, there are times when some pests (due to brief server overloads) may gain an over-riding access to a single page, however not a group of pages.

lucy24

10:46 pm on Aug 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would assume that if there are problems within the allow/deny section, that only problems within that section would affect those IPs.

Don't bet on it. Only yesterday (this is really true) my server suddenly took it into its head that my "I don't like your face" page-- used only in a single RewriteRule-- was meant to be the generic "forbidden" page, resulting in a cascade of double 403s. I got it to stop, but would be much happier if I knew exactly why it was happening. Possibly something involving flags in a RewriteCond.

:: uneasily wondering how many people got hit with a wholly undeserved "I don't like your face" while I was sorting this out ::

I didn't know you were allowed to Deny From anything less than a full four-part IP. I thought you always had to do the nnn.nnn.nnn.0/24 business :(
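[Editor's note: mod_access does accept a partial IP as well as CIDR and netmask forms. A sketch, using the thread's example range; all three lines below deny the same /24:]

```apache
# Partial IP (first one to three octets):
deny from 69.112.200
# CIDR notation:
deny from 69.112.200.0/24
# Explicit netmask:
deny from 69.112.200.0/255.255.255.0
```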

grandma genie

4:06 am on Aug 30, 2011 (gmt 0)

10+ Year Member



Well, my htaccess rules, so far, have been doing as expected, except for that one IP. I am trying a different tack. I have put another htaccess file in the directory where this particular scraper is spending all his time (down one level from the main directory). I'll see tomorrow if it worked. Meanwhile I took his IP off the other htaccess file, so it is only in the one file in that one directory. Like this:

order allow,deny
deny from 69.112.200.nnn
allow from all

He is all by his little onesies in there. Do you think that will do the trick? Or will that make things worse?

wilderness

5:21 am on Aug 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I learned very early on that if you deny only a precise Class D (the full four-part IP), the visitor will always return to haunt you.

The only way I'd use a Class D today would be with multiple conditions.

I didn't know you were allowed to Deny From anything less than a full four-part IP


There are many old threads on this, some as long as a decade ago in the SSID forum.

Class A's, B's, C's, or D's.

g1smd

7:25 am on Aug 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Unqualified, only the very last set of allow/deny rules has any effect.

That is, a later rule might unwittingly "allow" something that you thought was being "denied" by an earlier rule.
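[Editor's note: a sketch of the pitfall described here, assuming two containers in the same htaccess that both match the same request. The paths and IPs are the thread's own examples; the later container's access settings win out:]

```apache
<Files "page.html">
order allow,deny
deny from 69.112.200
allow from all
</Files>

# ...later in the same file...
<Files "page.html">
order allow,deny
deny from 93.105.149
allow from all
# The later container replaces the earlier access settings for this
# file, so 69.112.200 is no longer denied.
</Files>
```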

wilderness

1:14 pm on Aug 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For clarification, g1smd is referring to multiple "containers of allow/deny".

wilderness

1:42 pm on Aug 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The only way I'd use a Class D, today would be with multiple conditions.


My reference here is to mod_rewrite, NOT mod_access.
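[Editor's note: a sketch of what "a Class D with multiple conditions" might look like in mod_rewrite, per the clarification above; the user-agent pattern is the one quoted earlier in the thread, and the IP range is the thread's example. Conditions without [OR] are ANDed, so both must match:]

```apache
RewriteEngine On
# Deny the single IP range only when the user agent also matches.
RewriteCond %{REMOTE_ADDR} ^69\.112\.200\.
RewriteCond %{HTTP_USER_AGENT} Creative\ AutoUpdate [NC]
RewriteRule .* - [F]
```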

wilderness

3:19 pm on Aug 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Meanwhile I took his IP off the other htaccess file


gg,
It's still likely that you have a syntax error somewhere in the root htaccess, and that it's the primary reason for this failure. IMO, locating that error (so it doesn't break current or future lines) is just as important as the failed denial of this IP.

grandma genie

5:35 pm on Aug 30, 2011 (gmt 0)

10+ Year Member



It worked! Mr. IP 69.112.200.nnn is now blocked from that directory. I went over the allow,deny section of my htaccess and can't find anything that would cause just that one IP not to be blocked from my site. The user has a static IP. I'll see if I can find any other possible cause for this issue. My host says they have blocked that IP at the server. It is too bad we can't contact these folks just to find out what is going on. He can still access my home page, so if he wants to call and complain, he can. Thank you, Don, for your help.

grandma genie

7:51 pm on Aug 31, 2011 (gmt 0)

10+ Year Member



Just wanted to add that in going over the htaccess file I did find a couple of lines with extra spaces. I don't know if that could have caused the problem, but I have removed them. They were in the user agent block list. My visitor has not come back.