| Strange UA getting banned
|
marodhum

msg:3604395 | 7:13 pm on Mar 18, 2008 (gmt 0) | Hi, for the last couple of days I have been seeing this user agent in my log: "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-us) AppleWebKit/523.15.1 (KHTML, like Gecko) Version/3.0.4 Safari/523.15". It is getting 403-ed, even though there is nothing in my .htaccess about this UA, IP or referer. I do not think this is a bot or crawler: it had a good referer, and it only asked for GET /page.html HTTP/1.1 and then the favicon file. I am totally puzzled by this. These are the IPs: 76.173.1.nn, 72.234.76.nnn, 71.180.230.nnn, 151.204.155.nnn, 82.23.185.nnn, 67.168.90.nn
I do have these two lines in my .htaccess, though I don't remember why I copied them:
RewriteCond %{HTTP_USER_AGENT} ^[a-z0-9\ ]{15,}$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} [b-df-hj-np-tvwxz]{5,} [NC,OR]
Could they be the reason behind it?
As
|
Samizdata

msg:3604464 | 8:23 pm on Mar 18, 2008 (gmt 0) | | I do not think this is a bot or crawler |
| You are correct: it is a Safari browser running on a recent Apple computer with an Intel processor. | I do have these two lines in my .htaccess, though I don't remember why I copied them |
| It is never a good idea to copy and paste .htaccess rules you don't understand. The first line means "no punctuation for 15 characters" (strictly, because of the ^ and $ anchors, the whole UA string must consist of 15 or more letters, digits and spaces), and the second means "no vowels for 5 characters" (five or more consonants in a row anywhere in the string). To use them, you would normally put them in a separate block that starts with an exclusion for Mozilla.
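To see what those two readings mean in practice, here is a minimal Python sketch that runs both copied patterns against a few made-up user-agent strings. It assumes Python's `re` module behaves like Apache's regex engine for these simple character classes, with `re.IGNORECASE` standing in for [NC] and `re.search` standing in for mod_rewrite's match-anywhere behaviour; the function name `matches` and the sample strings are hypothetical.

```python
import re

# Patterns copied from the two RewriteCond lines. re.IGNORECASE mirrors [NC];
# re.search mirrors mod_rewrite, which matches anywhere unless ^ or $ is used.
NO_PUNCTUATION = re.compile(r"^[a-z0-9\ ]{15,}$", re.IGNORECASE)
NO_VOWELS = re.compile(r"[b-df-hj-np-tvwxz]{5,}", re.IGNORECASE)


def matches(ua: str) -> tuple[bool, bool]:
    """Return (first pattern matches, second pattern matches) for one UA."""
    return (NO_PUNCTUATION.search(ua) is not None,
            NO_VOWELS.search(ua) is not None)


if __name__ == "__main__":
    # Hypothetical user-agent strings, chosen only to exercise the patterns.
    samples = [
        "abc def ghi jkl mno",       # 19 chars, letters and spaces only
        "Mozilla/5.0 (compatible)",  # punctuation, no long consonant run
        "grab-it qwrtzp/1.0",        # contains the consonant run "qwrtzp"
    ]
    for ua in samples:
        print(f"{ua!r}: {matches(ua)}")
```

Because of the anchors, the first pattern can never fire on a UA that contains even one slash, dot or bracket, which rules it out for almost every real browser string.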
|
marodhum

msg:3604477 | 8:36 pm on Mar 18, 2008 (gmt 0) | Samizdata, thank you for your response. Do you mean I should have this line in my rewrite conditions?
RewriteCond %{HTTP_USER_AGENT} !Mozilla [NC,OR]
Am I right? And one more thing: do you think this is what triggers the 403 for the Safari browser?
As
|
Samizdata

msg:3604877 | 4:22 am on Mar 19, 2008 (gmt 0) | Umm... no. At the moment you are blocking the majority of Mac users (including me) from your site. I suggest you remove anything that you don't understand from your .htaccess and spend some time searching, reading and learning in the Apache Web Server forum, where every question about .htaccess that I have ever thought of has already been answered (in many cases more than once) by some real experts. Guessing is never going to work, but studying will.
|
marodhum

msg:3605237 | 1:41 pm on Mar 19, 2008 (gmt 0) | To Samizdata, | Guessing is never going to work, but studying will. |
| I agree with you, but as a mariner, I also know that sometimes you have to sail through uncharted waters, maybe to avoid a tornado that would mean certain destruction. Moreover, I think learning by experience is the best method. | I suggest you remove anything that you don't understand from your .htaccess |
| If you look at my first post, I clearly stated there: | I do have these two lines in my .htaccess, though I don't remember why I copied them |
| There was a reason for copying those lines into my .htaccess at the time, which I have forgotten; it is not that I kept those two lines there for no reason. You know, not everybody is smart enough to do everything, and maybe I am that type of person. This post is not meant to contradict you or anybody else; it is to express my thoughts, so that nobody misunderstands me. I appreciate your effort in answering all these posts and thank you profoundly from the bottom of my heart. Btw, I think the second line ("No vowels for 5 characters") was the culprit behind the ban. The UA is "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-us) AppleWebKit/523.15.1 (KHTML, like Gecko) Version/3.0.4 Safari/523.15". I am no JDMorgan, but maybe the "KHTML" portion was triggering the rule, even though there is a Mozilla exclusion in my .htaccess. Last but not least, | At the moment you are blocking the majority of Mac users (including me) from your site. |
| Is there a way for senior members to know about my site, even though I do not have the URL in my profile?
As
P.S. If I hurt your feelings with this post, I apologize.
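A quick check of that guess, as a Python sketch (assuming `re` treats this character class the way Apache does, with `re.IGNORECASE` in place of [NC]; `consonant_runs` is a hypothetical helper): list every run in the Safari string that satisfies the second pattern. The first, anchored pattern cannot match at all, because this UA contains punctuation.

```python
import re

# The second copied condition, [b-df-hj-np-tvwxz]{5,} with [NC]: five or more
# consecutive letters from a class that leaves out a, e, i, o, u and y.
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tvwxz]{5,}", re.IGNORECASE)

SAFARI_UA = ("Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-us) "
             "AppleWebKit/523.15.1 (KHTML, like Gecko) "
             "Version/3.0.4 Safari/523.15")


def consonant_runs(ua: str) -> list[str]:
    """Return every substring of ua that satisfies the consonant-run condition."""
    return CONSONANT_RUN.findall(ua)


if __name__ == "__main__":
    print(consonant_runs(SAFARI_UA))  # ['KHTML']
```

"KHTML" is the only five-consonant run in the whole string; with [NC] the uppercase letters count just like lowercase ones.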
|
jdMorgan

msg:3605903 | 12:31 am on Mar 20, 2008 (gmt 0) | I think Samizdata meant that you'd see this Safari browser user-agent string in your logs if Samizdata visited your site... and right now, it's blocked. :) This should be a stand-alone rule-set, like this:
# If UA string does not start with "Mozilla/"
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/
# AND (there is no punctuation in any 15-character sequence in the UA string
RewriteCond %{HTTP_USER_AGENT} [a-z0-9\ ]{15,} [NC,OR]
# OR if there is no vowel within any 5-character sequence in the UA string)
RewriteCond %{HTTP_USER_AGENT} [b-df-hj-np-tvwxz]{5,} [NC]
# then return a 403-Forbidden response
RewriteRule .* - [F]
Notice that the flags on the first and last RewriteCond lines have been intentionally modified, and that I added comments, which are the best way to prevent the "I forgot what this was for" problem. The parentheses in the comments indicate the AND/OR precedence. Note that if you use a custom 403 ErrorDocument, you will need to make provisions to allow that document to be served even to blocked user-agents. I also suggest letting *any* user-agent fetch your robots.txt file. Adding a rule like this, above the preceding rule set, would solve both problems:
RewriteRule ^(robots\.txt|path-to-custom-403-document\.html)$ - [L]
Jim [edit] Fixed code tags [/edit] [edited by: jdMorgan at 5:12 am (utc) on Mar. 20, 2008]
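The logic of this rule set, including the exemption rule placed above it, can be modeled in a few lines of Python. This is a sketch only: `re` stands in for Apache's regex engine, the path is matched without a leading slash as RewriteRule sees it in .htaccess, `custom-403.html` is a hypothetical name for the error document, `is_forbidden` is a hypothetical helper, and the Opera and "grab-it" strings are example inputs.

```python
import re

# Conditions of the stand-alone rule set. The Mozilla test has no [NC].
STARTS_WITH_MOZILLA = re.compile(r"^Mozilla/")
PLAIN_RUN = re.compile(r"[a-z0-9\ ]{15,}", re.IGNORECASE)
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tvwxz]{5,}", re.IGNORECASE)

# Exemption rule placed above the rule set; "custom-403.html" is a
# hypothetical name for the custom error document.
EXEMPT = re.compile(r"^(robots\.txt|custom-403\.html)$")


def is_forbidden(path: str, ua: str) -> bool:
    """Model: exemption rule first, then cond1 AND (cond2 OR cond3) -> [F].

    path is the URL-path as RewriteRule sees it in .htaccess (no leading slash).
    """
    if EXEMPT.search(path):
        return False  # the [L] rule matched, so the blocking rule is not reached
    not_mozilla = STARTS_WITH_MOZILLA.search(ua) is None
    suspicious = bool(PLAIN_RUN.search(ua) or CONSONANT_RUN.search(ua))
    return not_mozilla and suspicious


if __name__ == "__main__":
    for path, ua in [("page.html", "Opera/9.26 (Windows NT 5.1; U; en)"),
                     ("page.html", "grab-it qwrtzp/1.0"),
                     ("robots.txt", "grab-it qwrtzp/1.0")]:
        print(path, repr(ua), "->", "403" if is_forbidden(path, ua) else "allowed")
```

Because the Mozilla test is ANDed with the other two, the Safari string now passes regardless of its "KHTML" run, while non-Mozilla strings are still checked and can always fetch robots.txt.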
|
g1smd

msg:3605915 | 12:39 am on Mar 20, 2008 (gmt 0) | This highlights something I started doing a long time ago. Put comments before EVERY line of code to explain what it is supposed to be doing. Often, you'll look back and find that the code you wrote doesn't do exactly what you thought it should. Those notes can be very useful in explaining why you did something "that way" several years ago.
|
marodhum

msg:3606079 | 5:02 am on Mar 20, 2008 (gmt 0) | Thanks, everybody, for your help and suggestions. I have changed my .htaccess as you suggested, and now it is working nicely: blocking the unwanted ones and allowing the others. Just one more point @ Jim: when I add the code before my existing directives, it blocks everybody.
<snip>
Options +FollowSymLinks
RewriteEngine on
# If UA string does not start with "Mozilla/"
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/
# AND (there is no punctuation in any 15-character sequence in the UA string
RewriteCond %{HTTP_USER_AGENT} [a-z0-9\ ]{15,}$ [NC,OR]
# OR if there is no vowel within any 5-character sequence in the UA string)
RewriteCond %{HTTP_USER_AGENT} [b-df-hj-np-tvwxz]{5,} [NC]
# then return a 403-Forbidden response
RewriteRule !^(403\.html|robots\.txt)$ - [F,L]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\)$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^-$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^-?$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4.0$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$ [NC,OR]
RewriteRule .* - [F]
</snip>
But if I put your code after my original rules, and change the first rule from RewriteRule .* - [F] to RewriteRule !^(403\.html|robots\.txt)$ - [F,L], it works nicely. P.S. I have removed the comments from my .htaccess to shorten this lengthy post.
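One thing worth checking in the snippet as posted: the last RewriteCond (the referer line) still carries [OR]. The Python sketch below models how mod_rewrite chains conditions (a condition flagged [OR] is ORed with the next one, everything else is ANDed), following the evaluation loop in mod_rewrite's source as I understand it; treat that as an assumption and verify it on your own server. `Cond`, `rule_applies` and `second_block` are hypothetical helpers, and the visitor's headers are made up. In this model a trailing [OR] leaves the rule with no condition that can fail, so it fires for every request.

```python
import re
from dataclasses import dataclass


@dataclass
class Cond:
    """One RewriteCond: server variable, pattern, [NC] and [OR] flags."""
    var: str
    pattern: str
    nocase: bool = True
    ornext: bool = False

    def test(self, request: dict[str, str]) -> bool:
        flags = re.IGNORECASE if self.nocase else 0
        value = request.get(self.var, "")  # a missing header counts as empty
        return re.search(self.pattern, value, flags) is not None


def rule_applies(conds: list[Cond], request: dict[str, str]) -> bool:
    """Return True if the RewriteRule's conditions let it fire (assumed model)."""
    i, n = 0, len(conds)
    while i < n:
        cond = conds[i]
        ok = cond.test(request)
        if cond.ornext:
            if not ok:
                i += 1
                continue  # false, but a later member of the [OR] chain may match
            # True: skip the rest of this [OR] chain, including its last member.
            while i < n and conds[i].ornext:
                i += 1
        elif not ok:
            return False  # an ANDed condition failed, so the rule does not fire
        i += 1
    return True


def second_block(trailing_or: bool) -> list[Cond]:
    """The second condition block from the snippet, optionally without the final [OR]."""
    ua = "HTTP_USER_AGENT"
    return [
        Cond(ua, r"^Mozilla/4\.0\ \(compatible;\)$", ornext=True),
        Cond(ua, r"^$", ornext=True),
        Cond(ua, r"^-$", ornext=True),
        Cond(ua, r"^-?$", ornext=True),
        Cond(ua, r"^Mozilla/4.0$", ornext=True),
        Cond("HTTP_REFERER", r"^http://www.iaea.org$", ornext=trailing_or),
    ]


if __name__ == "__main__":
    visitor = {"HTTP_USER_AGENT": "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-us)",
               "HTTP_REFERER": "http://www.example.com/"}
    print(rule_applies(second_block(trailing_or=True), visitor))   # True: blocked
    print(rule_applies(second_block(trailing_or=False), visitor))  # False: allowed
```

If that model holds, the trailing [OR] alone would be enough to block everybody wherever the block sits, so it is worth removing it before comparing rule orders.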
|
wkitty42

msg:3608701 | 7:17 pm on Mar 23, 2008 (gmt 0) | I've got a question about the following line from above:
RewriteCond %{HTTP_USER_AGENT} [b-df-hj-np-tvwxz]{5,} [NC]
Why, if we're using b-d for the bcd sequence and f-h for the fgh sequence, would we not also use v-x for the vwx sequence? Does it matter? Does it make any appreciable difference? Why not just use bcdfghjklmnpqrstvwxz?
|
g1smd

msg:3608724 | 8:16 pm on Mar 23, 2008 (gmt 0) | Additionally, I didn't realise that Y was now a vowel.
|
jdMorgan

msg:3608809 | 1:11 am on Mar 24, 2008 (gmt 0) | It is a vowel, functionally; the exception is when it occurs as the initial character of an English word. In fact, it's the only vowel in one of our more interesting member names. The highly technical answer to why the pattern was "[b-df-hj-np-tvwxz]{5,}" is "because I typed it that way after thinking about it for all of two seconds." There are many ways to construct a regex pattern, so in many cases it's all down to style. "[abc]" is slightly faster to process than "[a-c]", and it's also easier to read (for me). When five or more sequential characters are to be matched, the performance balance definitely tilts towards range notation. But you can code it any way you like. Jim
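For anyone who wants to confirm that the spellings are interchangeable, this Python sketch (an illustration using `re.fullmatch`, not anything mod_rewrite provides; `VARIANTS` and `members` are hypothetical names) lists which lowercase letters each variant of the class accepts, including the "v-x" version asked about above and the `[bcdfghj-np-tvwxz]` spelling Jim posts later in the thread.

```python
import re
import string

# Spellings of the "consonant" class discussed in this thread.
VARIANTS = {
    "as posted": r"[b-df-hj-np-tvwxz]",
    "with v-x": r"[b-df-hj-np-tv-xz]",
    "spelled out": r"[bcdfghjklmnpqrstvwxz]",
    "from Jim's notes": r"[bcdfghj-np-tvwxz]",
}


def members(char_class: str) -> frozenset[str]:
    """Lowercase ASCII letters accepted by a one-character regex class."""
    return frozenset(c for c in string.ascii_lowercase
                     if re.fullmatch(char_class, c))


if __name__ == "__main__":
    sets = {name: members(cls) for name, cls in VARIANTS.items()}
    reference = sets["as posted"]
    for name, letters in sets.items():
        print(f"{name:16} same set: {letters == reference}")
    # Letters the class leaves out, i.e. what the rule treats as vowels.
    print(sorted(set(string.ascii_lowercase) - reference))
    # ['a', 'e', 'i', 'o', 'u', 'y']
```

All four spellings accept the same 20 letters, so the choice really is one of readability and (marginal) speed; the six letters left out are a, e, i, o, u and y.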
|
wkitty42

msg:3608866 | 2:58 am on Mar 24, 2008 (gmt 0) | Thanks for that info, Jim... I was just curious. BTW: good to read you again... I've been gone from here for quite a while ;)
|
jdMorgan

msg:3609135 | 1:56 pm on Mar 24, 2008 (gmt 0) | Welcome back, wkitty42! Looking through my notes on this code, I've found that my original does not include an [OR] flag on the RewriteConds. That is, the original code requires all three RewriteConds to be true before issuing a denial:
# If UA string does not start with "Mozilla/"
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/
# AND there is no punctuation in any 15-character sequence in the UA string
RewriteCond %{HTTP_USER_AGENT} [a-z0-9\ ]{15,}$ [NC]
# AND if there is no vowel within any 5-character sequence in the UA string
RewriteCond %{HTTP_USER_AGENT} [bcdfghj-np-tvwxz]{5,} [NC]
# then return a 403-Forbidden response
RewriteRule !^(403\.html|robots\.txt)$ - [F]
and once again, [L] used with [F] is redundant and unnecessary. Jim
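The practical difference between the 20 March version (cond1 AND (cond2 OR cond3)) and this all-AND version shows up with a non-Mozilla UA that has a long unpunctuated run but no five-consonant run. A Python sketch, with the usual caveats: `re` stands in for Apache's regex engine, the request path is ignored, and the function names and test strings are hypothetical.

```python
import re

STARTS_WITH_MOZILLA = re.compile(r"^Mozilla/")
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tvwxz]{5,}", re.IGNORECASE)
# Unanchored run (20 March rule set) and end-anchored run (original notes).
PLAIN_RUN = re.compile(r"[a-z0-9\ ]{15,}", re.IGNORECASE)
PLAIN_RUN_AT_END = re.compile(r"[a-z0-9\ ]{15,}$", re.IGNORECASE)


def blocked_or_version(ua: str) -> bool:
    """cond1 AND (cond2 OR cond3), as posted on 20 March."""
    return (STARTS_WITH_MOZILLA.search(ua) is None
            and (PLAIN_RUN.search(ua) is not None
                 or CONSONANT_RUN.search(ua) is not None))


def blocked_and_version(ua: str) -> bool:
    """cond1 AND cond2 AND cond3, as in the original notes."""
    return (STARTS_WITH_MOZILLA.search(ua) is None
            and PLAIN_RUN_AT_END.search(ua) is not None
            and CONSONANT_RUN.search(ua) is not None)


if __name__ == "__main__":
    # Hypothetical user-agent strings.
    for ua in ["my little web spider agent",          # long plain run, no 5-consonant run
               "xkcdbot qwrtzp scanner",              # both kinds of run
               "Opera/9.26 (Windows NT 5.1; U; en)"]:  # neither
        print(f"{ua!r}: OR version blocks={blocked_or_version(ua)}, "
              f"AND version blocks={blocked_and_version(ua)}")
```

The first string is blocked by the OR version but let through by the AND version, which needs both signs of a made-up UA before denying access.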
|
wkitty42

msg:3609205 | 3:19 pm on Mar 24, 2008 (gmt 0) | Thanks... | Looking through my notes on this code, I've found that my original does not include an [OR] flag on the RewriteConds. That is, the original code requires all three RewriteConds to be true before issuing a denial: |
| Well, that makes sense too, because you want both of the character-evaluation conditions to be true before blocking, instead of one or the other...
|