Apache Web Server Forum

    
Strange UA getting banned
marodhum




msg:3604395
 7:13 pm on Mar 18, 2008 (gmt 0)

Hi,
For the last couple of days I have been seeing this user agent in my log, and it is getting 403-ed: "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-us) AppleWebKit/523.15.1 (KHTML, like Gecko) Version/3.0.4 Safari/523.15"
There is nothing in my .htaccess regarding this UA, IP or referer. I do not think this is a bot or crawler: it had a good referer, and it only asked for "GET /page.html HTTP/1.1" and then the favicon file. I am totally puzzled by this.
Here are the IPs:

76.173.1.nn
72.234.76.nnn
71.180.230.nnn
151.204.155.nnn
82.23.185.nnn
67.168.90.nn

I do have these two lines in my .htaccess, though I don't remember why I copied them:

RewriteCond %{HTTP_USER_AGENT} ^[a-z0-9\ ]{15,}$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} [b-df-hj-np-tvwxz]{5,} [NC,OR]

Could they be the reason behind it?
As

 

Samizdata




msg:3604464
 8:23 pm on Mar 18, 2008 (gmt 0)

I do not think this is a bot or crawler

You are correct - it is a Safari browser running on a recent Apple computer with an Intel processor.

I do have these two lines in my .htaccess, though I don't remember why I copied them

It is never a good idea to copy/paste .htaccess rules you don't understand.

The first line means "No punctuation for 15 characters" and the second means "No vowels for 5 characters", and to use them you would normally have them in a distinct block starting with an exclusion for Mozilla.
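To see what the first of those two lines actually tests, here is a small Python sketch. It assumes Python's re module as a stand-in for Apache's regex engine and models the [NC] flag as re.IGNORECASE. With the ^...$ anchors, the entire UA must consist of 15 or more letters, digits and spaces, so the Safari string from the first post does not match; without the anchors, any 15-character run of those characters is enough, and that same Safari string contains one.

```python
import re

# Safari 3.0.4 user-agent string from the first post
SAFARI_UA = ("Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-us) "
             "AppleWebKit/523.15.1 (KHTML, like Gecko) "
             "Version/3.0.4 Safari/523.15")

# First .htaccess line: ^[a-z0-9\ ]{15,}$ with [NC].
# Anchored at both ends, so the WHOLE string must be letters/digits/spaces.
ANCHORED = re.compile(r"^[a-z0-9\ ]{15,}$", re.IGNORECASE)

# Same character class without anchors: any run of 15+ such characters.
UNANCHORED = re.compile(r"[a-z0-9\ ]{15,}", re.IGNORECASE)


def check_punctuation_rule(ua):
    """Return (anchored_pattern_matches, first_unanchored_run_or_None)."""
    anchored = ANCHORED.search(ua) is not None
    run = UNANCHORED.search(ua)
    return anchored, (run.group() if run else None)


print(check_punctuation_rule(SAFARI_UA))
# (False, ' Intel Mac OS X')
```

So the anchored line cannot be what blocks this Safari string, but the anchors matter: without them, even a normal browser UA satisfies the "no punctuation" test.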

marodhum




msg:3604477
 8:36 pm on Mar 18, 2008 (gmt 0)

Samizdata, thank you for your response.
You mean I should have this line in my rewrite conditions:
RewriteCond %{HTTP_USER_Agent} !Mozilla [NC,OR]
Am I right?
And one more thing: do you think this is what triggers the 403 for the Safari browser?
As

Samizdata




msg:3604877
 4:22 am on Mar 19, 2008 (gmt 0)

Am I right?

Umm... no.

At the moment you are blocking the majority of Mac users (including me) from your site.

I suggest you remove anything that you don't understand from your .htaccess and spend some time searching, reading and learning in the Apache Web Server forum, where every question about .htaccess that I have ever thought of has already been answered (in many cases more than once) by some real experts.

Guessing is never going to work, but studying will.

marodhum




msg:3605237
 1:41 pm on Mar 19, 2008 (gmt 0)

To Samizdata,

Guessing is never going to work, but studying will.

I agree with you, but as a mariner I also know that sometimes you have to sail through uncharted waters, maybe to avoid a tornado that would mean certain destruction. Moreover, learning by experience is the best method, I think.

I suggest you remove anything that you don't understand from your .htaccess

If you look at my first post, I clearly stated there:
I do have these two lines in my .htaccess, though I don't remember why I copied them

There was a reason for copying those lines into my .htaccess at the time, which I have since forgotten; it is not that I kept those two there for no reason. You know, not everybody is smart enough to do everything, and maybe I am one of those people.

This post is not meant to contradict you or anybody else; it is to express my thoughts so that nobody misunderstands me.
I appreciate your effort in answering all these posts and thank you profoundly from the bottom of my heart.

By the way, I think the second line ("No vowels for 5 characters") was the culprit behind the ban.

Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-us) AppleWebKit/523.15.1 (KHTML, like Gecko) Version/3.0.4 Safari/523.15 ... I am no jdMorgan, but maybe the highlighted portion was triggering the rule, even though there is a Mozilla exclusion in my .htaccess.
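A quick check of that guess, again as a Python sketch with re standing in for Apache's regex engine and re.IGNORECASE standing in for [NC]: findall lists every run of five or more consonants that the second line would match in the Safari string. The only run is "KHTML", and it counts only because [NC] lets the lowercase character class match uppercase letters.

```python
import re

# Safari 3.0.4 user-agent string from the first post
SAFARI_UA = ("Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-us) "
             "AppleWebKit/523.15.1 (KHTML, like Gecko) "
             "Version/3.0.4 Safari/523.15")

# Second .htaccess line: [b-df-hj-np-tvwxz]{5,} with [NC]
NO_VOWEL_NC = re.compile(r"[b-df-hj-np-tvwxz]{5,}", re.IGNORECASE)
# The same pattern without [NC]: lowercase consonants only
NO_VOWEL_CS = re.compile(r"[b-df-hj-np-tvwxz]{5,}")


def consonant_runs(ua, pattern):
    """Return every run of 5+ consonants that the pattern finds in ua."""
    return pattern.findall(ua)


print(consonant_runs(SAFARI_UA, NO_VOWEL_NC))  # ['KHTML']
print(consonant_runs(SAFARI_UA, NO_VOWEL_CS))  # []
```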

Last but not least,
At the moment you are blocking the majority of Mac users (including me) from your site.

Is there a way for senior members to find out about my site? I do not have the URL in my profile.

As
P.S. If I hurt your feelings with this post, I apologize.

jdMorgan




msg:3605903
 12:31 am on Mar 20, 2008 (gmt 0)

I think Samizdata meant that you'd see this Safari browser user-agent string in your logs if Samizdata visited your site... and right now, it's blocked. :)

This should be a stand-alone rule-set, like this:

# If UA string does not start with "Mozilla/"
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/
# AND (there is no punctuation in any 15-character sequence in the UA string
RewriteCond %{HTTP_USER_AGENT} [a-z0-9\ ]{15,} [NC,OR]
# OR if there is no vowel within any 5-character sequence in the UA string)
RewriteCond %{HTTP_USER_AGENT} [b-df-hj-np-tvwxz]{5,} [NC]
# then return a 403-Forbidden response
RewriteRule .* - [F]

Notice that the flags on the first and last line have been intentionally modified, and that I added comments -- The best way to prevent the "I forgot what this was for" problem. The parentheses in the comments indicate the AND/OR precedence.
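To illustrate what the parentheses in those comments mean, here is a Python sketch of the rule-set above written as a plain boolean expression (an approximation: re stands in for Apache's regex engine and re.IGNORECASE for [NC]). An [OR] joins only its own condition to the next one, so the Mozilla test is ANDed with the whole OR group. The second function shows the looser reading (A AND B) OR C purely for contrast; under that reading the Safari string would be blocked, since it contains both a 15-character run (" Intel Mac OS X") and a five-consonant run ("KHTML"). The UA "Catchphrase/2.0 (compatible)" is made up for illustration.

```python
import re

# Safari 3.0.4 user-agent string from the first post
SAFARI_UA = ("Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-us) "
             "AppleWebKit/523.15.1 (KHTML, like Gecko) "
             "Version/3.0.4 Safari/523.15")

NO_PUNCT_RUN = re.compile(r"[a-z0-9\ ]{15,}", re.IGNORECASE)        # [NC,OR]
NO_VOWEL_RUN = re.compile(r"[b-df-hj-np-tvwxz]{5,}", re.IGNORECASE)  # [NC]


def conditions(ua):
    a = not ua.startswith("Mozilla/")        # !^Mozilla/ (no [NC]: case-sensitive)
    b = NO_PUNCT_RUN.search(ua) is not None  # 15+ letters/digits/spaces in a row
    c = NO_VOWEL_RUN.search(ua) is not None  # 5+ consonants in a row
    return a, b, c


def blocked(ua):
    """How mod_rewrite groups these conditions: A AND (B OR C)."""
    a, b, c = conditions(ua)
    return a and (b or c)


def blocked_loose_reading(ua):
    """For contrast only: (A AND B) OR C, which is NOT how [OR] binds."""
    a, b, c = conditions(ua)
    return (a and b) or c


for ua in (SAFARI_UA, "Catchphrase/2.0 (compatible)"):
    print(blocked(ua), blocked_loose_reading(ua))
# False True
# True True
```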

Note that if you use a custom 403 Error Document, you will need to make provisions to allow that document to be served even to blocked user-agents. I also suggest letting *any* user-agent fetch your robots.txt file. Adding a rule like this, above the preceding rule-set, would solve both problems:

RewriteRule ^(robots\.txt|path-to-custom-403-document\.html)$ - [L]

Posting on this forum turns solid pipe characters into broken pipes "¦"; if you copy code from this thread, replace any "¦" with a solid pipe "|" before use.
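A Python sketch of that exemption, assuming the custom error page is called 403.html (the name used later in this thread; here it is only an example). In per-directory (.htaccess) context, RewriteRule matches the URL-path with the directory prefix removed, so in the document root the pattern sees "robots.txt", not "/robots.txt". The sketch compares a separate [L] exemption rule placed above the blocking rule-set with folding the same pattern into the [F] rule as a negated match; both give the same decisions here.

```python
import re

# Hypothetical file names: robots.txt plus a custom error page called 403.html.
# Paths are written without a leading slash, as RewriteRule sees them in .htaccess.
EXEMPT = re.compile(r"^(robots\.txt|403\.html)$")


def blocked_with_exemption_rule(path, ua_is_bad):
    # RewriteRule ^(robots\.txt|403\.html)$ - [L]   (placed first)
    if EXEMPT.search(path):
        return False
    # ...then the blocking rule-set ending in: RewriteRule .* - [F]
    return ua_is_bad


def blocked_with_negated_pattern(path, ua_is_bad):
    # RewriteRule !^(403\.html|robots\.txt)$ - [F]
    return ua_is_bad and EXEMPT.search(path) is None


for path in ("robots.txt", "403.html", "page.html", "sub/robots.txt"):
    print(path, blocked_with_exemption_rule(path, True),
          blocked_with_negated_pattern(path, True))
# robots.txt False False
# 403.html False False
# page.html True True
# sub/robots.txt True True
```

The ^ and $ anchors keep the exemption narrow: a robots.txt in a subdirectory is not exempted.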

Jim

[edit] Fixed code tags [/edit]

[edited by: jdMorgan at 5:12 am (utc) on Mar. 20, 2008]

g1smd




msg:3605915
 12:39 am on Mar 20, 2008 (gmt 0)

This highlights something I started doing a long time ago.

Put comments before EVERY line of code to explain what it is supposed to be doing.

Often, you'll look back and find that the code you wrote doesn't do exactly what you thought it should.

Those notes can be very useful in explaining why you did something "that way" several years ago.

marodhum




msg:3606079
 5:02 am on Mar 20, 2008 (gmt 0)

Thanks everybody for your help and suggestions.
I have changed my .htaccess as you suggested, and now it is working nicely: blocking the unwanted ones and allowing the others.

Just one more point for Jim:
When I add the code before my existing directives, it blocks everybody.

<Snip>
Options +FollowSymLinks
RewriteEngine on
# If UA string does not start with "Mozilla/"
RewriteCond %{HTTP_USER_Agent} !^Mozilla/
# AND (there is no punctuation in any 15-character sequence in the UA string
RewriteCond %{HTTP_USER_AGENT} [a-z0-9\ ]{15,}$ [NC,OR]
# OR if there is no vowel within any 5-character sequence in the UA string)
RewriteCond %{HTTP_USER_AGENT} [b-df-hj-np-tvwxz]{5,} [NC]
# then return a 403-Forbidden response
RewriteRule !^(403\.html|robots\.txt)$ - [F,L]

RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\)$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^-$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^-?$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4.0$ [NC,OR]
RewriteCond %{HTTP_referer} ^http://www.iaea.org$ [NC,OR]
RewriteRule .* - [F]
</snip>

But if I put your code after my original rule-set, and change the original rule from RewriteRule .* - [F] to RewriteRule !^(403\.html|robots\.txt)$ - [F,L], it works nicely.

P.S. I have removed the comments from my .htaccess to shorten this lengthy post.
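One thing worth checking in the second rule-set above, independent of the ordering question: its last RewriteCond (the HTTP_referer line) still carries [OR]. The Python sketch below is a simplified model of how mod_rewrite chains conditions, based on our reading of the condition loop in mod_rewrite.c; it is an assumption, not Apache's code, so verify the behavior against your own Apache version and logs. In this model a false [OR] condition just moves on to the next one, so when the last condition is a false [OR] there is nothing left to fail, and the RewriteRule applies to every request. The Cond class, the rule_applies and second_ruleset helpers, and the referer value are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Cond:
    var: str               # server variable to test, e.g. "HTTP_USER_AGENT"
    pattern: str           # regex as written in the RewriteCond
    nocase: bool = False   # the [NC] flag
    ornext: bool = False   # the [OR] flag


def rule_applies(conds, env):
    """Simplified model of mod_rewrite's RewriteCond chaining (assumption)."""
    i = 0
    while i < len(conds):
        c = conds[i]
        flags = re.IGNORECASE if c.nocase else 0
        hit = re.search(c.pattern, env.get(c.var, ""), flags) is not None
        if c.ornext:
            if not hit:
                i += 1      # false, but a later condition in the group may match
                continue
            # true: skip the rest of this OR group, including its last member
            while i < len(conds) and conds[i].ornext:
                i += 1
        elif not hit:
            return False    # a false AND-ed condition stops the rule
        i += 1
    return True             # no condition failed, so the rule fires


def second_ruleset(trailing_or):
    ua = "HTTP_USER_AGENT"
    return [
        Cond(ua, r"^Mozilla/4\.0\ \(compatible;\)$", True, True),
        Cond(ua, r"^$", True, True),
        Cond(ua, r"^-$", True, True),
        Cond(ua, r"^-?$", True, True),
        Cond(ua, r"^Mozilla/4.0$", True, True),
        Cond("HTTP_REFERER", r"^http://www.iaea.org$", True, trailing_or),
    ]


visitor = {
    "HTTP_USER_AGENT": "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-us) "
                       "AppleWebKit/523.15.1 (KHTML, like Gecko) "
                       "Version/3.0.4 Safari/523.15",
    "HTTP_REFERER": "http://www.example.com/",  # hypothetical referer
}

print(rule_applies(second_ruleset(trailing_or=True), visitor))   # True
print(rule_applies(second_ruleset(trailing_or=False), visitor))  # False
```

With the trailing [OR] removed, the model still blocks the listed cases (for example an empty user-agent) but lets an ordinary browser through.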

wkitty42




msg:3608701
 7:17 pm on Mar 23, 2008 (gmt 0)

I've got a question about the following line from above:

RewriteCond %{HTTP_USER_AGENT} [b-df-hj-np-tvwxz]{5,} [NC]

Why, if we're using b-d for the bcd sequence and f-h for the fgh sequence, would we not also use v-x for the vwx sequence? Does it matter? Does it make any appreciable difference? Why not just use bcdfghjklmnpqrstvwxz?

g1smd




msg:3608724
 8:16 pm on Mar 23, 2008 (gmt 0)

Additionally, I didn't realise that Y was now a vowel.

jdMorgan




msg:3608809
 1:11 am on Mar 24, 2008 (gmt 0)

It is a vowel, functionally; the exception is when it occurs as the initial character in an English word.

In fact, it's the only vowel in one of our more interesting membernames.

The highly-technical answer to why the pattern was "[b-df-hj-np-tvwxz]{5,}" is "because I typed it that way after thinking about it for all of two seconds."

There are many ways to construct a regex pattern, so it's all down to style in many cases. "[abc]" is slightly faster to process than "[a-c]", and it's also easier to read (for me). When five or more sequential characters are to be matched, the performance balance definitely tilts towards range notation. But you can code it any way you like.
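To confirm that the spellings discussed here are interchangeable, a small Python sketch expands each character class into the set of letters it contains. The helper expand_class is hypothetical and handles only literal characters and simple x-y ranges, which is all these patterns use. All three spellings produce the same 20 letters: every consonant except y.

```python
import string


def expand_class(body):
    """Expand a simple character-class body such as "b-df-h" into a set.

    Handles literal characters and x-y ranges only (no escapes, no negation).
    """
    chars = set()
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            start, end = body[i], body[i + 2]
            chars.update(chr(c) for c in range(ord(start), ord(end) + 1))
            i += 3
        else:
            chars.add(body[i])
            i += 1
    return chars


original = expand_class("b-df-hj-np-tvwxz")     # pattern discussed above
revised = expand_class("bcdfghj-np-tvwxz")      # Jim's later version
spelled_out = expand_class("bcdfghjklmnpqrstvwxz")
consonants_without_y = set(string.ascii_lowercase) - set("aeiouy")

print(original == revised == spelled_out == consonants_without_y)  # True
print(len(original))  # 20
```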

Jim

wkitty42




msg:3608866
 2:58 am on Mar 24, 2008 (gmt 0)

Thanks for that info, Jim... I was just curious.

BTW: good to read you again... I've been gone from here for quite a while ;)

jdMorgan




msg:3609135
 1:56 pm on Mar 24, 2008 (gmt 0)

Welcome back, wkitty42!

Looking through my notes on this code, I've found that my original does not include an [OR] flag on the RewriteConds. That is, the original code requires all three RewriteConds to be true before issuing a denial:

# If UA string does not start with "Mozilla/"
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/
# AND there is no punctuation in any 15-character sequence in the UA string
RewriteCond %{HTTP_USER_AGENT} [a-z0-9\ ]{15,}$ [NC]
# AND if there is no vowel within any 5-character sequence in the UA string
RewriteCond %{HTTP_USER_AGENT} [bcdfghj-np-tvwxz]{5,} [NC]
# then return a 403-Forbidden response
RewriteRule !^(403\.html|robots\.txt)$ - [F]

And once again, [L] used with [F] is redundant and unnecessary: [F] already stops processing of any further rules.

Jim

wkitty42




msg:3609205
 3:19 pm on Mar 24, 2008 (gmt 0)

Welcome back, wkitty42!

thanks...

Looking through my notes on this code, I've found that my original does not include an [OR] flag on the RewriteConds. That is, the original code requires all three RewriteConds to be true before issuing a denial:

Well, that makes sense, too, because you want both of the character-evaluation conditions to be true before blocking, instead of one or the other.

© Webmaster World 1996-2014 all rights reserved