
Apache Web Server Forum

    
RegEx help with rewrite
keyplyr




msg:4369098
 10:15 am on Sep 30, 2011 (gmt 0)

I wish to block these 3 UAs that start/end as follows:

Moz
Mozilla
Mozilla/5.0


This works, but I want to shorten it to one line, make it more succinct:

RewriteCond %{HTTP_USER_AGENT} ^Moz(illa)?$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0$ [NC,OR]


Embarrassing because I used to know these things :)

 

g1smd




msg:4369099
 10:16 am on Sep 30, 2011 (gmt 0)

The | operator performs a local OR function. Use parentheses around the whole.

Your rules contain ^start and end$ anchors so these are coded as an "exact match"; the user agent must be exactly and only the characters in the pattern.
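To make the exact-match point concrete, here is a minimal sketch in Python. It assumes Python's `re` module behaves like the server's regex engine for patterns this simple, uses `re.IGNORECASE` as a stand-in for the [NC] flag, and the full browser string is a made-up sample rather than anything from the logs discussed here.

```python
import re

# The two original conditions folded into one alternation (the | operator),
# wrapped in parentheses so the ^ and $ anchors apply to every alternative.
BLOCKED = re.compile(r"^(Moz(illa)?|Mozilla/5\.0)$", re.IGNORECASE)

def is_blocked(user_agent):
    """Return True when the whole user agent is one of the three bare strings."""
    return BLOCKED.search(user_agent) is not None

# Hypothetical full browser UA, for contrast with the bare strings.
SAMPLE_BROWSER = "Mozilla/5.0 (Windows NT 6.1; rv:7.0) Gecko/20100101 Firefox/7.0"

for ua in ["Moz", "Mozilla", "Mozilla/5.0", SAMPLE_BROWSER]:
    print(ua, "->", is_blocked(ua))
```

Because of the anchors, the real browser string is left alone even though it starts with "Mozilla/5.0".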

lucy24




msg:4369255
 7:09 pm on Sep 30, 2011 (gmt 0)

Do you mean that those three specimens are the complete UA? Ugh. (That was directed at the stupid robots, not at your htaccess!)

One way is with nested parentheses:

^Moz(illa(/5\.0)?)?$

Here you're not capturing, just using parentheses to keep a group together. If htaccess recognizes \S (I never remember this stuff), you could even cover all bases with

^Moz\S*$

so they don't bob up next week and throw a bare "Mozilla/4.0" at you.
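A quick way to compare the two patterns above is to run them against a few strings. This is only a sketch in Python, on the assumption that its `re` module treats these patterns the same way the server does; the sample user agents are illustrative.

```python
import re

# Nested optional groups: "Moz", then optionally "illa",
# then optionally "/5.0", and nothing else.
NESTED = re.compile(r"^Moz(illa(/5\.0)?)?$")

# Catch-all: "Moz" followed by any run of non-whitespace characters.
CATCH_ALL = re.compile(r"^Moz\S*$")

def matches(pattern, user_agent):
    """Return True when the pattern matches the user agent string."""
    return pattern.search(user_agent) is not None

for ua in ["Moz", "Mozilla", "Mozilla/5.0", "Mozilla/4.0", "Mozilla/5.0 (X11)"]:
    print(ua, "| nested:", matches(NESTED, ua), "| catch-all:", matches(CATCH_ALL, ua))
```

The nested form accepts only the three bare strings, while the \S form also catches "Mozilla/4.0" but still stops at the first space, so a complete browser string is not affected.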

If you use pipes, note that the pipe has lower precedence than everything else, including the anchors, so

^abc|def$

does not mean the same as

^(abc|def)$
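The difference between the two forms shows up as soon as there is extra text on either side. A small Python sketch, assuming Python's `re` follows the same precedence rules as the server's engine (the test strings are made up):

```python
import re

# Without parentheses this reads as: (^abc) OR (def$)
LOOSE = re.compile(r"^abc|def$")

# With parentheses the anchors apply to both alternatives.
STRICT = re.compile(r"^(abc|def)$")

def hit(pattern, text):
    """Return True when the pattern matches somewhere in the text."""
    return pattern.search(text) is not None

for text in ["abc", "def", "abcxyz", "xyzdef"]:
    print(text, "| loose:", hit(LOOSE, text), "| strict:", hit(STRICT, text))
```

The unparenthesized pattern accepts anything that starts with "abc" or ends with "def"; the parenthesized one accepts only the two exact strings.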

keyplyr




msg:4369284
 8:44 pm on Sep 30, 2011 (gmt 0)

Thanks for the suggestions. Unfortunately, nested elements beyond one layer don't work well on my server; I've tried many times over the years.

So this doesn't work:
^Moz(illa(/5\.0)?)?$

Nor this:
^Moz\S*$

This doesn't work either:
^Moz(illa|illa/5\.0)?$


This site sits on a huge shared hosting server farm and I have no idea how it's set up. Anyway, as I said, I do have it working with the two rewrite lines; I was just looking to do it with one. Thanks.

Key_Master




msg:4369288
 8:55 pm on Sep 30, 2011 (gmt 0)

Would the following work?

^(Moz|Mozilla|Mozilla/5\.0)$

lucy24




msg:4369297
 9:23 pm on Sep 30, 2011 (gmt 0)

Oh, wait. If you can't use \S, can you achieve the exact same effect the other way around? That is:

^Moz[^\ ]*$

Be sure to escape the space-- even inside brackets-- or your server will throw a conniption fit. (Not just yours. All servers.)
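Here is a rough check of the bracket version in Python, assuming Python's `re` is a fair stand-in for the server's engine. Python itself does not require the backslash before the space; it is kept here only so the pattern is identical to the one above, and the sample strings are illustrative.

```python
import re

# "Moz" followed by any run of characters that are not a space.
NO_SPACES = re.compile(r"^Moz[^\ ]*$")

# The \S form, for comparison.
NON_WHITESPACE = re.compile(r"^Moz\S*$")

def matches(pattern, user_agent):
    """Return True when the pattern matches the user agent string."""
    return pattern.search(user_agent) is not None

for ua in ["Moz", "Mozilla/4.0", "Mozilla/5.0 (compatible)", "Moz\tx"]:
    print(repr(ua), "| [^ ]:", matches(NO_SPACES, ua), "| \\S:", matches(NON_WHITESPACE, ua))
```

For ordinary user agents the two behave the same. The only difference in this sketch is that the bracket form excludes just the space character, so a string containing a tab still matches it, while \S rejects any whitespace.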

g1smd




msg:4369298
 9:28 pm on Sep 30, 2011 (gmt 0)

This would work,

^(Moz|Mozilla|Mozilla/5\.0)$

but why force the parser to search for "Moz" in the input string three times?

Once you found it the first time, look for more stuff on the end, or not:

^(Moz(illa(/5\.0)?)?)$

There's no reason why this should fail, unless there are more characters on the end of the original input string.
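As a sanity check that the flat alternation and the nested form accept the same strings, here is a small Python sketch. It assumes Python's `re` agrees with the server's engine on these patterns, and the sample list is made up for illustration; it says nothing about relative speed.

```python
import re

FLAT = re.compile(r"^(Moz|Mozilla|Mozilla/5\.0)$")
NESTED = re.compile(r"^(Moz(illa(/5\.0)?)?)$")

# Illustrative inputs: the three targets plus some near misses,
# including one with a trailing space.
SAMPLES = ["Moz", "Mozilla", "Mozilla/5.0", "Mozilla/4.0",
           "Mozill", "Mozilla/5.0 ", "Opera/9.80"]

def verdicts(pattern):
    """Return a list of True/False results, one per sample string."""
    return [pattern.search(ua) is not None for ua in SAMPLES]

print(verdicts(FLAT))
print(verdicts(NESTED))
print("same result:", verdicts(FLAT) == verdicts(NESTED))
```

Both patterns accept only the first three samples; the trailing-space case shows how extra characters on the end make either one fail.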


This would also match them:

^Moz[^\ ]*$

but it would also match all sorts of other stuff.

Key_Master




msg:4369317
 9:55 pm on Sep 30, 2011 (gmt 0)

g1smd, older versions of Apache (pre 2.0) do not support PCRE. There are also some types of non-Apache web servers out there that do support .htaccess files but in a very limited way. It's possible that keyplyr is stuck with one of these servers and is limited in what he can do. So even though the solutions you proposed are fine, they might not be able to work in his situation. My regex should work though.

g1smd




msg:4369322
 10:04 pm on Sep 30, 2011 (gmt 0)

I'm aware that older Apache server versions (1.x and maybe some early 2.x) use only POSIX.

Which bit of

^(Moz(illa(/5\.0)?)?)$

is PCRE-specific? I don't remember nesting being a problem before. Then again, I do forget a lot of stuff these days.

keyplyr




msg:4369369
 12:42 am on Oct 1, 2011 (gmt 0)

Lucy24, g1smd, I tried:

^(Moz(illa(/5\.0)?)?)$

and voila, it works. Thanks!

I must have had a typo in my first test.

This is (one of) the downsides of having a site on a cheap hosting server farm.

lucy24




msg:4369388
 2:13 am on Oct 1, 2011 (gmt 0)

Maybe it was g1's seemingly gratuitous parentheses surrounding the whole thing?!

keyplyr




msg:4369389
 2:14 am on Oct 1, 2011 (gmt 0)


Maybe it was g1's seemingly gratuitous parentheses surrounding the whole thing?!

No doubt :)

g1smd




msg:4369432
 6:16 am on Oct 1, 2011 (gmt 0)

That was unintentional, but if it works use it. :)

keyplyr




msg:4369435
 6:34 am on Oct 1, 2011 (gmt 0)

My "must have had a typo in earlier test" and "no doubt" were intended to imply that I'm back to using Lucy's original code:

^Moz(illa(/5\.0)?)?$

Also, another downside of my unnamed host (Ahem... Gdaddy) is that routers send requests in and out of file server clusters, causing one group of machines to momentarily use a different copy of my .htaccess than another group. Occasionally it messes with my testing if I'm making changes quickly.

lucy24




msg:4369643
 1:31 am on Oct 2, 2011 (gmt 0)

Postscript: I just met a UA calling itself

Mozilla/5.0 ()

Triple nesting, anyone? ;)

Seriously, the numerical bit might as well say [1-9][0-9]*\.[0-9]+. It's only a matter of time...

keyplyr




msg:4369647
 1:46 am on Oct 2, 2011 (gmt 0)

Ya know, that makes perfect sense. Thanks, I'm using it!

But more along the lines of:

^Moz(illa(/[1-9]*\.[0-9]+)?)?$


That should cover a few years.
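For anyone comparing the two numeric fragments, here is a sketch in Python, assuming its `re` module matches the server's behavior for these patterns. Dropping the suggested [1-9][0-9]*\.[0-9]+ into the nested pattern is an assumption made for this comparison, and the version strings are made up.

```python
import re

# Suggested numeric bit placed inside the nested pattern (assumed combination):
# a leading digit 1-9, then any digits, a dot, then one or more digits.
WITH_LEADING_DIGIT = re.compile(r"^Moz(illa(/[1-9][0-9]*\.[0-9]+)?)?$")

# The pattern as posted: zero or more digits 1-9 before the dot.
AS_POSTED = re.compile(r"^Moz(illa(/[1-9]*\.[0-9]+)?)?$")

def matches(pattern, user_agent):
    """Return True when the pattern matches the user agent string."""
    return pattern.search(user_agent) is not None

for ua in ["Moz", "Mozilla/4.0", "Mozilla/5.0", "Mozilla/10.0", "Mozilla/.5"]:
    print(ua, "| leading digit:", matches(WITH_LEADING_DIGIT, ua),
          "| as posted:", matches(AS_POSTED, ua))
```

In this sketch both forms cover versions 1.x through 9.x. The posted form stops matching at "Mozilla/10.0" because [1-9] excludes the zero, and it accepts a version with no major number at all, such as "Mozilla/.5".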
