RegEx help with rewrite

     

keyplyr

10:15 am on Sep 30, 2011 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I wish to block these 3 UAs that start/end as follows:

Moz
Mozilla
Mozilla/5.0


This works, but I want to shorten it to one line, make it more succinct:

RewriteCond %{HTTP_USER_AGENT} ^Moz(illa)?$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0$ [NC,OR]


Embarrassing because I used to know these things :)
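For context, here is a minimal sketch of how those two conditions might sit in a complete block. The RewriteEngine line, the RewriteRule, and the [F] (403 Forbidden) response are assumptions; the post shows only the conditions. The sketch also assumes these are the final entries in a longer OR-list, so the last condition carries no [OR] flag.

```apache
# Assumed context: mod_rewrite is available and this lives in .htaccess.
RewriteEngine On

# ... any earlier RewriteCond lines ending in [OR] would go here ...
RewriteCond %{HTTP_USER_AGENT} ^Moz(illa)?$ [NC,OR]
# Last condition in the list: [NC] only, no trailing [OR].
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0$ [NC]

# Assumed action: refuse the request with 403 Forbidden.
RewriteRule ^ - [F]
```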

g1smd

10:16 am on Sep 30, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The | operator performs a local OR function. Use parentheses around the whole.

Your rules contain ^start and end$ anchors so these are coded as an "exact match"; the user agent must be exactly and only the characters in the pattern.

lucy24

7:09 pm on Sep 30, 2011 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Do you mean that those three specimens are the complete UA? Ugh. (That was directed at the stupid robots, not at your htaccess!)

One way is with nested parentheses:

^Moz(illa(/5\.0)?)?$

Here you're not capturing, just using parentheses to keep a group together. If htaccess recognizes \S (I never remember this stuff), you could even cover all bases with

^Moz\S*$

so they don't bob up next week and throw a bare "Mozilla/4.0" at you.

If you use pipes, note that they split the pattern before anything else is grouped (alternation binds most loosely), so

^abc|def$

does not mean the same as

^(abc|def)$
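To make the difference concrete, here is the same pair in htaccess form, annotated with what each pattern accepts. The strings abc and def are the placeholders from the post; the RewriteRule lines and the [F] flag are assumed for illustration.

```apache
RewriteEngine On

# Without parentheses the pipe splits the whole pattern into
# (^abc) OR (def$). This matches "abcXYZ" and "XYZdef" as well
# as the bare strings "abc" and "def".
RewriteCond %{HTTP_USER_AGENT} ^abc|def$
RewriteRule ^ - [F]

# With parentheses both anchors apply to each alternative, so
# only the exact strings "abc" or "def" match.
RewriteCond %{HTTP_USER_AGENT} ^(abc|def)$
RewriteRule ^ - [F]
```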

keyplyr

8:44 pm on Sep 30, 2011 (gmt 0)




Thanks for the suggestions. Unfortunately, nested elements beyond one layer don't work well on my server; I've tried many times over the years.

So this doesn't work:
^Moz(illa(/5\.0)?)?$

Nor this:
^Moz\S*$

This doesn't work either:
^Moz(illa|illa/5\.0)?$


This site sits on a huge shared hosting server farm and I have no idea how it's set up. Anyway, as I said, I do have it working with the two rewrite lines; I was just looking to do it with one. Thanks.

Key_Master

8:55 pm on Sep 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Would the following work?

^(Moz|Mozilla|Mozilla/5\.0)$

lucy24

9:23 pm on Sep 30, 2011 (gmt 0)




Oh, wait. If you can't use \S, can you achieve the exact same effect the other way around? That is:

^Moz[^\ ]*$

Be sure to escape the space-- even inside brackets-- or your server will throw a conniption fit. (Not just yours. All servers.)
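A sketch of why the escape matters: Apache splits directive arguments on whitespace, so a bare space ends the pattern early and the remainder is read as a separate argument. Escaping the space is the approach described above; quoting the whole pattern is an alternative included here on the assumption that the host's Apache accepts quoted arguments. The RewriteRule line is also assumed.

```apache
RewriteEngine On

# Escaped space inside the character class, as suggested above.
RewriteCond %{HTTP_USER_AGENT} ^Moz[^\ ]*$

# Alternative spelling (commented out): quote the pattern so the
# space stays inside a single argument.
# RewriteCond %{HTTP_USER_AGENT} "^Moz[^ ]*$"

RewriteRule ^ - [F]
```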

g1smd

9:28 pm on Sep 30, 2011 (gmt 0)




This would work,

^(Moz|Mozilla|Mozilla/5\.0)$


but why force the parser to search for "Moz" in the input string three times?

Once you've found it the first time, look for more stuff on the end, or not:

^(Moz(illa(/5\.0)?)?)$


There's no reason why this should fail, unless there are more characters on the end of the original input string.


This would also match
^Moz[^\ ]*$
but would also match all sorts of other stuff.

Key_Master

9:55 pm on Sep 30, 2011 (gmt 0)




g1smd, older versions of Apache (pre-2.0) do not support PCRE. There are also some types of non-Apache web servers out there that do support .htaccess files, but only in a very limited way. It's possible that keyplyr is stuck with one of these servers and is limited in what he can do. So even though the solutions you proposed are fine, they might not work in his situation. My regex should work though.

g1smd

10:04 pm on Sep 30, 2011 (gmt 0)




I'm aware that older Apache server versions (1.x and maybe some early 2.x) use only POSIX.

Which bit of
^(Moz(illa(/5\.0)?)?)$
is PCRE specific? I don't remember nesting being a problem before. Then again, I do forget a lot of stuff these days.
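For what it's worth, grouping, nesting, ? and | are all part of POSIX extended regular expressions; of the patterns posted so far, only the \S shorthand is Perl-style. A sketch of a bracket-expression spelling that should be accepted by both engines (the RewriteRule line is assumed):

```apache
RewriteEngine On

# [^[:space:]] is the POSIX bracket-expression counterpart of \S
# and is also understood by PCRE.
RewriteCond %{HTTP_USER_AGENT} ^Moz[^[:space:]]*$
RewriteRule ^ - [F]
```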

keyplyr

12:42 am on Oct 1, 2011 (gmt 0)




Lucy24, g1smd I tried:

^(Moz(illa(/5\.0)?)?)$

and voila, it works. Thanks!

I must have had a typo in my first test.


This is one of the downsides of having a site on a cheap hosting server farm.

lucy24

2:13 am on Oct 1, 2011 (gmt 0)




Maybe it was g1's seemingly gratuitous parentheses surrounding the whole thing?!

keyplyr

2:14 am on Oct 1, 2011 (gmt 0)





lucy24 wrote: "Maybe it was g1's seemingly gratuitous parentheses surrounding the whole thing?!"

No doubt :)

g1smd

6:16 am on Oct 1, 2011 (gmt 0)




That was unintentional, but if it works use it. :)

keyplyr

6:34 am on Oct 1, 2011 (gmt 0)




My "must have had a typo in my first test" and "no doubt" were intended to imply that I'm back to using Lucy's original code:

^Moz(illa(/5\.0)?)?$

Also, another downside of my unnamed host (Ahem... Gdaddy) is that routers send requests in and out of file server clusters, causing one group of machines to momentarily use a different copy of my .htaccess than another group. Occasionally it messes with my testing if I make changes quickly.
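Returning to the pattern itself, the one-line version might look like this in the .htaccess file. Only the pattern comes from the thread; the RewriteEngine line, the [NC] flag carried over from the original two-line version, and the [F] response are assumptions.

```apache
RewriteEngine On

# Exact-match block for the bare UAs "Moz", "Mozilla" and "Mozilla/5.0".
# The anchors mean a full browser UA such as
# "Mozilla/5.0 (Windows NT 6.1) ..." is NOT matched.
RewriteCond %{HTTP_USER_AGENT} ^Moz(illa(/5\.0)?)?$ [NC]

# Assumed action: refuse the request with 403 Forbidden.
RewriteRule ^ - [F]
```

The inner parentheses are used only for grouping, so the optional "/5.0" can appear only when "illa" is present.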

lucy24

1:31 am on Oct 2, 2011 (gmt 0)




Postscript: I just met a UA calling itself

Mozilla/5.0 ()

Triple nesting, anyone? ;)

Seriously, the numerical bit might as well say [1-9][0-9]*\.[0-9]+. It's only a matter of time...

keyplyr

1:46 am on Oct 2, 2011 (gmt 0)




Ya know, that makes perfect sense. Thanks, I'm using it!

But more along the lines of:

^Moz(illa(/[1-9]*\.[0-9]+)?)?$


That should cover a few years.
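One caveat worth checking before deploying: [1-9]* accepts an empty major version and rejects any major version containing a zero, such as 10.0, whereas lucy24's [1-9][0-9]* requires at least one digit and allows zeros after the first. A sketch using lucy24's numeric form inside the nested pattern (the RewriteRule line and the [NC] flag are assumed):

```apache
RewriteEngine On

# Matches "Moz", "Mozilla", "Mozilla/4.0", "Mozilla/5.0", "Mozilla/10.0".
# Does not match "Mozilla/.0", nor any UA with text after the version.
RewriteCond %{HTTP_USER_AGENT} ^Moz(illa(/[1-9][0-9]*\.[0-9]+)?)?$ [NC]
RewriteRule ^ - [F]
```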
 
