Forum Moderators: Ocean10000 & incrediBILL & phranque

RegEx help with rewrite

     
10:15 am on Sept 30, 2011 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6071
votes: 75


I wish to block these three UAs, where each line below is the complete user-agent string from start to end:

Moz
Mozilla
Mozilla/5.0


This works, but I want to shorten it to one line, make it more succinct:

RewriteCond %{HTTP_USER_AGENT} ^Moz(illa)?$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0$ [NC,OR]


Embarrassing because I used to know these things :)
10:16 am on Sept 30, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


The | operator performs a local OR within the pattern. Put parentheses around the whole set of alternatives.

Your rules contain ^start and end$ anchors so these are coded as an "exact match"; the user agent must be exactly and only the characters in the pattern.
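
The "exact match" point above can be checked away from the server. This is only a rough sketch: it uses Python's re module as a stand-in for Apache's regex engine, re.IGNORECASE as a stand-in for the [NC] flag, and ua_matches is a hypothetical helper, not anything mod_rewrite provides. The sample browser string is also just an illustration.

```python
import re

# Hypothetical helper: approximates one RewriteCond test with the [NC] flag.
def ua_matches(pattern: str, user_agent: str) -> bool:
    return re.search(pattern, user_agent, re.IGNORECASE) is not None

# Alternatives joined with | and wrapped in parentheses, anchored at both ends.
pattern = r"^(Moz|Mozilla|Mozilla/5\.0)$"

bare_agents = ["Moz", "Mozilla", "mozilla/5.0"]
# Illustrative full browser UA; it merely starts with "Mozilla/5.0".
real_browser = "Mozilla/5.0 (Windows NT 6.1; rv:7.0) Gecko/20100101 Firefox/7.0"

blocked = [ua for ua in bare_agents if ua_matches(pattern, ua)]
browser_blocked = ua_matches(pattern, real_browser)

print("blocked:", blocked)
print("real browser blocked:", browser_blocked)
```

Because of the two anchors, only the three bare strings are caught; a normal browser that merely begins with "Mozilla/5.0" is left alone.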
7:09 pm on Sept 30, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:12990
votes: 287


Do you mean that those three specimens are the complete UA? Ugh. (That was directed at the stupid robots, not at your htaccess!)

One way is with nested parentheses:

^Moz(illa(/5\.0)?)?$

Here you're not capturing, just using parentheses to keep a group together. If htaccess recognizes \S (I never remember this stuff), you could even cover all bases with

^Moz\S*$

so they don't bob up next week and throw a bare "Mozilla/4.0" at you.

If you use pipes, note that the pipe divides the pattern before anything else is applied, anchors included, so

^abc|def$

does not mean the same as

^(abc|def)$
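
To see the difference between those two patterns in practice, here is a small sketch in Python (its re module is used only as a stand-in for the server's regex engine, and the sample strings are made up for illustration). Without parentheses, the pattern reads as "starts with abc" OR "ends with def".

```python
import re

# "^abc" OR "def$": each anchor belongs to only one alternative.
unparenthesized = re.compile(r"^abc|def$")
# Exactly "abc" OR exactly "def": both anchors apply to the whole group.
parenthesized = re.compile(r"^(abc|def)$")

samples = ["abc", "def", "abcxyz", "xyzdef", "xyz"]

# Map each sample to (matches loose pattern, matches strict pattern).
results = {
    s: (bool(unparenthesized.search(s)), bool(parenthesized.search(s)))
    for s in samples
}

for s, (loose, strict) in results.items():
    print(f"{s!r:10} loose={loose!s:5} strict={strict}")
```

The strings "abcxyz" and "xyzdef" get through the unparenthesized version but not the parenthesized one.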
8:44 pm on Sept 30, 2011 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6071
votes: 75


Thanks for the suggestions. Unfortunately, nested elements beyond one layer don't work well on my server; I've tried many times over the years.

So this doesn't work:
^Moz(illa(/5\.0)?)?$

Nor this:
^Moz\S*$

This doesn't work either:
^Moz(illa|illa/5\.0)?$


This site sits on a huge shared hosting server farm and I have no idea how it's set up. Anyway, as I said, I do have it working with the two rewrite lines; I was just looking to do it with one. Thanks.
8:55 pm on Sept 30, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2001
posts:1472
votes: 0


Would the following work?

^(Moz|Mozilla|Mozilla/5\.0)$
9:23 pm on Sept 30, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:12990
votes: 287


Oh, wait. If you can't use \S, can you achieve the exact same effect the other way around? That is:

^Moz[^\ ]*$

Be sure to escape the space, even inside brackets, or your server will throw a conniption fit, because Apache reads an unescaped space as the end of the pattern argument. (Not just yours. All servers.)
9:28 pm on Sept 30, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


This would work,

^(Moz|Mozilla|Mozilla/5\.0)$


but why force the parser to search for "Moz" in the input string three times?

Once you've found it the first time, look for more stuff on the end, or not:

^(Moz(illa(/5\.0)?)?)$


There's no reason why this should fail, unless there are more characters on the end of the original input string.


This would also match
^Moz[^\ ]*$
but would also match all sorts of other stuff.
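
For comparison, here is a quick sketch that runs the three patterns from this thread over a few sample strings. It assumes Python's re module behaves like the server's engine for these simple patterns, uses re.IGNORECASE in place of [NC], and the names PATTERNS, SAMPLES and matched_by are hypothetical. "Mozzarella" and "Mozilla/5.0 (compatible)" are invented test inputs.

```python
import re

PATTERNS = {
    "alternation": r"^(Moz|Mozilla|Mozilla/5\.0)$",
    "nested": r"^Moz(illa(/5\.0)?)?$",
    "broad": r"^Moz[^\ ]*$",
}

SAMPLES = [
    "Moz",
    "Mozilla",
    "Mozilla/5.0",
    "Mozilla/4.0",
    "Mozzarella",
    "Mozilla/5.0 (compatible)",
]

def matched_by(name: str) -> list[str]:
    """Return the sample UAs that the named pattern would catch."""
    rx = re.compile(PATTERNS[name], re.IGNORECASE)
    return [ua for ua in SAMPLES if rx.search(ua)]

for name in PATTERNS:
    print(f"{name:12} {matched_by(name)}")
```

The alternation and nested forms catch the same three strings. The broad form also catches anything else that starts with "Moz" and contains no space, which is the "all sorts of other stuff" mentioned above.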
9:55 pm on Sept 30, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2001
posts:1472
votes: 0


g1smd, older versions of Apache (pre 2.0) do not support PCRE. There are also some types of non-Apache web servers out there that do support .htaccess files but in a very limited way. It's possible that keyplyr is stuck with one of these servers and is limited in what he can do. So even though the solutions you proposed are fine, they might not be able to work in his situation. My regex should work though.
10:04 pm on Sept 30, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


I'm aware that older Apache server versions (1.x and maybe some early 2.x) use only POSIX.

Which bit of
^(Moz(illa(/5\.0)?)?)$
is PCRE specific? I don't remember nesting being a problem before. Then again, I do forget a lot of stuff these days.
12:42 am on Oct 1, 2011 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6071
votes: 75


Lucy24, g1smd I tried:

^(Moz(illa(/5\.0)?)?)$ and voila it works - Thanks!

I must have had a typo in my first test.


This is one of the downsides of having a site on a cheap hosting server farm.
2:13 am on Oct 1, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:12990
votes: 287


Maybe it was g1's seemingly gratuitous parentheses surrounding the whole thing?!
2:14 am on Oct 1, 2011 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6071
votes: 75



Maybe it was g1's seemingly gratuitous parentheses surrounding the whole thing?!

No doubt :)
6:16 am on Oct 1, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


That was unintentional, but if it works use it. :)
6:34 am on Oct 1, 2011 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6071
votes: 75


My "must have had a typo in earlier test" and "no doubt" were intended to imply I'm back to using Lucy's original code:

^Moz(illa(/5\.0)?)?$

Also, another downside of my unnamed host (Ahem... Gdaddy) is that routers send requests in and out of file server clusters, so one group of machines may momentarily use a different copy of my .htaccess than another group. Occasionally that messes with my testing if I'm making changes quickly.
1:31 am on Oct 2, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:12990
votes: 287


Postscript: I just met a UA calling itself

Mozilla/5.0 ()

Triple nesting, anyone? ;)

Seriously, the numerical bit might as well say [1-9][0-9]*\.[0-9]+. It's only a matter of time...
1:46 am on Oct 2, 2011 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6071
votes: 75


Ya know, that makes perfect sense. Thanks, I'm using it!

But more along the lines of:

^Moz(illa(/[1-9]*\.[0-9]+)?)?$


That should cover a few years.
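
One thing worth checking before relying on that version: [1-9]* is not quite the same as the [1-9][0-9]* suggested earlier. The sketch below compares the two, assuming Python's re module is a fair stand-in for the server's engine on these patterns. Dropping the earlier numeric suggestion into the nested pattern is my own combination, not something posted in the thread, and "Mozilla/10.0" and "Mozilla/.0" are invented test inputs.

```python
import re

# As posted: the major version is zero or more digits from 1-9.
posted = re.compile(r"^Moz(illa(/[1-9]*\.[0-9]+)?)?$", re.IGNORECASE)
# Assumed combination: the earlier numeric suggestion inside the same nesting.
suggested = re.compile(r"^Moz(illa(/[1-9][0-9]*\.[0-9]+)?)?$", re.IGNORECASE)

samples = ["Moz", "Mozilla", "Mozilla/4.0", "Mozilla/5.0", "Mozilla/10.0", "Mozilla/.0"]

posted_hits = [ua for ua in samples if posted.search(ua)]
suggested_hits = [ua for ua in samples if suggested.search(ua)]

print("posted:   ", posted_hits)
print("suggested:", suggested_hits)
```

In this test the posted form misses "Mozilla/10.0" (a zero cannot appear in the major version) and accepts "Mozilla/.0" (the major version may be empty), while the [1-9][0-9]* form does the reverse on both.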
 
