Forum Moderators: phranque

RegEx help

keyplyr

2:36 pm on Aug 29, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can someone please check to see if these 3 rewrite snippets correctly cover these ranges?

37.48.64.0 - 37.48.120.255

37\.48\.(6[4-9]|[789][0-9]|1[01][0-9]|120)\.

___

46.165.192.0 - 46.165.255.255

46\.165\.(19[2-9]|2[0-5][0-5])\.

___

50.116.0.0 - 50.116.63.255

50\.116\.([0-9]|[1-5][0-9]|6[0-3])\.

I've got a mistake that's cascading down my list of rewrites affecting the others and I'm trying to nail it down. Thanks

wilderness

3:58 pm on Aug 29, 2015 (gmt 0)


keyplyr,
These all look OK.

The easiest way to locate a cascading syntax error is to break your htaccess down into multiple sections, however large your file may be; use as many sections as it takes to draw attention to the syntax error (I believe I offered this in one of our previous communications).

Why not just send me the entire file?

tangor

4:26 pm on Aug 29, 2015 (gmt 0)


Some of this looks like punching holes for some ranges. Try going back to basics (i.e., the root form) for all denies, then apply the hole punching once again.

The benefit of doing it that way is that your .htaccess keeps working, perhaps more broadly than you'd like, but you are not losing anything while you work this out.

And a clean slate, as they say, makes the rest more visible.

lucy24

6:03 pm on Aug 29, 2015 (gmt 0)


<tangent>
64-120? Really? Even 64-119 (that is, 64-95, 96-111 and 112-119) would surprise me less.
</tangent>

When a rule involves "up to 255" I generally just say 2\d\d since numbers higher than 255 simply don't occur, so no need to exclude them.

You didn't specify in your post, but does each pattern start with an opening ^ anchor? If not, you risk blocking visitors whose address has a leading 1 or 2 but matches the rest of the pattern. Anchoring is also more efficient, since the RegEx engine only has to check in one place.
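As a quick illustration of the missing-anchor risk (using Python's re module as a stand-in for the PCRE engine Apache uses), an unanchored pattern happily matches inside a longer address:

```python
import re

# One of the patterns from this thread, tested without and with the ^ anchor
pattern = r"50\.116\.([0-9]|[1-5][0-9]|6[0-3])\."

# Unanchored: also fires on an unrelated 150.116.x.x address,
# because "50.116." occurs inside "150.116."
print(bool(re.search(pattern, "150.116.49.209")))        # True (false positive)

# Anchored: only the intended range matches
print(bool(re.search("^" + pattern, "150.116.49.209")))  # False
print(bool(re.search("^" + pattern, "50.116.49.209")))   # True
```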

Edit:
Oh, oops, this is wrong:
46\.165\.(19[2-9]|2[0-5][0-5])\.
It wouldn't match, for example, 249. If you don't want to go the 2\d\d route the parenthesised bit would have to be
(19[2-9]|2[0-4]\d|25[0-5])

In addition, this
50\.116\.([0-9]|[1-5][0-9]|6[0-3])\.
though not incorrect, could be reduced to
50\.116\.([1-5]?[0-9]|6[0-3])\.
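Since each alternation only ever sees a single octet, it can be brute-force checked against all 256 possible values. A small Python sketch along those lines, confirming both the gap in the original 192-255 pattern and the corrected version:

```python
import re

def octets_matched(alternation):
    """Set of octet values 0-255 that the alternation matches in full."""
    return {n for n in range(256) if re.fullmatch(alternation, str(n))}

target = set(range(192, 256))

# Original: 2[0-5][0-5] skips the x6-x9 endings (206-209, 216-219, ...)
original = octets_matched(r"19[2-9]|2[0-5][0-5]")
# Corrected alternation covering the whole 192-255 range
corrected = octets_matched(r"19[2-9]|2[0-4]\d|25[0-5]")

print(249 in original)            # False - the reported gap
print(sorted(target - original))  # the twenty missing octets
print(corrected == target)        # True
```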

wilderness

8:24 pm on Aug 29, 2015 (gmt 0)


Edit:
Oh, oops, this is wrong:
46\.165\.(19[2-9]|2[0-5][0-5])\.
It wouldn't match, for example, 249. If you don't want to go the 2\d\d route the parenthesised bit would have to be
(19[2-9]|2[0-4]\d|25[0-5])


lucy,
FWIW, he's having a difficult enough time adjusting to the syntax! I don't see any benefit to making it even more difficult for him to comprehend.

keyplyr,
My error and apologies.
The following:
46\.165\.(19[2-9]|2[0-5][0-5])\.
should read (note the change of the last 5 to a 9):
46\.165\.(19[2-9]|2[0-5][0-9])\.

And any other line of syntax where you intend the 200 to 255 range should be defined with the same correction (or, if you're instantly able to comprehend and adapt to lucy's regex for those strings, then use that method of correction).

wilderness

8:56 pm on Aug 29, 2015 (gmt 0)


The easiest way to locate a syntax error that is cascading is to break your htaccess down into multiple sections.


keyplr,
many thanks for the 'karma' ;)
Just a short while ago, while reviewing my logs, I came across an IP denial for which I was unable to locate any range (I tested the UA and it worked fine).
Thus I split two Class A's in mod_rewrite, so the next such visit will narrow the error down to a smaller group.

keyplyr

12:32 am on Aug 30, 2015 (gmt 0)


So in summation, except for changing the 5 to a 9 in 46\.165\.(19[2-9]|2[0-5][0-9])\., they are correct? Thanks, I appreciate the help.

However, if 50\.116\.([0-9]|[1-5][0-9]|6[0-3])\. is correct,
I'm stumped why 50.116.49.209 was blocked
(don't worry about the other allow stuff)

wilderness

12:45 am on Aug 30, 2015 (gmt 0)


50\.116\.([0-9]|[1-5][0-9]|6[0-3])\. is correct!
and reads 50.116., then 0 thru 9, 10 thru 59, and 60 thru 63.

I'm stumped why 50.116.49.209 was blocked

49 falls within the 10-59 range

keyplyr

1:24 am on Aug 30, 2015 (gmt 0)


Let me rephrase... These are filters. I don't use rewrites to block; I use rewrites to allow (with certain attributes).

What I did not post is the list of UAs that are allowed through. The UA at 50.116.49.209 is one that should have been allowed and he gets through from other ranges in this list. That's why I'm stumped & that's why I suspect that a rewrite above is incorrect and cascading for this rewrite to fail.


BTW Don, I sent you a .txt file (via email) for you to look over. Thanks for the offer to do so, though I don't think you knew what a big task it would be :)

wilderness

1:32 am on Aug 30, 2015 (gmt 0)


Are you attempting to allow the entire Linode range of 50.116.0-63, or just the 50.116.49 block?

The benefit of using Remote Address and mod_rewrite for IP ranges is that you're given the ability to include/exclude very specific ranges (or small blocks).

Located three errors (over multiple lines) and sent them back to you.

wilderness

1:50 am on Aug 30, 2015 (gmt 0)


50\.116\.([0-9]|[1-5][0-9]|6[0-3])\. is correct!
and reads 50.116., then 0 thru 9, 10 thru 59, and 60 thru 63.

I'm stumped why 50.116.49.209 was blocked

49 falls within the 10-59 range


50\.116\.([0-9]|[1235][0-9]|4[0-8]|6[0-3])\.
Omits the 49 from this 0-63 range

keyplyr

1:53 am on Aug 30, 2015 (gmt 0)



50\.116\.([0-9]|[1235][0-9]|4[0-8]|6[0-3])\.
Omits the 49 from this 0-63 range

I don't understand what you're saying. What does "omit" mean?


Are you attempting to allow the entire Linode range of 50.116.0-63

Yes

wilderness

2:12 am on Aug 30, 2015 (gmt 0)


I don't understand what you're saying. What does "omit" mean?


Omit means 'poke a hole'.

If you're using 50\.116\.([0-9]|[1-5][0-9]|6[0-3])\.
as an allow, then the syntax is correct and you have an error in another place.
If you're using 50\.116\.([0-9]|[1-5][0-9]|6[0-3])\. as a deny (in this GROUP), then just remove the line.

keyplyr

2:54 am on Aug 30, 2015 (gmt 0)


Just to recap...

I block about 3k IP ranges. Of those, I have about 700 rewrite lines (poked holes) allowing access if the request meets certain conditions. These 700 lines are divided into 16 sections, each with specific conditions.

In one section, there appears to be an error in the RegEx of a rewrite line. Which line I am not certain. 50.116.49.209 should have cleared the conditions but did not. Since this agent clears the conditions when it comes from another allowed IP range, I assume it is the rewrite line specific to this range:
50\.116\.([0-9]|[1-5][0-9]|6[0-3])\.

This agent is important and I need to let it through, so if the rewrite syntax is correct, I remain stumped.

Thanks for the help, specifically Don :)

wilderness

3:14 am on Aug 30, 2015 (gmt 0)


keyplr,
Possibly the other errors (corrections provided) in that same section are/were the cause of this range error.
You'll just need to follow-up after implementing the other corrections for verification.

lucy24

4:37 am on Aug 30, 2015 (gmt 0)


These 700 lines are divided into 16 sections

How long is this particular section?
:: business with calculator ::
40-plus lines? Does that mean 40-plus conditions to each RewriteRule? Does each section start with some uber-condition, like "general neighborhood of suchandsuch IP"? Can you quote the whole section? 700/16 is long, but it's not impossibly long.

In the actual code, does every set of numbers start with ^ (beginning of pattern)? Its absence is making me anxious, since I really doubt you want the identical rules to apply to 150.116.49.209. (Or, theoretically, to 250.etcetera, though this range happens not to exist.)

Gee, don't you wish your host would move up to 2.4 so you could shovel it all inside <If> envelopes instead?

keyplyr

4:54 am on Aug 30, 2015 (gmt 0)


Gee, don't you wish your host would move up to 2.4 so you could shovel it all inside <If> envelopes instead?

Since you brought that up a couple weeks ago, I've thought of nothing else :)

As for the other questions, not wise to post all that publicly, but basically I have several sections where a bunch of ranges get allowed for mobile UAs if they meet sets of different conditions (hence the sections). The other ranges are grouped for other types of UAs & conditions. Remarkably, there isn't too much overlap so I don't need to list very many things twice.

lucy24

7:31 am on Aug 30, 2015 (gmt 0)


I don't need to list very many things twice

You might look into using environmental variables. That's assuming mod_setenvif executes before mod_rewrite; on my system it does. (I experimented once to make certain.) Then you can do things like:
SetEnvIf Remote_Addr ^5\.(16[4-7]|228) bad_russia
SetEnvIf Remote_Addr ^64\.2(6|47\.1[2-5]\d) qiniq
* so the Regular Expression executes just once-and-for-all on each request. Then you make rules involving
RewriteCond %{ENV:bad_russia} .
(single dot simply meaning "this environmental variable exists") with further conditions.


* I don't know if this, specifically, is a Qiniq range. It might be Northwestel. It's just my name for the variable. And, of course, some of our bad Russians are really from Ukraine.
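Put together, the approach might look like this in .htaccess. This is a sketch only: the "GoodBot" token is a placeholder, the ranges are the ones quoted above, and you'd want to confirm that mod_setenvif really does run before mod_rewrite on your own server:

```apache
# Classify the address once per request
SetEnvIf Remote_Addr ^5\.(16[4-7]|228)\. bad_russia

RewriteEngine on
# "." = the variable exists, i.e. the address matched above
RewriteCond %{ENV:bad_russia} .
# further conditions can narrow the block, e.g. exempt one UA (placeholder)
RewriteCond %{HTTP_USER_AGENT} !GoodBot
RewriteRule .* - [F]
```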

keyplyr

9:27 am on Aug 30, 2015 (gmt 0)


Thanks, I am using mod_setenvif to set conditions for several of the rewrite sections as well as other stuff. I'm reasonably sure our servers are set up the same.

keyplyr

10:57 am on Aug 30, 2015 (gmt 0)


The allowable time to edit my post has passed.

Some of my rules are site-centric, allowing beneficial agents from otherwise blocked ranges, and other rules are defensive, blocking certain agents out of allowed ranges. Allowing some agents in one section of rewrites but not another is part of that. Most have some other conditions, or filters.

I have also found that in the case of mobile apps especially, there is a high rate of jumping ship when it comes to hosts. Example: one of the iPhone apps that send me good traffic has rented from Amazon, then LeaseWeb and now back to Amazon but on a different cluster. I guess they are still experiencing growing pains but it's a PITA keeping current with it.

This is one reason I am using separate lines for each range rather than combining ranges into one condensed rewrite line. I may want to change it and it's less confusing to just remove a line than to re-figure the syntax.

keyplyr

1:25 am on Aug 31, 2015 (gmt 0)



50.116.49.209 is still being blocked.

50\.116\.([0-9]|[1-5][0-9]|6[0-3])\. (if correct) should allow access.

I remain stumped.

lucy24

1:57 am on Aug 31, 2015 (gmt 0)


If you can do it without causing the site to melt down:
Comment-out the rule that is causing 50.116 etcetera to get locked out. That is: the original, broader rule that is supposed to be overridden by the hole-poking exception.
Confirm that 50.116.etcetera can now get in.

This is a for-safety's-sake test to verify that the lockout is happening where you think it is, and not some entirely different unrelated place.

keyplyr

3:15 am on Aug 31, 2015 (gmt 0)


Not sure which rule is causing the agent from 50.116.49.209 to be blocked. There is no "broader rule causing..." This is it. But if I comment out...

RewriteCond %{REMOTE_ADDR} ^50\.116\.([0-9]|[1-5][0-9]|6[0-3])\. [OR]

...then I unblock the range, which is Linode. That beneficial agent comes about twice a week. Bad agents from that range come several times a day, every day.

Anyway, I made a couple of changes to the environment attributes and I'll know in a few days whether it works. Sometimes the code can be correct but, because of other conditions, it fails. Odd too, because the other 12 agents included in this rule all work.

It's difficult for anyone to help with this because the rules are not simple and I can't post them publicly. What does help is people giving suggestions. Even though they aren't exact, sometimes they help *enlighten* my thinking to realize something I had not previously considered.

lucy24

4:28 am on Aug 31, 2015 (gmt 0)


But if I comment out...

RewriteCond %{REMOTE_ADDR} ^50\.116\.([0-9]|[1-5][0-9]|6[0-3])\. [OR]

...then I unblock the range, which is Linode. That beneficial agent comes about twice a week. Bad agents from that range come several times a day, every day.

Exactly. It's the only way to make sure that it is, in fact, this rule that's blocking your desired visitor. As an alternative to commenting-out the whole 40- or 50- or 200-line section, you could throw in a couple of extra "RewriteEngine off" and "on" statements. (This, in fact, is heavily plugged by The Docs as an advantage of the system.) Suppose you comment-out the rule and it turns out your visitor is still being blocked? That tells us we've all been looking in the wrong place. And if the rule is working as intended for everyone else, that's definitely a possibility to consider.

Then again, there might just be a typo in the line defining the User-Agent, or specifying the header, or any of the other factors involved.

For a while I had a rule blocking anyone with a particular header. Then it turned out that one human user-agent-- and a couple of tolerable robots*-- sends that same header, so I unblocked it. But the rule traps so ### many robots, I ended up shifting it from a once-and-for-all SetEnvIf rule to a conditional RewriteRule. We Shall See if I have better luck avoiding the false positives.


* Including, on rare occasions, the Googlebot itself. I find this mystifying.

keyplyr

5:26 am on Aug 31, 2015 (gmt 0)


"there might just be a typo in the line defining the User-Agent..."

That's where I made some changes today, after I posted. Basically, that UA's bot name uses a couple of upper-case letters, but part of the same bot name also appears in all lower case (possibly a recent change) in the URL included in the UA string. So I changed the rule to be NC for that UA only.

There was nothing wrong with the way I had it, but the addition of the second mention may be confusing the server.

I hate it when servers get confused.

lucy24

7:20 am on Aug 31, 2015 (gmt 0)


the addition of the second mention may be confusing the server.

Is this particular RewriteCond (I assume that's what it is) negative or positive? A positive match is straightforward: either you find it or you don't. But negatives can be tricky because then you have to figure out whether you mean "none of these" or "not this one here" or "not the complete package" or etcetera... and then you have to make it plain to the server, which when all's said and sifted is just a dumb machine.

Robots do seem to like naming themselves twice: "BlahBlahBot at http://www.example.com/blahblahbot". I've still got a universal block on "GoogleBot", case-sensitive, though I think the UA was only in use (by unwelcome robots) for a short time.

keyplyr

9:19 am on Sep 1, 2015 (gmt 0)


So now that I'm sitting at an actual computer instead of attempting to type from my phablet, I'll lay out what was going on with this agent.

At first I didn't catch the fact that the UA had changed. I look at a lot of stuff, and after a while I sometimes start to lose focus on detail, but I figured my problem was a mistake in the syntax of the respective rewrite, since that is my weak point.

Posting here confirmed that the code was correct (wilderness corrected some other ranges for me - thanks), so I was stumped. I didn't have any old logs to pore over, so I was waiting until this agent visited again. But when it did, I was on the road and using a phablet, so I couldn't process the access logs; I could only tell from that stupid excuse for a stats program (furnished by our dream of a host) that the agent was again blocked.

This is the agent:
Mozilla/5.0 (compatible; SomeoneImportantBot/1.0)


I was allowing: "SomeoneIm" case specific & abbreviated to save a few bytes. This worked for the several ranges I had in my list. All was well :)

Then at some point (when I wasn't looking) UA changed to:
Mozilla/5.0 (compatible; SomeoneImportantBot/1.0; support@someoneimportant.com http://someoneimportant.com/)


Now IMO my code *still* should have worked... but here is where I think the server got confused. There were 2 mentions of this "allowed UA": one containing a couple of upper-case letters, the other all lower case. I was assuming this was the problem, but I had to wait until I returned home to edit my code.

So I had 2 choices:
1.) change the UA syntax from "SomeoneIm" to "SomeoneImportantBot" or
2.) add the [NC] which is what I chose to do.

Did the edit, the agent came by & cleared the filter, all is well again :)
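For what it's worth, the effect of [NC] can be mimicked with Python's re.IGNORECASE; the bot name below is the placeholder from this post, not a real UA:

```python
import re

old_ua = "Mozilla/5.0 (compatible; SomeoneImportantBot/1.0)"
new_ua = ("Mozilla/5.0 (compatible; SomeoneImportantBot/1.0; "
          "support@someoneimportant.com http://someoneimportant.com/)")

# The abbreviated, case-sensitive token still matches both UA strings,
# so the original rule should indeed have kept working:
print(bool(re.search(r"SomeoneIm", old_ua)))   # True
print(bool(re.search(r"SomeoneIm", new_ua)))   # True

# But a fully lower-cased variant slips past a case-sensitive match;
# the [NC]-style flag catches it:
print(bool(re.search(r"SomeoneIm", new_ua.lower())))                 # False
print(bool(re.search(r"SomeoneIm", new_ua.lower(), re.IGNORECASE)))  # True
```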