
Search Engine Spider and User Agent Identification Forum

    
regex mod access
wilderness




msg:4416482
 1:01 am on Feb 11, 2012 (gmt 0)

Is anybody familiar with using regex for IPs in deny from?

Do I need to escape periods?

Escaped:
87\.([0-9]|[1-8][0-9]|9[0-689]|1[0-9][0-9]|2[0-5][0-9])\.

non-escaped
87.([0-9]|[1-8][0-9]|9[0-689]|1[0-9][0-9]|2[0-5][0-9]).

I've tried the non-escaped and did not get a 500 error, however I'm not getting all the 500's I should.

 

g1smd




msg:4416675
 7:45 pm on Feb 11, 2012 (gmt 0)

Yes, you should escape all of the literal periods.

lucy24




msg:4416708
 10:40 pm on Feb 11, 2012 (gmt 0)

A non-escaped period generally won't give you an error; it will just capture things that you might not have intended to capture.

Simple example:

1\.3
captures anything containing the literal string "1.3". But

1.3
captures "1.3" or "1,3" or "123" or "1a3" or... well, you get the idea.

In your example, the "87." works as intended because, of course, there are no IP addresses in the form "87\d". (I assume you've got an anchor to exclude 187.) But the unescaped . at the end, combined with the first [0-9] option, means you'll be capturing 87.anything at all.

Your very last group, 2[0-5][0-9], also matches 256-259, which can't occur in an IP address. Strictly it would be 2[0-4][0-9]|25[0-5], but the looser form won't do any harm.

Come to think of it, I wouldn't do it this way at all. If you're setting up mod_rewrite Conditions I'd have one line for ^87\. and a second line for !^87\.97 since that's the only thing you're excluding. (All of Europe except Hungary?)
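A minimal sketch of the two-condition approach just described, assuming mod_rewrite is usable in .htaccess and that the goal is to refuse every 87.x.x.x address except 87.97.x.x. The [F] response and the catch-all rule pattern are assumptions for illustration, not part of the original suggestion.

```apache
RewriteEngine On
# Match any client address whose first octet is 87; the dot is escaped so it is literal
RewriteCond %{REMOTE_ADDR} ^87\.
# Exempt 87.97.x.x, the one second octet the original pattern left out
RewriteCond %{REMOTE_ADDR} !^87\.97\.
# Answer 403 Forbidden for every request that satisfied both conditions
RewriteRule ^ - [F]
```

Because RewriteCond lines are ANDed by default, the second, negated condition carves the exception out of the first. Note that a blanket rule like this would also apply to a custom 403 page, so that page may need its own exemption.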

wilderness




msg:4416729
 12:58 am on Feb 12, 2012 (gmt 0)

using regex for IP's deny from


mod_access.

If you're setting up mod_rewrite Conditions


lucy, these denials in mod_access are a temporary solution to allow dissecting my mod_rewrite section, which (as previously mentioned) is not functioning.

FWIW, this thread seems as good a fit as a new topic.

4-5 years ago (perhaps longer) a gal began participating in either the SSID forum or the Apache forum and pointed out that the Apache documentation stated many expressions (IPs in that particular instance) should be surrounded by quotes.

At the time, Jim agreed that the documentation explained the quotes that way, but disagreed that it was a one-size-fits-all rule.

I had been using quotes sparingly for a long while as an alternate "exactly as", which did not require escaping:
EX:
"Red Barn 1.0" (may still function)
rather than "Red\ Barn\ 1\.0" (not sure what the forum will do to the intended blank spaces).

Last night I moved some UAs over to SetEnvIf and they would not function when using the quotes:
SetEnvIfNoCase User-Agent "^(about|Accoona|Ace|ADS|Advert|Aport|Art)" keep_out

I poked around and found a 2010 reference by Jim explaining (didn't save the URL) the quotes. Unfortunately between working on massive web page updates, messing with logs, messing with htaccess (all pretty much simultaneously) for more than a week, my mind is frazzled.

In the past 2-3 years Jim became quite adamant about surrounding such expressions in quotes.

My point in all this (and my inquiry) is: when did the significant change from "POSIX" to "PCRE" regex take place?
And how does a moron using elcheapo hosting determine which is applicable (support is no help on this issue)?

adrian20




msg:4416738
 2:52 am on Feb 12, 2012 (gmt 0)

wilderness, a comment from my experience on the *SetEnvIfNoCase User-Agent* example you posted. You are currently using a caret *^* at the beginning of the rule.

In the same rule, remove the caret and the two parentheses *()*. Just leave the quotes at the beginning and end, and the pipe *|* dividing the words to be matched.

The caret anchors the match to the start of the User-Agent, and you may well want to catch something that comes later, in the middle or at the end.

After reading a lot about Regular Expressions, I realized that the parentheses *()* should be used for complicated rules. In this case you want a simple match, which will even work without quotation marks, but the pipe is necessary.

I use a simple shared web hosting account that costs me less than $10.00 for my little website, and it works perfectly.

But I insist, you should use "BrowserMatchNoCase" instead of SetEnvIfNoCase User-Agent.
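A sketch of what that suggestion might look like, assuming Apache 2.2-style Order/Allow/Deny directives and reusing the keep_out variable from the earlier post. According to the mod_setenvif documentation, BrowserMatchNoCase is shorthand for SetEnvIfNoCase User-Agent, so the two should behave the same.

```apache
# No caret: each alternative may appear anywhere in the User-Agent string
BrowserMatchNoCase "about|Accoona|Ace|ADS|Advert|Aport|Art" keep_out

# Refuse any request for which the variable was set
Order Allow,Deny
Allow from all
Deny from env=keep_out
```

One design point to weigh: without the caret, short words such as Ace or Art also match inside longer words (for example "Space" or "Smart"), so the unanchored form is considerably broader than the original "begins with" list.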

wilderness




msg:4416755
 6:43 am on Feb 12, 2012 (gmt 0)

Many thanks adrian.

I've two sections of SetEnvIf for User-Agent.
One with the caret (User-Agent; begins with) and one without the caret (or a trailing dollar sign; ends with) for "contains".

I'll test omitting the parentheses (brackets).

Will even test the BrowserMatchNoCase, however it's likely that the latest versions of Apache use "PCRE" (rather than the earlier Apache versions using "POSIX") for both SetEnvIf and BrowserMatch.

Unfortunately I've a simple deny from in place that will not even function.

deny from 9[0-5].

As basic as you may find.

Or at least it wouldn't earlier Sat when somebody from 91 tested the waters.

keyplyr




msg:4416759
 9:27 am on Feb 12, 2012 (gmt 0)


As Lucy said, sometimes bad RegEx will not cause a 500 error but may cause other surprising results, like prohibiting further lines from working.

On several occasions over the last 15 years, I've had to comment-out line by line of my htaccess to find the *soft* error that stops other correctly written lines from working. A real PITA but needs to be done occasionally.

wilderness




msg:4416761
 9:36 am on Feb 12, 2012 (gmt 0)

Many thanks keyplyr.

I'm just attempting to get my massive file down to the necessary bare bones, so that I may begin adding back the dissected sections piece by piece.

FWIW, going through 2300 lines, commenting out line(s) and then testing (waiting for effect) before proceeding to the next line(s) would take an eternity (perhaps weeks or months).

Three hours ago, I reduced the file by 75% and am waiting to see function results. Course most everybody (at least those with any brains) in the Western World is sleeping ;)

lucy24




msg:4416764
 10:04 am on Feb 12, 2012 (gmt 0)

Unfortunately I've a simple deny from in place that will not even function.

deny from 9[0-5]

Can you do that?! I thought core-level IP blocks had to be in strict binary form, like

Deny from 90.0.0.0/7
Deny from 92.0.0.0/6

Somewhere or other I picked up the idea that setenvif uses quotes and mod_rewrite doesn't, except in some special non-regex constructions that nobody ever uses.
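For the 90 to 95 range under discussion, a sketch of two routes side by side. It assumes Apache 2.2-style Order/Allow/Deny and borrows the keep_out variable name from earlier in the thread; only one of the two alternatives would be needed in practice.

```apache
Order Allow,Deny
Allow from all

# Alternative 1: CIDR blocks. 90.0.0.0/7 covers first octets 90-91,
# and 92.0.0.0/6 covers first octets 92-95.
Deny from 90.0.0.0/7
Deny from 92.0.0.0/6

# Alternative 2: a regex on the client address, evaluated by mod_setenvif,
# with Deny acting on the resulting environment variable.
SetEnvIf Remote_Addr "^9[0-5]\." keep_out
Deny from env=keep_out
```

The regex lives in SetEnvIf, which is documented to take a regular expression; the Deny line itself only ever sees an address, a network, a host name, or an env= test.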

adrian20




msg:4416773
 11:30 am on Feb 12, 2012 (gmt 0)

wilderness, I think you should try this. As far as I can tell, *deny from* can work only in CIDR format:

(using the example lucy24 added)
Deny from 90.0.0.0/7
Deny from 92.0.0.0/6

But to work with the format 9[0-5]. you must use SetEnvIf or RewriteCond.

Apparently, Apache added the use of CIDR IP to the new format SetEnv. And I emphasize, apparently. But as I say, you should run tests on it.

lucy24, that's why I sometimes like to use this format:

SetEnvIf Remote_Addr ^9[0-5]\. keep_out

In my tests, Apache responds much faster.

###
#Sorry, a correction here
###

After adding my comment, I checked the Apache manual once again. The pages from which I drew this conclusion are:

[httpd.apache.org...]
[httpd.apache.org...]

authz_host and "Access control by environment variable" are the only places I have seen the CIDR format. Some time ago I performed tests and concluded that "Access control by environment variable" can work with CIDR format.

Then, in the part that I said;

Apparently, Apache added the use of CIDR IP to the new format SetEnv.

It should say;

Apparently, Apache added the use of CIDR IP to Access control by environment variable.

wilderness




msg:4416778
 12:11 pm on Feb 12, 2012 (gmt 0)

I'm not trying to be obstinate and I appreciate the suggestions, however if regex in deny from works in this capacity:
87\.([0-9]|[1-8][0-9]|9[0-689]|1[0-9][0-9]|2[0-5][0-9])\.

why on earth wouldn't it work in a simpler 9[0-5] ?

adrian, if deny from only worked in the CIDR capacity then a simple 123.456.789. would fail, and that works (or at least it has for more than a decade).

FWIW, I'm accustomed (in memory) to the extensive regex functions for IP's and abhor CIDR.

I've obviously a major syntax error, however despite days and fuzzy eyes, I've yet to locate it.

adrian20




msg:4416797
 2:33 pm on Feb 12, 2012 (gmt 0)

Yes, that was my initial question too. At the time I ended up answering it for myself this way: when the IP is complete (123.456.789), "Deny from" works, and for a CIDR range (123.456.0.0/16) I also use only "Deny from", which helps shorten the file too.

If you have access to cPanel, do a test. When you use "IP Deny Manager" in cPanel, it adds "Deny from" in CIDR format, and it does the same when you add a single IP.

The other test is to use CIDR format with "Deny from" and a full IP (as you did), then compare it with the ^9[0-5]\. format.

My other conclusion is that the CIDR format is newer and Apache has somehow not made it work across all of the modules; it is also possible they decided not to support the format in RewriteCond.

g1smd




msg:4416813
 5:10 pm on Feb 12, 2012 (gmt 0)

Somewhere or other I picked up the idea that setenvif uses quotes and mod_rewrite doesn't, except in some special non-regex constructions that nobody ever uses.

You can use quotes in mod_rewrite:
RewriteCond %{REMOTE_HOST} "=11.22.33.44"
is equivalent to
RewriteCond %{REMOTE_HOST} ^11\.22\.33\.44$
I don't remember why you use one over the other. I just use the latter.

lucy24




msg:4416853
 11:21 pm on Feb 12, 2012 (gmt 0)

OK, stop the presses. I just detoured to my art studio's site where I can experiment with htaccess to my heart's content.

The good news:

Deny from 1\.2\.3\.4
Deny from 1\.2\.3
Deny from 1\.2\.[3-9]

all work. That is, they don't throw 500 errors. I just made up the numbers, so I'll assume they would actually deny the appropriate visitor.

The bad news:

The moment I add even a single RegEx element, such as an escaped \. for a plain . (dot) the logs switch over to the ghastly

adsl-67-117-146-102.dsl.snfc21.pacbell.net

format.

Your mileage may vary. But that's reason enough for me not to use Regular Expressions in "Deny from..." statements.
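A likely explanation, taken from the mod_authz_host documentation rather than from anything tested in this thread: Deny from does not interpret regular expressions. An argument that is not a full IP, a partial IP, a network/netmask pair, or a CIDR block is treated as a (partial) host name, and the documentation says a host name argument makes Apache perform a double reverse DNS lookup on the client address. That would account for host names turning up in the logs. The forms the directive is documented to accept look like this (the addresses are made up, and example.invalid is a placeholder):

```apache
Order Allow,Deny
Allow from all

# A full IP address
Deny from 1.2.3.4
# A partial IP address: the first one to three octets
Deny from 1.2.3
# A network/netmask pair
Deny from 1.2.0.0/255.255.0.0
# The same network in CIDR notation
Deny from 1.2.0.0/16
# A (partial) host name; this form is what triggers the DNS lookups
Deny from example.invalid
```

On that reading, a line such as "Deny from 1\.2\.[3-9]" raises no error because it parses as a host name that simply never matches any visitor.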

wilderness




msg:4417525
 5:49 pm on Feb 14, 2012 (gmt 0)

Eleven days and nights of working feverishly on htaccess and html have taken their toll and I'm under the weather. My eyes have been giving me problems for more than a year (new glasses didn't help, nor did changing my monitor display back to 800 x 600).

In the process of breaking down my file (as well as the insights provided here), I've made some interesting realizations in the past few days.

I'm anxious to explain the results properly and cannot do so until my head clears.

My apologies.

Don

wilderness




msg:4417573
 7:37 pm on Feb 14, 2012 (gmt 0)

I'm simply overwhelmed with recent activity and the effect has been detrimental.
If there are mistakes in my explanation, please explore further on your own.
In addition, I've read threads at Webmaster World and searched the WWW for clarifications, but failed to bookmark those references.


First and an Earth to Don!
Due to my htaccess inactivity, I'd forgotten one of the most basic requirements.
NOTEPAD and other editors are why my mod_rewrite was not functioning.

As previously mentioned, upon my re-entry into activity as a webmaster, the extensive mod_rewrite portion of my htaccess file (last used in early 2010) was not functioning.
I began dissecting large portions to determine cause.
Simultaneously, I was required to add new portions (error docs and AddType), which I had not been required to use previously.

To implement error docs, I was able to locate two old threads in which jdMorgan (Jim) explained the procedure.
Here's one [webmasterworld.com]

SetEnvIf Request_URI "^(robots\.txt|custom_403_page\.html)$" allowall

Note Jim's example (from April 2010) of enclosing the opening and closing anchors within parentheses.

I used this for three days and my custom error documents failed (per one persistent visitor requesting the same page at least a dozen times within each 24-hour period).

Just as soon as I randomly moved the anchors outside the parentheses, my custom error functioned.

Although Jim's references were useful, I had a non-working issue in one that required a random change/test after not being able to locate a solution. Perhaps Jim just made a typo, or perhaps recent updates in Apache have changed the then-functioning procedure.
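One thing that may be worth checking in a pattern like this, offered as an assumption rather than a diagnosis: Request_URI holds the URL path, which begins with a slash, so a pattern anchored with ^ has to account for it. A sketch, assuming both files sit in the site root, Apache 2.2-style access directives, and a keep_out variable set elsewhere in the file:

```apache
# Flag requests for the two files that must stay reachable even for denied clients
SetEnvIf Request_URI "^/(robots\.txt|custom_403_page\.html)$" allowall

# Serve the custom page for 403 responses
ErrorDocument 403 /custom_403_page.html

# With Order Deny,Allow the Allow line is evaluated last and overrides the Deny
Order Deny,Allow
Deny from env=keep_out
Allow from env=allowall
```

With the anchors outside the parentheses, ^ and $ apply to every alternative at once, so neither file name can match as a mere substring of a longer path.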

As for clarifying which escaped characters cause the log format change: I have nine files with the same timestamp that were modified in close proximity. The first time around was difficult enough and I'm not about to relive that nightmare (my eyes and thoughts were so out of focus that I never even noticed the format change until twelve hours later) just for an example, so I'll provide the end result instead.

I have some nagging recollection of Jim explaining vaguely (and on multiple occasions) that there was a difference in the application of "POSIX" between mod_rewrite and Deny From (mod_access), or SetEnvIf (mod_setenvif), and/or both of the latter.

I can assure you that there is a definite difference in the application of "PCRE" between mod_rewrite and Deny From (mod_access) or SetEnvIf (mod_setenvif).

Here are two such examples of SetEnvIf (mod_setenvif).
(If the logic fails, take two aspirins and call me in the morning.)

Both of these function fine in mod_setenvif
(change your UA and clear your browser cache to test.)

SetEnvIf User-Agent compatible\;\)$ keep_out
SetEnvIf User-Agent ^MSIE 6\.0$ keep_out

Within mod_setenvif, after confirming the above, remove the escapes. Use the same testing UA and clear the cache again.
They will fail.

Next!
Take the 2nd line and convert it to mod_rewrite.
INTENTIONALLY leaving the blank space after MSIE.
No need to test because you should get a 500
(bad delimiter (space) in the error logs); however, if you don't,
change your UA and clear your browser cache to test.

I've removed all escapes from both my Deny From (mod_access) and
SetEnvIf (mod_setenvif) lines, relocating the affected lines to
mod_rewrite.
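The behaviour described here is consistent with both modules splitting a directive line into arguments on whitespace, so an unquoted space ends the pattern early. In SetEnvIf the leftover text is then presumably read as further variable names, which raises no error, while in RewriteCond it lands in the flags position and is rejected. A sketch of quoted equivalents, with the [F] rule added only as an assumed way to act on the match:

```apache
# Quotes keep the space inside the single regex argument
SetEnvIf User-Agent "^MSIE 6\.0$" keep_out

RewriteEngine On
# The same quoted pattern as a mod_rewrite condition
RewriteCond %{HTTP_USER_AGENT} "^MSIE 6\.0$"
RewriteRule ^ - [F]
```

The escaped dot is still needed inside the quotes: quoting protects the space from the argument parser, but it does not make the regex metacharacters literal.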

I'm also working on removing all CIDR from my file.
I've another nagging recollection of Jim's, of the perils of mixing CIDR with other syntax.

NOR will I use regex for IP's in Deny From (mod_access).

NOR will I use (or even test) the Files container again.
(See Jim's April 2010 explanation of the log format change in above link.)

Additionally and with Deny From (mod_access) or SetEnvIf (mod_setenvif), I'll ONLY use parentheses sparingly, and I'll ONLY use them as an "exactly as", wherein escape characters are not required. (i. e. POSIX, which I know functions as it should)


I've looked and looked for special character designation in regards to Deny From (mod_access) and SetEnvIf (mod_setenvif), and all I could locate were references to PCRE, absent any designation of special characters in either module. (At least mod_rewrite provides eleven).

I'm simply overwhelmed with the necessary detail of all this and hope that it proves useful to others.

It surely does NOT provide any solid Apache, regex or PCRE documentation for these explanations.

I'm sure that I've other examples and solutions from the past 48 hours, however my brain is fried (too many changes, too much syntax, all without making notes, although I did number the numerous files). Still, my htaccess is functioning and I may resume adding back my 900 lines of IPs ;)

g1smd




msg:4417603
 8:52 pm on Feb 14, 2012 (gmt 0)

When making a lot of changes you really should have some sort of versioning system in place. For htaccess I'll usually have archive files called site-name-01-htaccess, site-name-02-htaccess etc and use Tortoise to compare the versions for changes.

For a big project I'll set up a Subversion (or Git) repository to track all changes and to be able to backtrack when errors are found. I've just finished a complex file with 700+ rules within and rule order was crucial. There's multiple archive copies documenting changes, and that has helped to, so far, fix 5 bugs within.

wilderness




msg:4417612
 9:16 pm on Feb 14, 2012 (gmt 0)

Many thanks g1smd.

I've always kept monthly backups of my htaccess, and for this solution kept version copies (numbered) the past three days.

Unfortunately, I'd go in to make one change and make more, all in an attempt to get the file functioning properly, as well as to locate the overall failure of mod_rewrite.

Ex:
SetEnvIf User-Agent ^MSIE 6\.0$ keep_out

I converted six lines at once from mod_setenvif to mod_rewrite, and when it failed, had to go back and check each line individually. As luck (or bad luck) would have it, the last line checked was the failure.
To keep versions on something so trivial would be useless.

You'd think the blank space would have jumped OUT at me, however after a while of looking at syntax the lines become blurry and quite similar.

It didn't help that I sometimes had to wait hours for a genuine visitor, or at least the one I was testing for.

I'm sure glad it's over ;)

wilderness




msg:4417620
 9:32 pm on Feb 14, 2012 (gmt 0)

I looked at TortoiseSVN.
Thanks.

I'm quite picky about upsetting the stability of my computer.
I'm an old DOS guy (still use some DOS occasionally) and use Windows Explorer extensively, sometimes having 3-4 open.
Wouldn't like anything upsetting the functionality of Explorer.

The most important tools on my computer (beside OS) are the scanner and scanner software. Everything I do revolves around that.
Over time, I'd lost some of my scanner driver features and unable to determine why. This past year a solution was found by accident for something that existed for eight years.

Today, I'm quite leery about upsetting that boar again.
Wouldn't even add tax software this year because it required SP3, which I refuse to add.

lucy24




msg:4417636
 10:10 pm on Feb 14, 2012 (gmt 0)

Deny From (mod_access)

Do you actually HAVE mod_access, or are you just using the term out of habit?

Now I'm worried.

wilderness




msg:4417637
 10:19 pm on Feb 14, 2012 (gmt 0)

mod_access [httpd.apache.org]

wilderness




msg:4417644
 10:36 pm on Feb 14, 2012 (gmt 0)

lucy,
It's irrelevant to me whether it's called mod_access in Apache Ver 1.2 or mod_authz_host in Apache 2.2.

They both handle DENY FROM. FWIW, I don't have either installed. I'm on shared hosting ;)


If you wish to restrict access to portions of your site based on the host address of your visitors, this is most easily done using mod_authz_host [httpd.apache.org].

The Allow and Deny directives let you allow and deny access based on the host name, or host address, of the machine requesting a document. The Order directive goes hand-in-hand with these two, and tells Apache in which order to apply the filters.

The usage of these directives is:

lucy24




msg:4417646
 10:40 pm on Feb 14, 2012 (gmt 0)

Whew. I'm on shared hosting and they won't even tell us what Apache version they use. ("Security reasons", whatever that means.) But as a consolation prize they show you how to test for which named modules are installed. So the lack of mod_access is my indirect way of knowing that they're reasonably up-to-date :)

wilderness




msg:4417650
 10:45 pm on Feb 14, 2012 (gmt 0)

I'm on shared hosting and they won't even tell us what Apache version they use. ("Security reasons"


I've heard this crap from my el cheapo host as well, however I can assure you from having other hosts that if you pay a few dollars more for hosting, the loaded mods will be listed right on their website.

My el cheapo also informed me that ftp logs were a security risk, although I had them on multiple shared hosts previously.
And forget about rotating logs (their definition of rotation is a new log each day).

lucy24




msg:4417688
 1:29 am on Feb 15, 2012 (gmt 0)

Hm, I think we're on the same host. The change in log access was a horrible inconvenience in the short term, but then I found Fugu, which is perfectly happy to do SFTP with a GUI, the way God intended.* And then shortly afterward Fetch changed something or other, so now everything is back to normal. I just had to make a separate shortcut.

Double whew ;)

And, as I said, once you've checked which mods are installed, that gives you a pretty close idea of what Apache version you're on. No later than x, no earlier than y.


* Weird but true, I had to find a Linux www site to learn how to connect using-- ugh, ugh-- Terminal.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved