
Apache Web Server Forum

    
htaccess: Block IP range, but allow certain user agents within that range
brokaddr



 
Msg#: 4529025 posted 5:39 pm on Dec 17, 2012 (gmt 0)

Is this possible?

For example, I want to block:
123.456.7/20

But a particular bot I want to allow is on this range:
123.456.789.1

The bot has a unique user agent. Can I add an "exception" to htaccess so that if the "Bot X" user agent visits, it is allowed in, while anything else from that range is blocked?

 

g1smd

Msg#: 4529025 posted 5:54 pm on Dec 17, 2012 (gmt 0)

Yes. Use another preceding RewriteCond with the additional test.

Block access if the request is the big block and NOT the small block.

OR

Block access if the request is the big block and NOT the required user agent.

Use a ! for "not".
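A minimal sketch of the second option (big block AND NOT the wanted user agent), assuming mod_rewrite is available. The range `^123\.45\.` and the user-agent token `BotX` are hypothetical placeholders, not values from the original question:

```apache
RewriteEngine On
# Condition 1: the client IP starts with the (hypothetical) big block 123.45.
RewriteCond %{REMOTE_ADDR} ^123\.45\.
# Condition 2: AND the user agent does NOT contain the (hypothetical) token "BotX"
RewriteCond %{HTTP_USER_AGENT} !BotX
# Both conditions true: answer 403 Forbidden
RewriteRule .* - [F]
```

Consecutive RewriteCond lines are ANDed by default, and the `!` negates only the condition it is attached to.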

brokaddr



 
Msg#: 4529025 posted 6:00 pm on Dec 17, 2012 (gmt 0)

Could you kindly provide an example?

Block access if the request is the big block and NOT the small block.

I'm not sure how to determine which is which?

I'm guessing it's something like this?
deny from 123.456.7/20 !"user-agent":Bot X
wilderness

Msg#: 4529025 posted 6:57 pm on Dec 17, 2012 (gmt 0)

It's more difficult in mod_alias.

The method g1smd provided (i.e., "Use another preceding RewriteCond with the additional test") uses mod_rewrite.

Additionally, you're NOT able to use CIDR for IPs in mod_rewrite.

brokaddr



 
Msg#: 4529025 posted 7:14 pm on Dec 17, 2012 (gmt 0)

Bummer, so the only way around this is to allow the block of unwanted IPs?

wilderness

Msg#: 4529025 posted 7:17 pm on Dec 17, 2012 (gmt 0)

NO.

You do it in mod_rewrite.

There are hundreds (perhaps thousands) of examples here at WebmasterWorld of multiple conditions using mod_rewrite.

wilderness

Msg#: 4529025 posted 7:23 pm on Dec 17, 2012 (gmt 0)

Here's an example from Feb 2012 [webmasterworld.com].

lucy24

Msg#: 4529025 posted 11:34 pm on Dec 17, 2012 (gmt 0)

NOT able to use CIDR for IPs in mod_rewrite

Wasn't that one of the luscious additions in 2.4? Can't find it now, so I may have imagined it :(
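Apache 2.4 does offer something along these lines: a RewriteCond whose test string is the literal word `expr` is evaluated as an Apache expression, and the `-R` operator matches the client address against an IP or CIDR block. A sketch, assuming Apache 2.4 and reusing a hypothetical `BotX` user agent; on 2.2 it will not work as intended:

```apache
RewriteEngine On
# Apache 2.4 only: -R tests REMOTE_ADDR against an IP or CIDR range
RewriteCond expr "-R '184.72.0.0/15'"
# Let the (hypothetical) "BotX" user agent through
RewriteCond %{HTTP_USER_AGENT} !BotX
RewriteRule .* - [F]
```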

mod_rewrite is the easiest-- in part because that's what most people around here are used to.

It could also be done in an ordinary Allow/Deny statement if you are in "Deny,Allow" order. Then you would Deny the bigger range and Allow the smaller range. But I don't think anyone hereabouts is brave enough to whitelist based solely on IP ranges.
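The Allow/Deny approach, sketched in Apache 2.2 (mod_authz_host) syntax. With `Order Deny,Allow` the Deny lines are evaluated first and any matching Allow line overrides them, so the single exempted address gets in while the rest of the range is refused. Both addresses are hypothetical, and this whitelists purely by IP, which is exactly the caveat above:

```apache
# Apache 2.2 syntax: Deny is evaluated first, then Allow overrides it
Order Deny,Allow
# Refuse the whole (hypothetical) range 184.72.0.0 - 184.73.255.255
Deny from 184.72.0.0/15
# ...but let this one (hypothetical) address back in
Allow from 184.72.1.2
```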

It could be done in mod_setenvif if you do two pieces: first set the bigger range to keep_out = 1 (or whatever variable you use) and then reset the smaller range to keep_out = 0.
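A mod_setenvif sketch of the two-step idea, assuming Apache 2.2-style access directives. One detail matters: `Deny from env=...` tests whether the variable exists, not what value it holds, so setting `keep_out` to 0 would still block. The sketch therefore unsets it with the `!` prefix. The variable name, the range and the `BotX` user agent are placeholders:

```apache
# Step 1: flag every request from the (hypothetical) range 184.72.0.0/15
SetEnvIf Remote_Addr ^184\.7[23]\. keep_out
# Step 2: unset the flag again for the (hypothetical) "BotX" user agent
SetEnvIf User-Agent BotX !keep_out
# Refuse whatever is still flagged
Order Allow,Deny
Allow from all
Deny from env=keep_out
```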

It's more difficult in mod_alias.

Your fingers typed "mod_alias" (simple Redirects) but I think your forebrain meant "mod_access" and/or "mod_authz_thingummy" depending on Apache version ;)

g1smd

Msg#: 4529025 posted 1:44 am on Dec 18, 2012 (gmt 0)

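RewriteEngine On
# REMOTE_ADDR always holds the client IP; REMOTE_HOST may hold a hostname if HostnameLookups is on
RewriteCond %{REMOTE_ADDR} ^200\.
RewriteCond %{REMOTE_ADDR} !^200\.50\.
RewriteCond %{REQUEST_URI} !^/error\.php
RewriteRule .* - [F]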


Block access if remote address begins 200. but does not begin 200.50.

The third condition allows the error page to be served - omit this line and you'll have Internal Server Error 500 instead.

The IP addresses can be specified using standard RegEx notation, e.g. 200\.50\.(3[2-9]|4[0-7])\. matches 200.50.32.nn to 200.50.47.nn, and the trailing escaped period is very important.
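As a worked check, 200.50.32.0 through 200.50.47.255 is exactly the CIDR block 200.50.32.0/20: a /20 fixes the top four bits of the third octet, and 32 to 47 (binary 00100000 to 00101111) is one such run of 16 values. A sketch of that range as a complete rule set, with a `^` anchor added and the same hypothetical `/error.php` exemption:

```apache
RewriteEngine On
# 200.50.32.0/20 = 200.50.32.0 - 200.50.47.255
RewriteCond %{REMOTE_ADDR} ^200\.50\.(3[2-9]|4[0-7])\.
RewriteCond %{REQUEST_URI} !^/error\.php
RewriteRule .* - [F]
```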

lucy24

Msg#: 4529025 posted 2:30 am on Dec 18, 2012 (gmt 0)

the trailing escaped period is very important

... or at least it would be if the numbers involved weren't 200 and 50 respectively ;) Safer to form a habit and stick with it, though. (I once killed a stylesheet by neglecting to include the closing / in a 410 specification.)
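To see where the trailing period does matter, take an octet that more digits can follow. With the hypothetical range 12.3.x.x, the pattern `^12\.3` also matches 12.30.x.x through 12.39.x.x, whereas `^12\.3\.` matches only 12.3.x.x:

```apache
RewriteEngine On
# Too broad (left commented out): matches 12.3.x.x AND 12.30.x.x - 12.39.x.x
#RewriteCond %{REMOTE_ADDR} ^12\.3
# Intended: matches 12.3.x.x only
RewriteCond %{REMOTE_ADDR} ^12\.3\.
RewriteCond %{REQUEST_URI} !^/error\.php
RewriteRule .* - [F]
```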

brokaddr



 
Msg#: 4529025 posted 3:49 pm on Dec 21, 2012 (gmt 0)

Wow, thanks for all the posts, guys. This is a lot of complicated info (for me) to try and comprehend!
The IP addresses can be specified using standard RegEx notation, e.g. 200\.50\.(3[2-9]|4[0-7])\. matches 200.50.32.nn to 200.50.47.nn, and the trailing escaped period is very important.

I know the CIDR range of the block I want to boot, but the IP I want to allow falls within that range.

Please tell me if my interpretation is valid:
RewriteCond %{REMOTE_HOST} ^123\.456\.78\. <--CIDR block (how would I put something like 184.72.0.0/15, an Amazon IP range for example, into this block?)
RewriteCond %{REMOTE_HOST} !^123\.333\.333\4\. <-- Full IP of what I want to allow
RewriteCond %{REQUEST_URI} !^/error\.php <-- Error page served to the block that isn't the IP mentioned above
RewriteRule .* - [F]

g1smd

Msg#: 4529025 posted 4:19 pm on Dec 21, 2012 (gmt 0)

The "error page" must be the one that is served for 403 error status, as that is what the [F] signifies.

Look on the web for a CIDR to RegEx converter.
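Working the example 184.72.0.0/15 by hand: a /15 fixes the first 15 bits, so the second octet may be 72 or 73 (binary 01001000 and 01001001) and everything after is free. That gives 184.72.0.0 through 184.73.255.255, i.e. `^184\.7[23]\.`. A sketch of the full set under that conversion; the exempted address 184.72.1.2 and the `/error.php` page are hypothetical, and `ErrorDocument` ties that page to the 403 that [F] produces:

```apache
# Serve this (hypothetical) page for 403 Forbidden responses
ErrorDocument 403 /error.php
RewriteEngine On
# 184.72.0.0/15 = 184.72.0.0 - 184.73.255.255
RewriteCond %{REMOTE_ADDR} ^184\.7[23]\.
# Exempt one (hypothetical) full address; $ stops it matching 184.72.1.20 etc.
RewriteCond %{REMOTE_ADDR} !^184\.72\.1\.2$
# Never block the error page itself
RewriteCond %{REQUEST_URI} !^/error\.php
RewriteRule .* - [F]
```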

wilderness

Msg#: 4529025 posted 4:50 pm on Dec 21, 2012 (gmt 0)

The "error page" must be the one that is served for 403 error status, as that is what the [F] signifies.


g1smd,
Sorry to hijack this.

I have 20 similar custom solutions in mod_rewrite, and none of them has any such exemption.
Any idea why I'm NOT getting 500s?

Don

lucy24

Msg#: 4529025 posted 8:01 pm on Dec 21, 2012 (gmt 0)

Does your error page have a custom name and location or are you using the host's suggested name? (I stress: filename, not actual content.) I ran into that same 500-error problem when I switched from the default "/forbidden.html" to a group of files within a /boilerplate/ directory. It seems safe to assume that the top-level config file has a <Files> envelope covering those specific names, along with a generic ErrorDocument directive. If you use your own filename you have to add exemptions.
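A sketch of the kind of exemption described, assuming the error pages (and any files they include) live in a hypothetical `/boilerplate/` directory and the blocking is done with mod_rewrite. Excluding that path means the internal request for the 403 page is not itself forbidden:

```apache
# Point 403 at a page in the custom location (hypothetical filename)
ErrorDocument 403 /boilerplate/forbidden.html
RewriteEngine On
# Skip everything under /boilerplate/, including any included fragments
RewriteCond %{REQUEST_URI} !^/boilerplate/
# The example block from earlier in the thread
RewriteCond %{REMOTE_ADDR} ^200\.
RewriteCond %{REMOTE_ADDR} !^200\.50\.
RewriteRule .* - [F]
```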

And if your boilerplate uses any includes, make sure those include files are also exempted. 8,000 guesses how I know this.

wilderness

Msg#: 4529025 posted 10:14 pm on Dec 21, 2012 (gmt 0)

lucy,
Yes, I do use a custom 403; however, I haven't had a single 500 error all month (apart from the ones I caused myself while working on syntax).

And some of those 20 custom exceptions are triggered every day.

g1smd

Msg#: 4529025 posted 10:14 pm on Dec 21, 2012 (gmt 0)

if your boilerplate uses any includes, make sure those include files are also exempted

That only applies if you are including the files "over the web" with protocol and hostname in the reference. If you're "including" as internal folder and file objects, not utilising http, then what's in the htaccess file has no bearing.

lucy24

Msg#: 4529025 posted 3:43 am on Dec 22, 2012 (gmt 0)

Yes I do use a custom 403

In a custom location, not the "default custom" that any halfway decent host offers? My defaults were "missing.html" and "forbidden.html", and maybe one or two others. As long as I just used those names, I didn't need an ErrorDocument directive. I had to add one when things got too messy and I shoved everything into the /boilerplate/ folder instead. And then I also had to put in assorted <FilesMatch> and similar exceptions because I was getting nonstop 500-class errors. Not a huge heartbreak when the end result is that the robot you're trying to lock out still can't get in-- but ten times as much work for the server, because it tries ten times to serve that blocked 403 file before giving up.

That only applies if you are including the files "over the web" with protocol and hostname in the reference.

Want to see my error log? ;) The most entertaining was when I returned 410 for everything in a particular directory, except three named files that were not Gone. Until I added the html footer to the list, I got errors saying "could not include..."

Most of the time we only think of two types of request: the ones the user put in explicitly, and the follow-up that the browser might put in if there's a redirect. Hence all those {THE_REQUEST} conditions. But there's also the Internal Subrequest. I've never used the [NS] flag but it might be a good alternative in some rules.

For example, a page which is included using an SSI (Server Side Include) is a subrequest, and you may want to avoid rewrites happening on those subrequests. Also, when mod_dir tries to find out information about possible directory default files (such as index.html files), this is an internal subrequest, and you often want to avoid rewrites on such subrequests. On subrequests, it is not always useful, and can even cause errors, if the complete set of rules are applied. Use this flag to exclude problematic rules.

I wish I'd seen that earlier; it wouldn't have taken so much trial and error figuring out why my auto-indexes had suddenly stopped working.

I detoured here to re-enact the error I mentioned above. htaccess normally says

RewriteCond %{REQUEST_URI} !(translator|questions|index|silentfooter)\.html
RewriteRule ^silence/\w+\.html - [G]


I deleted the |silentfooter element (this is a vanilla html include) and took a quick visit to one of the real pages. Refresh browser window, visible footer disappears and error logs dutifully report:

[Fri Dec 21 19:29:24 2012] [error] [client {my-IP-here}] unable to include "/silence/silentfooter.html" in parsed file /home/{username}/example.com/silence/translator.html

Access log doesn't mention the incident; it only shows the back-to-back external requests for stylesheet, js and so on.
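An untested sketch of the [NS] alternative applied to the same rule. With NS, the rule is skipped for internal subrequests such as the SSI include of silentfooter.html, so the footer should no longer need listing in the condition. This assumes, as described, that the footer is pulled in by a server-side include:

```apache
RewriteEngine On
# 410 Gone for /silence/ pages except the three that still exist
RewriteCond %{REQUEST_URI} !(translator|questions|index)\.html
# NS ("nosubreq"): do not apply this rule to internal subrequests,
# e.g. the SSI include of /silence/silentfooter.html
RewriteRule ^silence/\w+\.html - [G,NS]
```

A direct external request for silentfooter.html would still get 410; only the include is let through.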

wilderness

Msg#: 4529025 posted 6:26 am on Dec 22, 2012 (gmt 0)

Yes I do use a custom 403


In a custom location, not the "default custom" that any halfway decent host offers?


lucy,
We went around on this some months ago. I would have hoped that you'd recall it because we've similar elcheapo hosts.
In any event, here goes:

My elcheapo host implements a default error doc (with their own advertising) as opposed to a plain-jane browser page. That default is located in directories above my root.
To get around the advertising error doc, a customer (myself) must configure custom error pages.

Whether these custom error pages (located in each individual site's root, not in the multiple-domains root within my hosting account's root) count as a "custom location", I have no clue.

In any event this inquiry, which I thought would be simple, has become a nightmare.

Please accept my apologies for hijacking this thread and disregard inquiry.

Don

lucy24

Msg#: 4529025 posted 8:06 am on Dec 22, 2012 (gmt 0)

I would have hoped that you'd recall it because we've similar elcheapo hosts.

You forgot that you can't put me and any synonym of "remember" into the same sentence ;) But now that you tell the whole thing again, it does come back.

Do your own personal custom pages happen to have the same name (just the last part, not the path) as the cheapie-advertising version? If so, they might still be protected by something in the config file.

For comparison purposes: my shared htaccess (used by my personal site, the art studio site and the Third Site) has a bit that goes:

<FilesMatch "(forbidden|goaway|missing|cell)\.html">
Order Allow,Deny
Allow from all
</FilesMatch>

and assorted similar envelopes including the ubiquitous

<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>

All of them trickle downward to the individual sites because "Files" and "FilesMatch" work from right to left.

:: memo to self: get rid of the |cell element since I no longer use anything with this name ::
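For anyone on Apache 2.4 without mod_access_compat, the `Order`/`Allow` pairs are replaced by mod_authz_core's `Require`. A sketch of the same two envelopes in 2.4 syntax, reusing the filenames above unchanged:

```apache
# Apache 2.4 (mod_authz_core) equivalent of "Order Allow,Deny / Allow from all"
<FilesMatch "(forbidden|goaway|missing|cell)\.html">
Require all granted
</FilesMatch>

<Files "robots.txt">
Require all granted
</Files>
```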

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved