
Apache Web Server Forum

    
Banned in htaccess, still showing up in webalizer
How and why?
Decius




msg:4109037
 6:47 pm on Apr 2, 2010 (gmt 0)

Hey guys, any help would be appreciated.

So I've banned certain ips from being able to access data on a domain by adding "deny from *****" clauses in an htaccess file in the root directory of the domain.

I tested its functionality by adding my own IP, and immediately I could not access any data in the root dir or any sub-dirs of the site, indicating that .htaccess is working.

However, despite doing this, I keep seeing other banned ips (specifically ones from facebook's thumbnail crawler or something) showing up downloading massive amounts of data.

Is it possible that the IP address .htaccess sees is different from the one Webalizer displays, so the IP shown in Webalizer isn't actually the one .htaccess is matching, which is why it isn't being banned?

Because that is the only explanation I can think of.

Thanks!

 

jdMorgan




msg:4109439
 4:25 pm on Apr 3, 2010 (gmt 0)

Webalizer uses the raw server access logs as input, and therefore can't log anything that didn't happen.
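
For illustration, here are two hypothetical lines in combined log format -- the IP, file names, and user-agent are made up. The two numbers after the quoted request line are the status code and the bytes sent, and those bytes are what Webalizer sums. A working ban shows a 403 with a tiny byte count; a 200 with a large byte count means content really went out:

192.0.2.1 - - [02/Apr/2010:18:47:01 +0000] "GET /page.html HTTP/1.1" 403 293 "-" "ExampleBot/1.0"
192.0.2.1 - - [02/Apr/2010:18:47:05 +0000] "GET /bigfile.zip HTTP/1.1" 200 52428800 "-" "ExampleBot/1.0"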

Have a look at the scope of your "Deny from" directives -- are they enclosed in any containers such as <Files>, <FilesMatch>, or <Limit>? If so, a common problem is that those containers are "wrong."

In .htaccess, use only one "Order" directive within a given scope; otherwise, only the last one found will apply. It is also a common problem that webmasters use the wrong "Order" -- I suggest using "Order Deny,Allow" and adjusting the Allow and Deny directives to suit. If "Allow from all" appears in your code, it's likely that you have this problem.
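
A minimal sketch of that arrangement (the addresses are placeholders -- adjust to suit):

Order Deny,Allow
Deny from 192.0.2.1
Deny from 203.0.113.0/24
# No "Allow from all" is needed; under Deny,Allow, anything not denied is allowed.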

The root problem that causes so many sites to have errors is that webmasters copy bad code from forums, use it on their sites, and then re-post that code in other forums, and the bad code spreads because no-one ever bothers to check that code against the documentation at apache.org and to really understand it before using it.

Even common "control panels" often produce malformed, badly-scoped, and/or inefficient code. Code distributed with many forum, blog, and CMS packages is often sloppy and/or really inefficient as well. Therefore, it's quite fair to say that most sites have bad code on them... in many cases, *very* bad code.

Jim

Decius




msg:4109515
 6:57 pm on Apr 3, 2010 (gmt 0)

Hey Jim,

Thanks for the response!

I don't have any order clauses in the .htaccess file.

I just have a list of "deny from ***" clauses.

But as I said, I have tested it with my own IP and it works, so -- correct me if I'm wrong -- since most of your post addresses the .htaccess file syntax, it doesn't seem to provide further insight into the issue?

I've got a list of about 20 "deny from" clauses, and I've tried putting my IP at the top, at the bottom, and in the middle, and as soon as it is in there I get a forbidden page. As soon as I remove it, I can load the site.

So it appears to be working in top notch form.

Yet, webalizer still shows accesses from ips within that file.

Thanks.

tangor




msg:4109649
 5:01 am on Apr 4, 2010 (gmt 0)

Showing what? If it's a 403, then it's doing its job.

Decius




msg:4109655
 6:05 am on Apr 4, 2010 (gmt 0)

?

The problem, tangor, is that despite banning certain IPs, they are showing up using massive amounts of data in webalizer.

And as I've stated a few times now, the htaccess implementation has been thoroughly tested to be accurate.

tangor




msg:4109671
 8:53 am on Apr 4, 2010 (gmt 0)

Confusing me. I ban a number of IPs via .htaccess. They still show up in my logs, as a 293-byte 403 response (piddling, content-wise). That tells me the banned IP did NOT get content and, at the same time, tells me whether they are still trying to get it. If you don't want them to show up in the logs at all, you'll want to route those IPs to a separate log file, but that's another topic (which you can research here on WW).
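
One rough sketch of that approach (this has to go in the main server config, since CustomLog is not allowed in .htaccess; the IP range and log paths are placeholders):

SetEnvIf Remote_Addr "^192\.0\.2\." banned
# Banned traffic goes to its own file...
CustomLog logs/banned_access.log combined env=banned
# ...and everything else stays in the regular log.
CustomLog logs/access.log combined env=!banned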

Decius




msg:4109691
 10:28 am on Apr 4, 2010 (gmt 0)

Massive amounts of bandwidth != 293kbyte file.

One such IP, for example, downloaded approximately 120gigs in a day and a half.

tangor




msg:4109708
 12:03 pm on Apr 4, 2010 (gmt 0)

Read that as less than 1K in size for the 403. To account for 120 GB, that 293-byte file would have to be returned roughly 400 million times. I don't see that kind of activity on any of my sites for a single file, much less in a day and a half. :)

If you still have data going out, then I suggest you go back over jdMorgan's suggestions -- obviously something is off. Might also take a look at what your 403 page is actually returning!

tangor




msg:4109709
 12:13 pm on Apr 4, 2010 (gmt 0)

Here's the operative part of my .htaccess:

<Files .htaccess>
Order Deny,Allow
Deny from all
</Files>

<Limit PUT DELETE>
Deny from all
</Limit>

<Limit GET POST>
Order Deny,Allow
Deny from env=ban
Allow from all
</Limit>

<Files *>
Deny from 174.129
Deny from env=ban
</Files>

jdMorgan




msg:4109719
 1:05 pm on Apr 4, 2010 (gmt 0)

tangor,

I don't want to divert this thread too far off track, but there are problems in your file. Some of your 'code blocks' have no "Order" specified within their scope, and there are several HTTP methods which aren't subject to any access controls at all. Further, if you attempt to use a custom 403 page, access to it will be blocked, resulting in an 'infinite loop' of 403 response attempts. I'd suggest:

SetEnvIf Request_URI "(robots\.txt|custom-403-page\.html)$" pass
#
Order Deny,Allow
#
<FilesMatch "\.(htaccess|htpasswd)$">
Deny from all
</FilesMatch>
#
<LimitExcept GET POST>
Deny from all
</LimitExcept>
#
<Limit GET POST>
Deny from 174.129
Deny from env=ban
Allow from env=pass
</Limit>

The SetEnvIf and "Allow from env=pass" directives create an override that allows all requestors to fetch robots.txt and your custom 403 error page. This will prevent problems with user-agents which interpret any failure to fetch robots.txt as carte-blanche to spider your site (likely resulting in a ton of 403s), and prevents the previously-mentioned 'infinite loop' on custom 403 page access.

All Denies are processed first, and Allows can override them. Any access not explicitly denied will be allowed. This is the most useful configuration, and makes the robots.txt and custom 403 page exclusions possible. Note that despite the added functionality the code is now simplified, with three 'blocks' of code instead of four. Note that as documented, "GET" includes "HEAD" in both <Limit> and <LimitExcept>, and therefore no explicit provisions need be made for HEAD requests.
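
To trace one hypothetical request through the code above: a GET for /robots.txt from an address starting with 174.129 matches "Deny from 174.129", but the SetEnvIf line has already set "pass" for that URI, so "Allow from env=pass" overrides the deny under "Order Deny,Allow" and robots.txt can be served; a GET for any other page from the same address matches only the Deny and gets a 403.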

Jim

[edited by: jdMorgan at 1:28 pm (utc) on Apr 4, 2010]

jdMorgan




msg:4109720
 1:08 pm on Apr 4, 2010 (gmt 0)

Decius,

I doubt we'll make any further progress on your problem until we can see your code as well. Please change any specific domain references to "example.com" and post (only) the relevant sections of your .htaccess file.

Jim

Decius




msg:4109858
 9:14 pm on Apr 4, 2010 (gmt 0)

Hey Jim,

Thanks for keeping with me on this.

Okay, so I checked my .htaccess file, and since I coded this site a while ago, I hadn't realized it was much longer than I remembered.

The "deny from" clauses were right at the top, and the rest of the URL rewrites were below it, far below it, so I didn't see it originally.

Here is the file with all the relevant lines (it is in order):

deny from *****
deny from *****
deny from *****
deny from *****

RewriteEngine On
RewriteBase /

RewriteCond %{REQUEST_URI} ^/errno.php3$ [NC]
RewriteCond %{THE_REQUEST} ^GET[[:space:]](.+)[[:space:]]HTTP/ [NC]

RewriteCond %{HTTP_HOST} ^domain\.com$
RewriteRule ^(.*)$ http://www.domain.com/$1 [L,R=301]

RewriteRule ^index\.php3$ http://www.domain.com/ [L,R=301]

RewriteRule blah blah
RewriteRule blah blah

RewriteRule ^([^/]+)\.htm$ $1.php3

RewriteRule blah blah
RewriteRule blah blah


There are more "deny from" clauses at the top and more RewriteRules at the bottom. The RewriteRules at the bottom are all of the same nature: an SEO-friendly URL is rewritten to a dynamic URL with variables.

I want to stress that regardless of potential errors in this .htaccess file, or improper syntax in other unrelated directives, the "deny from" clauses at the top have been tested and do operate on the main domain and in sub-directories of the site.

tangor




msg:4109861
 9:30 pm on Apr 4, 2010 (gmt 0)

@jdMorgan: That did streamline things! Thanks! Never had a problem with the 403, as I do not use a custom 403 page... but your explanation has cleared things up a bit.

jdMorgan




msg:4109938
 2:05 am on Apr 5, 2010 (gmt 0)

By not specifying an "Order" before starting the Deny/Allow list, you leave the function of your deny list "depending on the mercy" of the server configuration -- and who knows what is in that file... The server config may contain several Order directives, each enclosed in a different container which makes it conditional upon some request-related factor (see the code I posted above for examples). What those factors might be in your server config files, we can only guess -- and that is the problem: your Denies need to be applied unconditionally.
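
As a purely hypothetical illustration of the kind of thing that may be sitting in a server config -- an Order wrapped in a container like this applies only to matching requests, so the Order your .htaccess inherits can change from one request to the next:

# Hypothetical httpd.conf fragment, for illustration only:
<FilesMatch "\.(html|php3)$">
Order Allow,Deny
Allow from all
</FilesMatch>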

I would suggest that adding an "Order Deny,Allow" directive should be your first/next step.

Jim

Decius




msg:4110052
 8:46 am on Apr 5, 2010 (gmt 0)

Hey Jim,

Sounds good. Do I put "Order Deny,Allow" at the very top of the file then? Just, as is?

Much, much appreciated.

jdMorgan




msg:4110104
 1:05 pm on Apr 5, 2010 (gmt 0)

Put it ahead of your list of Denies/Allows.
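
Applied to the file posted above, that means something like this at the very top (the asterisks are just the placeholders from the earlier post):

Order Deny,Allow
deny from *****
deny from *****
deny from *****
deny from *****

RewriteEngine On
RewriteBase /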

See Apache mod_access documentation...

Jim
