Forum Moderators: phranque


Blocking Bots and error logs

         

yaashul

4:42 am on Oct 1, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



I am blocking bots with the method I read in lucy24's post:

BrowserMatchNoCase "libwww" banned
BrowserMatchNoCase "Wget" banned
BrowserMatchNoCase "LWP" banned
Order Deny,Allow
Deny from env=banned

But that method is creating a lot of error log entries, and I am not able to read the actual errors because of that. Is there any way to keep those entries out of error_log when I block a certain bot (or bots)?

not2easy

5:12 am on Oct 1, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



There is specific syntax required for the code you are using. If you are using it as shown here, I don't think it is actually doing what you want it to do. For one thing, the "Deny,Allow" lacks any Allow and may be blocking a lot more than robots.

I would take a little time to learn the correct formats and avoid server errors. Some syntax differs between Apache 2.2 and 2.4, so be sure that what you use is correct for the version of Apache your site runs on. Visit the Apache Documentation pages for the right way to implement these blocks. The Docs for v. 2.4 are here: [httpd.apache.org...]

yaashul

5:34 am on Oct 1, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Not2easy,

I am using Apache 2.2.x and I am not sure how I should implement it. I am able to block bots, but they are filling my error_log with "client denied" errors.

lucy24

6:11 am on Oct 1, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



But that method is creating a lot of error log entries, and I am not able to read the actual errors because of that.

If you're on shared hosting, this is an insoluble problem. Log levels can only be changed in the config file. And even then, you will almost certainly find there is no one level that shows only the information you want and none of the information you don't want.

Low-tech solution: Open your error log file in a text editor. Globally delete all lines containing the text "client denied by server configuration" (the locution Apache uses for all 403s, regardless of source). As a regular-expression search, replacing with nothing:
.+client denied by server configuration.+\n

What's left will be all other errors.
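The same global delete can be done from a command line instead of a text editor. This is only a sketch, assuming shell access and an Apache 2.2-style error log; the sample log lines and the temporary file are made up so that the example runs on its own.

```shell
#!/bin/sh
# Made-up sample log in Apache 2.2 error_log style (illustration only).
log=$(mktemp)
cat > "$log" <<'EOF'
[Wed Oct 01 04:40:01 2014] [error] [client 192.0.2.10] client denied by server configuration: /var/www/html/index.html
[Wed Oct 01 04:41:17 2014] [error] [client 192.0.2.22] File does not exist: /var/www/html/favicon.ico
[Wed Oct 01 04:42:30 2014] [error] [client 192.0.2.10] client denied by server configuration: /var/www/html/robots.txt
[Wed Oct 01 04:43:05 2014] [error] [client 192.0.2.31] script not found or unable to stat: /var/www/cgi-bin/test.cgi
EOF

# sed's /pattern/d deletes every line that matches the pattern and prints
# the rest. The log file itself is not modified.
remaining=$(sed '/client denied by server configuration/d' "$log")
printf '%s\n' "$remaining"
```

Run against a real log, replace `$log` with the path to your own error log; that path depends on the host.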

not2easy

6:20 am on Oct 1, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



That's because it looks like you are denying everybody.
When you use Order Deny,Allow, specify Deny from env=banned, and then don't Allow from anything, it may be banning all visitors.

If you visit the Apache 2.4 Docs, you will find links to the 2.2 versions there. If you examine the current information in the Library here: [webmasterworld.com...] or read these discussions: [httpd.apache.org...] you will see the correct formats for Apache 2.2.

Lucy24 gives very good information, but when you copy and paste snippets used in examples, it can leave out some important stuff. It looks like that is what has happened.

yaashul

6:58 am on Oct 1, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Not2easy,

Is this the right way to go forward?

<IfModule mod_setenvif.c>
BrowserMatchNoCase "libwww" banned
BrowserMatchNoCase "Wget" banned
BrowserMatchNoCase "LWP" banned
<Limit GET POST PUT>
Order Allow,Deny
Allow from all
Deny from env=banned
</Limit>
</IfModule>

not2easy

1:35 pm on Oct 1, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I would do without the <IfModule mod_setenvif.c> envelope: if your server setup makes the module available, you don't need the envelope, and if the module is not available, the envelope won't add it.

You can remove the quotation marks, so that
BrowserMatchNoCase "libwww" banned
becomes
BrowserMatchNoCase libwww banned

I don't use this particular method, but offhand I think the " quotes are used when you add more complex UA's.

Another way to look up specific instances here is to use the forum's internal search. Just be wary of using practices from 2004 that may no longer be valid. If in doubt, the Apache Docs are where the facts are.

lucy24

11:51 pm on Oct 1, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think the " quotes are used when you add more complex UA's

In mod_setenvif, unlike mod_rewrite, quotation marks allow you to include literal spaces within the UA string. For example:
BrowserMatch "Bork-edition \[en\]" keep_out

If you didn't have the quotation marks, it would think you're setting two environment variables, one called \[en\] and a second one called keep_out :) Note that even within quotation marks, the pattern is still a regular expression, which is why the brackets are escaped.

<Limit GET POST PUT>

There's no particular reason for the <Limit> directive here. Suppose a robot comes along with some rare proprietary request method that you've never heard of? You'd still want to block it. GET, POST and PUT are the vanilla methods anyway. More likely you'd set special rules such as broader permissions for HEAD, or extra exclusions for POST. Unless you have the world's worst host, it shouldn't be necessary to say anything about PUT, because it's already blocked. (This is not done out of the kindness of the host's heart, but because they need to protect the server. All of it, including the parts belonging to users who wouldn't know an htaccess file if it bit them.)

yaashul

1:58 am on Oct 2, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks Lucy24, not2easy for your insights

incrediBILL

6:30 pm on Oct 2, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Another simple way to process your error log is using grep.

You can show everything but those errors using inverse grep such as:

grep -v "text to ignore" error.log

If you need it in a file, redirect the results to a new file.
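Put together, the inverse grep plus the redirect might look like the sketch below. It assumes an Apache 2.2-style error log; the sample log lines, the temporary file names and the `filter_denied` helper are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every line of the log given as $1 except the
# "client denied" entries. -v inverts the match, -i ignores case, and -F
# treats the pattern as a fixed string rather than a regular expression.
filter_denied() {
    grep -v -i -F "client denied by server configuration" "$1"
}

# Made-up sample log so the sketch runs on its own.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
[Wed Oct 01 04:40:01 2014] [error] [client 192.0.2.10] client denied by server configuration: /var/www/html/index.html
[Wed Oct 01 04:41:17 2014] [error] [client 192.0.2.22] File does not exist: /var/www/html/favicon.ico
[Wed Oct 01 04:42:30 2014] [error] [client 192.0.2.10] client denied by server configuration: /var/www/html/robots.txt
EOF

# Redirect the filtered output to a new file; the original log is untouched.
filtered_log=$(mktemp)
filter_denied "$sample_log" > "$filtered_log"
cat "$filtered_log"
```

Writing to a separate file keeps the full log available in case the denied requests themselves need a look later.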