
Apache Web Server Forum

    
RewriteRule/Deny not working
arms




msg:4419247
 3:35 am on Feb 19, 2012 (gmt 0)

I'm trying to block some of those pesky spiders and bots, but for the life of me I can't get this to work. I've tried the variations on a theme below, but they're still getting through. Any help is much appreciated.

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^downthemall [OR]
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider* [NC] [OR]
RewriteCond %{HTTP_USER_AGENT} ^AhrefsBot* [NC] [OR]
RewriteCond %{HTTP_USER_AGENT} ^Jyxobot* [NC] [OR]
RewriteCond %{HTTP_USER_AGENT} ^discobot* [NC] [OR]
RewriteCond %{HTTP_USER_AGENT} ^Plukkie* [NC] [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ezooms* [NC]
RewriteRule ^.* - [F,L]

SetEnvIfNoCase User-Agent "^Baiduspider" bad_bot
SetEnvIfNoCase User-Agent "^AhrefsBot" bad_bot
SetEnvIfNoCase User-Agent "^Jyxobot" bad_bot
SetEnvIfNoCase User-Agent "^discobot" bad_bot
SetEnvIfNoCase User-Agent "^Plukkie" bad_bot
SetEnvIfNoCase User-Agent "^Ezooms" bad_bot
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>

order allow,deny
deny from 94.253.
deny from 109.60.
deny from 72.14.164.
deny from 195.7.10.56
deny from 90.197.49.47
deny from 130.206.32.253
allow from all

 

Pfui




msg:4419256
 4:40 am on Feb 19, 2012 (gmt 0)

Your code may have other problems but for starters, try changing all of your --

[NC] [OR]

-- notations to:

[NC,OR]
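
Applied to the rules from the first post, that change might look like the sketch below. Only the flag syntax is altered; the patterns are left as posted, and a few of the original lines are omitted for brevity.

```apache
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^downthemall [OR]
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
# All flags for one condition share a single bracket pair, separated by commas.
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^AhrefsBot* [NC,OR]
# The last condition in an OR chain carries no OR flag.
RewriteCond %{HTTP_USER_AGENT} ^Ezooms* [NC]
RewriteRule ^.* - [F,L]
```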

lucy24




msg:4419272
 7:01 am on Feb 19, 2012 (gmt 0)

... and when you've done that, get rid of all your opening anchors:

SetEnvIfNoCase User-Agent "^Baiduspider"
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider*

You don't want to block UAs that begin with "Baiduspider". You want to block UAs that contain "Baiduspider". Right?

Oh, and what are all those asterisks for? That is: what are they intended to be for? What they really do is allow inputs in the form "Baiduspide" "Baiduspider" "Baiduspiderrrrr" et cetera... so long as it's the first item in the string. I kinda think that isn't what you had in mind.

The bad bots don't need to be inside a <Limit> condition. You want to lock them out all the time, don't you?

Next item: why are you saying everything twice?

RewriteCond %{HTTP_USER_AGENT} ^Baiduspider* (et cetera)
... leading to [F] using mod_rewrite

SetEnvIfNoCase User-Agent "^Baiduspider" bad_bot (et cetera)
... leading to Deny from using mod_setenvif in combination with core (or mod_access depending on how old your installation is).

You only need one or the other. My personal preference: use the Environment version if there's a nice short distinctive piece of the UA that works all the time without any further conditions or exceptions, like "Clipish" or "HTTrack". No ifs, ands or buts: they're out.

If it's complicated-- "Block this UA if the string doesn't also contain this other word, or if it isn't from this IP"-- go to mod_rewrite.
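
A minimal sketch of that kind of conditional block, assuming a hypothetical robot name "ExampleBot" and a hypothetical exempting word "Trusted" (neither appears in this thread). Consecutive RewriteCond lines without an OR flag must all match for the rule to fire.

```apache
RewriteEngine On
# The UA contains "ExampleBot" (hypothetical name) ...
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]
# ... and does not also contain "Trusted" (hypothetical exemption).
RewriteCond %{HTTP_USER_AGENT} !Trusted [NC]
# Both conditions matched: answer with 403 Forbidden.
RewriteRule ^ - [F]
```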

Blocking by IP (Deny from 1.202.0.0/15) is cleanest and simplest of all. Robots can change their clothes (UA) or lie about who sent them (Referer), but the IP can't be faked.
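
Put together, the environment-variable route plus an IP range could be sketched as follows. This assumes Apache 2.2-style access control (on 2.4 these directives are provided by mod_access_compat) and reuses two of the UA strings and the bad_bot variable from the first post.

```apache
# Unanchored, case-insensitive match: any UA containing the token is flagged.
SetEnvIfNoCase User-Agent Baiduspider bad_bot
SetEnvIfNoCase User-Agent AhrefsBot bad_bot

Order Allow,Deny
Allow from all
# Refuse anything flagged above.
Deny from env=bad_bot
# Refuse a whole range, written as a CIDR block.
Deny from 1.202.0.0/15
```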

wilderness




msg:4419287
 10:08 am on Feb 19, 2012 (gmt 0)

You should change the "order allow,deny"
to
"order deny,allow"

at least if you ever intend to use custom error pages.
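
For reference, the Apache 2.2 documentation describes the two orderings differently. With Order Allow,Deny a request that matches both an Allow and a Deny is rejected. With Order Deny,Allow such a request is allowed, and requests that match neither are allowed by default. A blocklist written under Order Deny,Allow therefore leaves out the "Allow from all" line, as in this sketch that assumes the bad_bot variable from the first post:

```apache
Order Deny,Allow
# Anything matching a Deny line is refused; everything else is allowed by default.
Deny from env=bad_bot
Deny from 94.253.
# No "Allow from all" here: under Deny,Allow it would override every Deny above.
```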

arms




msg:4419399
 9:46 pm on Feb 19, 2012 (gmt 0)

Thanks for the replies. I have changed to this:

SetEnvIfNoCase User-Agent "Baiduspider" bad_bot
SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot
SetEnvIfNoCase User-Agent "Jyxobot" bad_bot
SetEnvIfNoCase User-Agent "discobot" bad_bot
SetEnvIfNoCase User-Agent "Plukkie" bad_bot
SetEnvIfNoCase User-Agent "Ezooms" bad_bot
SetEnvIfNoCase User-Agent "Exabot" bad_bot

<Files *>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Files>

order allow,deny
deny from 94.253.
deny from 109.60.
deny from 72.14.164.
deny from 66.219.58.
deny from 180.76.5.
deny from 123.125.71.
deny from 200.98.132.
deny from 62.212.69.
deny from 69.58.178.
deny from 195.7.10.56
deny from 90.197.49.47
deny from 176.9.51.
deny from 130.206.32.253
deny from 75.125.135.226
deny from 109.149.199.2
deny from 212.113.35.162
deny from 80.40.134.103
deny from 80.40.134.104
deny from 80.40.134.120
deny from 62.24.181.134
deny from 62.24.181.135
deny from 62.24.222.131
deny from 62.24.222.132
deny from 62.24.252.133
allow from all

but the little sods are still coming through

I took my lead from here: [webmasterworld.com...] but just can't seem to get it to work

g1smd




msg:4419403
 10:07 pm on Feb 19, 2012 (gmt 0)

Only one deny takes effect, the last one in the list.

Check the syntax, especially the "setting of an environmental variable" method.

arms




msg:4419406
 11:13 pm on Feb 19, 2012 (gmt 0)

OK, so I changed to:

SetEnvIfNoCase User-Agent "Baiduspider" bad_bot
SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot
SetEnvIfNoCase User-Agent "Jyxobot" bad_bot
SetEnvIfNoCase User-Agent "discobot" bad_bot
SetEnvIfNoCase User-Agent "Plukkie" bad_bot
SetEnvIfNoCase User-Agent "Ezooms" bad_bot
SetEnvIfNoCase User-Agent "Exabot" bad_bot

<Files *>
Order Allow,Deny
Deny from env=bad_bot
deny from 94.253.
deny from 109.60.
deny from 72.14.164.
deny from 66.219.58.
deny from 180.76.5.
deny from 123.125.71.
deny from 200.98.132.
deny from 62.212.69.
deny from 69.58.178.
deny from 195.7.10.56
deny from 90.197.49.47
deny from 176.9.51.
deny from 130.206.32.253
deny from 75.125.135.226
deny from 109.149.199.2
deny from 212.113.35.162
deny from 80.40.134.103
deny from 80.40.134.104
deny from 80.40.134.120
deny from 62.24.181.134
deny from 62.24.181.135
deny from 62.24.222.131
deny from 62.24.222.132
deny from 62.24.252.133
allow from all
</Files>

No change

"Check the syntax, especially the 'setting of an environmental variable' method" means nothing to me. I have exactly the same syntax as every example I have seen elsewhere, including on this board.

Pfui




msg:4419421
 2:07 am on Feb 20, 2012 (gmt 0)

Did you correct your [NC,OR] notations? Or did you opt to stop using mod_rewrite?

arms




msg:4419424
 2:20 am on Feb 20, 2012 (gmt 0)

Got rid of them. My previous post is my total .htaccess.

lucy24




msg:4419427
 2:40 am on Feb 20, 2012 (gmt 0)

"Only one deny takes effect, the last one in the list."

OK, what glaringly obvious thing am I overlooking?

Incidentally, I put my environmental deny in the same place as the IP denys:

Deny from env=keep_out
Deny from 31.214.128.0/17
Deny from 38.100

... et cetera, et cetera.

Does <Files *> mean anything? I would have thought it's the same as not using an envelope at all.

Oh, and I just say BrowserMatch. Saves several bytes ;)
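
As a sketch of that shorthand: BrowserMatch is the mod_setenvif form that tests only the User-Agent header, and BrowserMatchNoCase is its case-insensitive counterpart. The keep_out variable name follows the snippet above; the UA string is borrowed from earlier in the thread.

```apache
# Shorthand form, case-insensitive:
BrowserMatchNoCase Baiduspider keep_out
# Long form of the same test:
# SetEnvIfNoCase User-Agent Baiduspider keep_out
```

Plain BrowserMatch is case-sensitive, so it corresponds to SetEnvIf rather than SetEnvIfNoCase.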

g1smd




msg:4419667
 7:17 pm on Feb 20, 2012 (gmt 0)

There should only be one deny statement.

If you have multiple deny statements, all previous deny statements are ignored and only what is in the last deny statement will ever apply.

lucy24




msg:4419688
 8:30 pm on Feb 20, 2012 (gmt 0)

"If you have multiple deny statements, all previous deny statements are ignored and only what is in the last deny statement will ever apply."

Still missing something, because that is precisely how my own htaccess is set up, and it definitely Denies from everyone on the list:

Order Allow,Deny
Allow from all

Deny from env=keep_out

Deny from 31.214.128.0/17
....
and so on down to
Deny from 223.198.0.0/15

which is definitely not the only IP to get locked out. I'd have noticed.

Allow,Deny
First, all Allow directives are evaluated; at least one must match, or the request is rejected. Next, all Deny directives are evaluated. If any matches, the request is rejected. Last, any requests which do not match an Allow or a Deny directive are denied by default.

Emphasis mine. Are we talking about different things?

arms




msg:4419717
 10:18 pm on Feb 20, 2012 (gmt 0)

This is where I am now, still with no success

SetEnvIfNoCase User-Agent "Baiduspider" bad_bot
SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot
SetEnvIfNoCase User-Agent "Jyxobot" bad_bot
SetEnvIfNoCase User-Agent "discobot" bad_bot
SetEnvIfNoCase User-Agent "Plukkie" bad_bot
SetEnvIfNoCase User-Agent "Ezooms" bad_bot
SetEnvIfNoCase User-Agent "Exabot" bad_bot

Order allow,deny
deny from env=bad_bot
deny from 94.253.
deny from 109.60.
deny from 72.14.164.
deny from 66.219.58.
deny from 180.76.5.
deny from 123.125.71.
deny from 200.98.132.
deny from 62.212.69.
deny from 69.58.178.
deny from 195.7.10.56
deny from 90.197.49.47
deny from 176.9.51.
deny from 130.206.32.253
deny from 75.125.135.226
deny from 109.149.199.2
deny from 212.113.35.162
deny from 80.40.134.103
deny from 80.40.134.104
deny from 80.40.134.120
deny from 62.24.181.134
deny from 62.24.181.135
deny from 62.24.222.131
deny from 62.24.222.132
deny from 62.24.252.133
allow from all

wilderness




msg:4419856
 4:20 am on Feb 21, 2012 (gmt 0)

First and foremost, this thread belongs in the SSID Forum </rant>

"This is where I am now, still with no success"


What exactly is NOT working?
What isn't working as you intended?
Have you checked your error logs?

1) Before you begin with htaccess, perhaps you should improve both your copying and pasting skills and your interpretation skills?
2) Your environment variable does not even match the thread you quoted with a link.
a) your code:
Order allow,deny
deny from env=bad_bot
deny from 94.253.
allow from all

b) your link code:
SetEnvIfNoCase User-Agent "^Zyborg" bad_bot
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>


The Apache mod_access page [httpd.apache.org] provides the following example partway down the page, in the Deny section:
Order Deny,Allow
Deny from all
Allow from apache.org



The code you used does not match either your cited link or the Apache example.

1) You are not consistent in your use of upper and lower case. Use one method or the other, but don't mix both.
EX:
Deny from
deny from
Some new servers may prove quite picky.

2) Start with a small file and get it functioning, then go back and add more UAs and IPs.
EX (although, as I said previously, you should be using
Order Deny,Allow):

<Limit>
SetEnvIfNoCase User-Agent Baiduspider bad_bot
Order Allow,Deny
# or any IP that you can verify easily
Deny from 94.253.
Allow from all
Deny from env=bad_bot
</Limit>

Some recent threads in this forum and in SSID describe the issues that the "Files" container creates in raw logs.

I also described (recently) similar issues in the SSID forum when using quotes with SetEnvIf. I suggest you stop using quotes on every line and use them sparingly, once your skills have progressed enough to benefit from "exactly as" matching.



lucy24




msg:4419873
 4:54 am on Feb 21, 2012 (gmt 0)

Edit: Oops. Not sure why my tiny little post took over half an hour to put together, but yup, I'm overlapping.

Necessary backtracking: Is it your own server or shared? If shared, are you allowed to have fully functional htaccess files? ("Allowed" here means "it will work".)

What happens if you add your own IP to the "deny from" list? Do you just cruise on in as if nothing had happened?
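
A minimal version of that test, using the documentation-range address 203.0.113.5 as a placeholder for your own IP:

```apache
Order Allow,Deny
Allow from all
# Placeholder address: substitute the IP you are browsing from.
Deny from 203.0.113.5
```

If you still get in with this in place, the file is probably not being applied at all. On shared hosting, one possible cause is the host's AllowOverride setting: Order, Allow and Deny in .htaccess need the Limit override, and SetEnvIf needs FileInfo.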

Any change if you put it into Title Case ("Allow,Deny")? Most Apache installations don't care, but occasionally one does.

I was apprehensive about the trailing . in some of the partial IPs, but checked it and it doesn't seem to make any difference.

wilderness




msg:4419968
 2:05 pm on Feb 21, 2012 (gmt 0)

"I was apprehensive about the trailing . in some of the partial IPs, but checked it and it doesn't seem to make any difference."


There's some very old discussion on this, but I've no clue what to search for to find the threads.

I use the trailing DOT in everything for "deny from" IPs.
The early logic was similar to a trailing slash in robots text (include all).
I seem to recall some early access errors (2000 or 2001) when omitting the DOT.
Apache and the change from POSIX to PCRE has an effect, I'm sure.
In any event the logic is that it works either way in present day, or so Jim stressed many times.

g1smd




msg:4420122
 7:48 pm on Feb 21, 2012 (gmt 0)

"Are we talking about different things?"

Uh, yeah. I misremembered some detail, and didn't take time out to check the facts.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved