
Apache Web Server Forum

    
Host upgraded to Litespeed, no longer recognises .htaccess ip bans
Could I use redirects instead?
whitenoise
5+ Year Member

Msg#: 4543916 posted 7:21 pm on Feb 8, 2013 (gmt 0)

I've found out that my host has 'upgraded' their server to something called Litespeed. Apparently the syntax I had in place for my IP bans, using Birdman's spider trap [webmasterworld.com], is not recognised by Litespeed. See below for an example of what I am using:

SetEnvIf Remote_Addr ^12\.345\.678\.90$ getout #8-02-2013, 10:50
SetEnvIf Request_URI "^(/403\.php|/robots\.txt)$" allowsome
<Files *>
order deny,allow
deny from env=getout
allow from env=allowsome
</Files>

This doesn't get recognised by Litespeed, and I've been advised to use the following instead:

order allow,deny
deny from 123.45.6.7
deny from 012.34.5.
allow from all

I don't see a way of automating the addition of new entries like the original Birdman script does. I've seen a post elsewhere that mentioned converting the SetEnvIfs into redirects, like below:

RewriteCond %{HTTP_REFERER} !^http://xx\.xx\.xx\.xx$
RewriteCond %{HTTP_REFERER} !^http://xx\.xx\.xx\.xx$
RewriteCond %{HTTP_REFERER} !^http://xx\.xx\.xx\.xx$
RewriteRule .* - [F]
Would this still do the same job, and would it be as efficient, especially if there were, say, 100 lines of these?

Any help would be appreciated! Thanks :)

 

lucy24
WebmasterWorld Senior Member, Top Contributor of All Time

Msg#: 4543916 posted 9:39 pm on Feb 8, 2013 (gmt 0)

If you are on anything less than Apache 2.4-- which is said to recognize CIDR notation though I can't find the reference now-- then using mod_rewrite for IP blocks will be impossibly clunky and space-and-energy consuming.

Besides, you don't mean {HTTP_REFERER}. You mean {REMOTE_HOST}. (Er... don't you?) And all those items would have to be linked with [OR], since the default in RewriteCond is AND.

The formulation

deny from 123.45.6.7

using IP ranges is the best approach in most circumstances. Except of course that you wouldn't go by /32 slivers.
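For example (placeholder addresses; the partial-address and network/CIDR forms below are both documented for Allow/Deny, so one line can cover a whole range rather than a single /32):

```
Order Allow,Deny
Allow from all
# partial address: matches everything in 123.45.*.*
Deny from 123.45.
# network/CIDR form covering the same range in one line
Deny from 123.45.0.0/16
```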

You only need mod_rewrite when you're setting up conditions, like "block so-and-so unless they're asking for such-and-such". Your example isn't a very good one, though; instead you should have something like

<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>

The form

<Files *>

is meaningless in any case, since it means-- or is intended to mean-- "all files". Just leave out the envelope.

Does your link point to a members-only forum? I got bounced to the front page :(

Oh and...
SetEnvIf Remote_Addr ^12\.345\.678\.90$ getout #8-02-2013, 10:50

Is the last bit a comment? My server gets snippy if I put a comment in the same physical line as an active statement. It has to be on a line of its own.

whitenoise
5+ Year Member

Msg#: 4543916 posted 1:32 pm on Feb 9, 2013 (gmt 0)

Thanks for the reply Lucy. That link should have been http://www.webmasterworld.com/forum88/4242.htm [webmasterworld.com], sorry about that.

Yes, I thought the mod_rewrite way of doing things would be a bit too clunky. And yes, I meant REMOTE_HOST; sorry, my mod_rewrite knowledge is a bit rusty.

That last bit of the SetEnvIf was a comment yes, and it seemed to work fine before.

The host recommended using the deny from 123.45.6.7 formulation, but I wasn't sure if I could automate the adding of new lines. Basically, when someone trips the spider trap on my site, a piece of PHP code automatically writes a line to the start of the .htaccess file, which bans them from the entire site. (More details in the link above.)

With the old way it was easy, since it just appended a new line to the top of the file, and this worked great.

However if I am using the below example, I will need a way of adding new lines to that programmatically, which I'm not sure how to do at the moment. You mentioned it doesn't need the envelope, so can it simply be just the below?

order allow,deny
deny from 123.45.6.7
deny from 012.34.5.
allow from all

Sorry for the questions.

lucy24
WebmasterWorld Senior Member, Top Contributor of All Time

Msg#: 4543916 posted 11:08 pm on Feb 9, 2013 (gmt 0)

Looking at the code...

Urk. Looks like there's one insurmountable problem. Since the function works dynamically on incoming requests, it can only recognize exact IPs, meaning /32. Normally you would never do this in htaccess except in truly exceptional cases. (I could make something up, but there's no point: an exception means, by definition, something exceptional.)

That means you can use it in the short term, but you will need to go into your htaccess periodically, sort it by lines, and change all those individual aaa.bbb.ccc.ddd blocks into whole ranges like aaa.bbb.ccc.0/19 or aaa.bbb/14. Your machine can't do that; it involves manual lookups. But if you count on your fingers* you will see that 2^32 possible addresses-- considering only IPv4 --can quickly balloon into a seriously unworkable htaccess.

You can compromise by using only the first three pieces: if the request comes from aa.bb.cc.dd write the code to look at aa.bb.cc alone. The chances of different behaviors coming from within the same /24 are generally too small to bother with. If you meet an exception, send 'em my way: they're probably in my target demographic.
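If you go that route, the only real change to the trap script is dropping the last octet before the address is written out. A minimal sketch of that step (Python here for illustration -- the actual trap script is PHP; the trailing dot gives the partial-address form the thread has been using with Deny from):

```python
def to_class_c(ip):
    """Reduce a full IPv4 address to its first three octets,
    e.g. '12.34.56.78' -> '12.34.56.' (a /24 partial-address match)."""
    return ip.rsplit(".", 1)[0] + "."
```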

There's basically just one line to change. Maybe a few surrounding lines to get the text added in the right place. (And, incidentally, the idea of letting anything other than me-the-human-user edit my htaccess makes my skin crawl. But obviously it does work.)

Current version using mod_setenvif

$content = "SetEnvIf Remote_Addr ^" . $bad_bot_ip . "$ getout\r\n";

(Do you have to say \r\n to make it work on all servers? Or is that just so you can read the file with your eyeballs on your local Windows machine? I just say \n.)

If you're doing this in mod_rewrite, the overall structure would be

RewriteCond %{REMOTE_HOST} ^bad-IP-here$
RewriteRule .* - [F]

RewriteCond %{REMOTE_HOST} ^other-IP-here$
RewriteRule .* - [F]

... except that in real life you would never do it like that. It would be either

RewriteCond %{REMOTE_HOST} ^bad-IP-here$ [OR]
RewriteCond %{REMOTE_HOST} ^other-IP-here$ [OR]
RewriteCond %{REMOTE_HOST} ^third-IP-here$
RewriteRule .* - [F]

or

RewriteCond %{REMOTE_HOST} ^(bad-IP-here|other-IP-here|third-IP-here)$
RewriteRule .* - [F]

At this point you get into php details. It is obviously easiest to code a fresh rule for each bad robot. But if they start piling up it will slow your site to a crawl. So more likely you'd want to pre-seed your htaccess with the starter rule, and then write the php to insert lines as needed. Most straightforward is probably to add lines like

RewriteCond %{REMOTE_HOST} ^fourth-bad-IP-here$ [OR]

before the existing

RewriteCond %{REMOTE_HOST} ^bad-IP-here$ [OR]
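Mechanically, that insert is just a matter of finding the first existing RewriteCond and splicing the new line in above it, so the final no-[OR] condition stays last. A rough sketch of that logic (Python for illustration; the real script is PHP):

```python
def insert_rewritecond(htaccess_text, bad_ip):
    """Splice a new 'RewriteCond ... [OR]' line in above the first
    existing RewriteCond, keeping the final (no-[OR]) line in place."""
    new_line = ("RewriteCond %{REMOTE_HOST} ^"
                + bad_ip.replace(".", r"\.") + "$ [OR]")
    lines = htaccess_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("RewriteCond"):
            lines.insert(i, new_line)
            break
    return "\n".join(lines)
```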

That was in mod_rewrite. If you go with mod_authz-whatever (there are dozens of them and nobody can possibly be expected to remember the exact name!) then it becomes a simpler

Deny from bad-IP-here
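and the write-out step in the trap script becomes a plain prepend of that one line. A minimal sketch of the logic (Python for illustration -- the actual script is PHP, and the file path is a placeholder):

```python
def prepend_deny(htaccess_path, bad_ip):
    """Prepend a 'Deny from <ip>' line to an existing .htaccess file."""
    with open(htaccess_path) as f:
        existing = f.read()
    with open(htaccess_path, "w") as f:
        f.write("Deny from " + bad_ip + "\n" + existing)
```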

You definitely don't need the robots.txt exemption in its original form; in fact your htaccess should already have a

<Files "robots.txt">

et cetera. Anything inside a <Files> or <FilesMatch> overrides other rules if there's a conflict.


* Cursory business with calculator tells me you could come out with a mod_authz section running into the gigabytes :)

whitenoise
5+ Year Member

Msg#: 4543916 posted 7:56 pm on Feb 10, 2013 (gmt 0)

Thanks for your help Lucy, it is very much appreciated.

To update things: I've been in negotiation with my host and, weighing up the options, I've agreed to move from the shared hosting package they offer onto a VPS. This means I will still be able to use the normal Apache services, and also means a faster server and access to other features I haven't been able to use.

This will mean I should be able to use what I was using before, with a few adjustments based on your input. So I think I will now use:

SetEnvIf Request_URI "^/error403\.php$" allow
SetEnvIf Request_URI "^/robots\.txt$" allow

Order deny,allow
Deny from aa.bbb.ccc.ddd
Deny from aa.bbb.ccc.ddd
Allow from env=allow

This will simplify things a bit, I think. I do try to stop the htaccess file from getting too big by now and then removing all the entries and starting again. If a bot then visits the site and does the same thing once more, it won't get very far, as the trap will block it again, so I think this works well.

I could do what you have suggested and just use aa.bbb.ccc to further simplify things, as like you said the chances of the last part being an issue are very small. Perhaps I'll keep my contact form page visible, with a note saying they can contact me if they think there has been a mistake.

I'm not sure what you mean about the <Files "robots.txt"> in htaccess as I don't have an entry about this at the moment?

lucy24
WebmasterWorld Senior Member, Top Contributor of All Time

Msg#: 4543916 posted 11:44 pm on Feb 10, 2013 (gmt 0)

Envelopes in <Files> or <FilesMatch> can be used both in htaccess and config files, and work the same way in both places. The server reads them after the other stuff, so <Files> rules can override or, uhm, overwrite anything you said outside of the envelope.

If you put a piece in your htaccess or config file (is it config for a VPS?) like this

<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>

then you don't need to say anything else about robots.txt. Even if the request comes from a blocked IP, the <Files... envelope will unblock it. So your visiting bad robot can't say "I wanted to obey robots.txt, honest I did, but they wouldn't let me see it!" ;)

Note that <Files... applies only to real, physical files. So if your robots.txt is created dynamically by some kind of php jiggery-pokery, <Files... won't work.
