.htaccess help needed

         

markl

5:02 pm on Apr 8, 2010 (gmt 0)

10+ Year Member



I have a basic understanding of .htaccess and know how to write code. However, after trying for more than 8 hours and searching the internet for help, I have some code that should work but just doesn't. I really wanted to solve this myself, but I just can't find what's wrong with it.

The code is designed to work with an installation of YOURLS, a URL shortener, but the basic code allows bots to access the site and thus skew the statistics. I want to be able to single these bots out.

-----

The code should check whether the User-Agent or IP address of a request matches a list of bots I've defined, and send matching requests to a different file than requests from everyone else.

Options +FollowSymlinks
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REMOTE_ADDR} ^xx\.xx\.xx\.xx [OR]
RewriteCond %{REMOTE_ADDR} ^xx\.xx\.xx\.xxx [OR]
RewriteCond %{REMOTE_ADDR} ^xx\.xx\.xx\.xx [OR]
RewriteCond %{REMOTE_ADDR} ^xx\.xxx\.xxx\.xx [OR]
RewriteCond %{HTTP_USER_AGENT} ^Lynxy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Voyager [OR]
RewriteCond %{HTTP_USER_AGENT} ^MetaURI API [OR]
RewriteCond %{HTTP_USER_AGENT} ^JS-Kit URL Resolver [OR]
RewriteCond %{HTTP_USER_AGENT} ^justsignal [OR]
RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [OR]
RewriteCond %{HTTP_USER_AGENT} ^radian6_linkcheck [OR]
RewriteCond %{HTTP_USER_AGENT} ^Twitterbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^PycURL [OR]
RewriteCond %{HTTP_USER_AGENT} ^abby [OR]
RewriteCond %{HTTP_USER_AGENT} ^Butterfly IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^Googlebot [OR]
RewriteCond %{HTTP_USER_AGENT} ^mxbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Feedtrace-bot [OR]
RewriteCond %{HTTP_USER_AGENT} ^uriplay [OR]
RewriteCond %{HTTP_USER_AGENT} ^VideoSurf_bot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Twitturls [OR]
RewriteCond %{HTTP_USER_AGENT} ^TweetmemeBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^NjuiceBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^OneRiot [OR]
RewriteCond %{HTTP_USER_AGENT} ^happyhourpress [OR]
RewriteCond %{HTTP_USER_AGENT} ^Googlebot-Mobile [OR]
RewriteCond %{HTTP_USER_AGENT} ^R6_FeedFetcher [OR]
RewriteCond %{HTTP_USER_AGENT} ^ThingFetcher [OR]
RewriteCond %{HTTP_USER_AGENT} ^Twitturly [OR]
RewriteCond %{HTTP_USER_AGENT} ^buzzzy/.com
RewriteRule ^([0-9A-Za-z]+)/?$ /yourls-bot.php?id=$1 [L]


# BEGIN YOURLS
RewriteRule ^([0-9A-Za-z]+)/?$ /yourls-go.php?id=$1 [L]
RewriteRule ^([0-9A-Za-z]+)\+/?$ /yourls-infos.php?id=$1 [L]
RewriteRule ^([0-9A-Za-z]+)\+all/?$ /yourls-infos.php?id=$1&all=1 [L]
# END YOURLS


Any help would be appreciated.
(P.S. I've anonymised the IP addresses in this version.)

g1smd

12:08 am on Apr 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Did you flush the browser cache before each test?


In what way did it fail?


Escape the spaces in your patterns.

Note that [A-Z0-9] with the [NC] flag is faster to compare.

There's a /. that should be \. in one pattern.
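
For reference, the pattern with the stray slash appears to be the last user-agent condition, the one for buzzzy. A minimal corrected sketch, assuming the intent was to match a literal dot in the user-agent string:

```apache
# "/." means a slash followed by any one character;
# "\." means a literal dot, which is what the buzzzy pattern needs.
RewriteCond %{HTTP_USER_AGENT} ^buzzzy\.com
```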

[edited by: jdMorgan at 5:18 am (utc) on Apr 10, 2010]
[edit reason] Edited by member request. [/edit]

markl

12:54 pm on Apr 9, 2010 (gmt 0)

10+ Year Member



Thanks g1smd,

I have been flushing the cache between (most) tests. (I hadn't thought that could be a factor at first but do it all the time now.)

The site keeps returning a '500 Internal Server Error'.

And am I correct in saying that, to escape the spaces in the patterns, I should use a '+' instead of a space?

Plus, with your note about [A-Z0-9] being faster to compare, do you mean I should switch the '0-9', 'a-z' and 'A-Z' around in my example so it would look like this:

RewriteRule ^([A-Za-z0-9]+)/?$ /yourls-go.php?id=$1 [L]


Thanks for your help so far,
Mark

g1smd

6:01 pm on Apr 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Replace it with [A-Z0-9] and the [NC] flag.

It will be faster to compare.
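
As a sketch of what that change looks like, here it is applied to the first YOURLS rule only (the other two rules would change the same way). With [NC] the match is case-insensitive, so the single range A-Z covers lowercase letters as well:

```apache
# [NC] = no case: [A-Z0-9] now also matches a-z.
# Flags are combined in one bracket pair, separated by a comma.
RewriteRule ^([A-Z0-9]+)/?$ /yourls-go.php?id=$1 [NC,L]
```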

markl

10:44 pm on Apr 9, 2010 (gmt 0)

10+ Year Member



g1smd, thank you very much for your help - it all works perfectly! Much appreciated.

jdMorgan

1:10 pm on Apr 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your code will execute *much* faster if you move the file-exists and directory-exists checks so that they come after the excluded-user-agent RewriteConds. There is no reason to read the disk unless one of the user-agents matches. In fact, I don't know why you'd care whether the URL resolves to an existing file or directory if the user-agent is unwelcome.

Escaping spaces:
Wrong (500-Server Error: Bad Flags Delimiter): RewriteCond %{HTTP_USER_AGENT} ^JS-Kit URL Resolver [OR]
Right: RewriteCond %{HTTP_USER_AGENT} ^JS-Kit\ URL\ Resolver [OR]
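
Putting both points together, the bot rule might be laid out roughly as below. This is a sketch only: it uses a handful of the user-agents from the original list rather than all of them, keeps the anonymised IP placeholder exactly as posted, and assumes the exists checks are still wanted at all.

```apache
RewriteEngine on

# Cheap string tests first. Conditions joined with [OR] form one group;
# the last condition in the group carries no [OR].
# The IP pattern is the anonymised placeholder from the original post.
RewriteCond %{REMOTE_ADDR} ^xx\.xx\.xx\.xx [OR]
RewriteCond %{HTTP_USER_AGENT} ^MetaURI\ API [OR]
RewriteCond %{HTTP_USER_AGENT} ^JS-Kit\ URL\ Resolver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Butterfly\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^Twitterbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^buzzzy\.com
# Disk checks last: they are reached only if one condition above matched.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([0-9A-Za-z]+)/?$ /yourls-bot.php?id=$1 [L]
```

The whole set reads as (IP or any listed user-agent) AND not-a-file AND not-a-directory. The backslash before each space stops Apache from treating the rest of the pattern as the flags argument.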

Jim