Forum Moderators: phranque

Is there anything wrong with this htaccess?

From a htaccess n00b

         

oddsod

5:55 pm on Nov 23, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know there's tons of really useful htaccess threads and if I spend a few hours I can learn what works and what doesn't work, but I was hoping someone could have a quick look at this htaccess file from a site I bought recently and tell me if there's anything I need to change.

It's exactly as below (with just the domain name removed)

==============
Options +Includes

Redirect 301 /oldfolder1/ [mysite.com...]
Redirect 301 /oldfolder1 [mysite.com...]
Redirect 301 /oldfolder2/ [mysite.com...]
Redirect 301 /oldfolder2 [mysite.com...]
Redirect 301 /search [mysite.com...]

RewriteEngine On
RewriteCond %{HTTP_HOST} ^mysite.com
RewriteRule (.*) [mysite.com...] [R=301,L]

RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^BaiduSpider [NC, OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^YandexBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
==========================

I need the htaccess to
1. redirect some pages/folders to new pages and folders (I think I've got this bit right, it's all working)
2. redirect all non-www requests to www (this is also working)
3. Block bots especially Baidu and Yandex (Baidu and Yandex don't seem to be getting blocked)

Thanks in advance for any help.

Any help much appreciated.

oddsod

8:47 pm on Nov 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thinking about it, it may have been the [OR] but I tried removing and adding the options line a couple of times to test it and, yes, that one change is triggering the problem.

Thanks for the tips on the difference in meaning with and without parentheses. But from all the descriptions above I thought I had my code right. What's wrong in the htaccess provided above that's still allowing the blocked spiders?

Do I need to change this
RewriteCond %{HTTP_USER_AGENT} (BlackWidow|Baiduspider|ExtractorPro|EyeNetIE|FlashGet|Hatena|JikeSpider|VoilaBot|YodaoBot) [NC,OR]


for this
RewriteCond %{HTTP_USER_AGENT} (BlackWidow|Baiduspider|ExtractorPro|EyeNetIE|FlashGet|Hatena|JikeSpider|VoilaBot|YodaoBot)/$ [NC,OR]

i.e. add the /$

I don't think I'll ever "get a grip" on Regular Expressions :(

wilderness

8:48 pm on Nov 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



OK, one more question. I have a vBulletin forum on this site and it's in a /forum/ folder. There is a htaccess file in this forum folder inserted by vBulletin. Do I need to put any specific commands in there? I didn't think so. But I still see those banned spiders getting files from the forum.

If blocked spiders are accessing this folder does it mean that my spider blocking isn't working?


I know nothing of vBulletin, however it's irrelevant and I'll explain why.
Does the vBulletin run in PHP or CGI?

It's damned likely that the PHP or CGI overrides the htaccess (I know that PHP does).

For some months I administered a website with a SMF Forum.
The SMF Forum was initially created in a sub-folder.
That SMF Forum and sub-folder had its own htaccess (child of the root htaccess).
My first instinct was to add denials in the sub-folder htaccess; however, that failed, and additions were required at the root (parent) and to affect the entire site.

FWIW, the vBulletin administrator's panel may offer an option for banning visitors (SMF does, although it took some getting used to).

2nd-FWIW, not sure about the vBulletin-inserted-htaccess-lines, however the WordPress-inserted-htaccess-lines are OUTDATED and include bad syntax (which g1smd has explained many times in this forum). I would suggest comparing the WordPress bad syntax with the vBulletin syntax and see what appears.
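
One more thing that may be worth ruling out: as far as I know, mod_rewrite rules are not inherited by a sub-folder by default, so a /forum/.htaccess that contains its own RewriteEngine and RewriteRule lines replaces the root rules for requests under /forum/ rather than adding to them. A minimal sketch of the sub-folder file, assuming the root blocking rules should also apply there (the file location and the placeholder comment are assumptions, not vBulletin's actual lines):

```apache
# /forum/.htaccess (hypothetical location, named after the thread's /forum/ folder)
RewriteEngine On

# Ask mod_rewrite to also apply the conditions and rules
# from the parent directory's .htaccess.
RewriteOptions Inherit

# ... whatever rewrite lines vBulletin inserted would stay here ...
```

With RewriteOptions Inherit the parent's rules are applied after the sub-folder's own rules, so it is worth checking the access logs afterwards before relying on it.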

wilderness

8:53 pm on Nov 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't think I'll ever "get a grip" on Regular Expressions


;)

FWIW these UA Rewrites and the anchors are the simplest of the simplest, so much in fact, that I personally don't even consider them regex

Go over to the SSID forum (these things were NEVER intended for the Apache forum).

1) Create a new thread BaiduSpider (the forum is moderated and the new thread will require approval).

2) Then copy and paste a line that allows access from your visitor logs.

3) You might also explore your error logs and see if there's anything in there.

wilderness

9:39 pm on Nov 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No, dollar sign (ends with anchor) at the end of the line I provided.

The absence of either the "begins with" character or the "ends with" character simply means "contains", and that the word may be located anyplace in the UA.
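
To put that in htaccess terms, here is a rough sketch, assuming the robots send User-Agent strings of the usual "Mozilla/5.0 (compatible; Baiduspider/2.0; ...)" form (check your own logs; that string and the two-name grouping are only examples, not anyone's actual rule):

```apache
# Sketch only: the bot names come from this thread, the grouping is an assumption.
RewriteEngine On

# No ^ and no $: the name may sit anywhere in the User-Agent ("contains").
# Flags are comma-separated with no spaces, e.g. [NC,OR]; the last
# condition in an OR chain carries no [OR].
RewriteCond %{HTTP_USER_AGENT} (Baiduspider|YandexBot) [NC]
RewriteRule .* - [F]

# For contrast (left commented out), an anchored pattern:
# RewriteCond %{HTTP_USER_AGENT} ^YandexBot
# matches only when the User-Agent begins with "YandexBot".
```

Because strings of that form begin with "Mozilla", a pattern written as ^BaiduSpider or ^YandexBot would not match them, while the unanchored form does.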

lucy24

2:28 am on Nov 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



wilderness, are you sure about that first comma? It confuses the ### out of me.

I don't think I'll ever "get a grip" on Regular Expressions


Yes, you will. I started out being deathly afraid of them-- and that was in a text editor, where the worst that can happen is it goes into perpetual motion, or replaces all occurrences of the letter "e" with a carefully crafted piece of html. (Unlimited Undo Is Your Friend.) It took me a year just to progress from using Regular Expressions to find text, to using them to replace text.

It similarly took me a long time to get to where I don't have to take a quick look at my logs about 5 minutes after each .htaccess modification to make sure the Error Logs aren't suddenly ballooning like mad and carrying the same timestamp as the Access Logs.

And no, I didn't mean that your own rule was supposed to end in an anchor. In fact the UA-excluding rule would fail if it did have an anchor. Not 500-type spectacular failure, just fail to intercept anything it was supposed to stop cold.

wilderness

4:05 am on Nov 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



wilderness, are you sure about that first comma? It confuses the ### out of me.


lucy,
Perhaps I missed something, what comma?

lucy24

4:21 am on Nov 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I thought you were answering oddsod's question about adding /$. There's a heck of a difference between "No, dollar sign" and "No dollar sign" ;)

wilderness

6:52 am on Nov 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I thought you were answering oddsod's question about adding /$. There's a heck of a difference between "No, dollar sign" and "No dollar sign"


I was, unfortunately since I was idiotic enough to make a punctuation, error, I may assume that the other explanations are pure nonsense ;)
This 38-message thread spans 2 pages.