Forum Moderators: phranque


Help a .htaccess newbie

Can someone please read my .htaccess file?

         

Evil_Lama

5:04 am on Jun 8, 2010 (gmt 0)

10+ Year Member



Hello,

I am totally new to .htaccess and have no experience whatsoever. I was browsing around looking for some info and landed on your site. While looking around I found some code (sorry, I forgot who wrote it) that I would like to add, but I am unsure how to do it (spacing, etc.).

Here is the .htaccess code that I have pasted together after looking over the site. If someone could kindly check it and let me know whether it is formatted correctly, that would be wonderful.

My goals for the .htaccess are twofold. One: suppress or hide the file so it is harder to access.

Two: stop those pesky bots. Would this do the trick?

The other question I had is where do I place the file? Does it go in my /public_html folder with my web site?

With respect to the first part of the code (the block that denies access to the file): does this suppress the file on its own, or is there something else I need to implement server side to get it to work?

Regards,

Evil Lama

<Files .htaccess>
order allow,deny
deny from all
</Files>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]

wilderness

2:06 pm on Jun 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



First off, these lines have been copied time and again. Most of the User Agents are no longer in use. Fourteen of the lines for similar UAs may be replaced with a single line.
You're not likely to find any volunteer here who will weed through that many lines of a file and make corrections.

My goals for the .htaccess are twofold. One; suppress or hide the file so it is harder to access.


Are you referring to the actual .htaccess file or to other files?
On most web servers (at least secure ones), the .htaccess file cannot be viewed by default the way robots.txt can. Test it on a few sites:
http://www.example.com/.htaccess
or
http://www.example.com/htaccess

The .htaccess file generally goes into your "root folder", unless you're creating an additional .htaccess (with different rules) for a subfolder.

<Files .htaccess>
order allow,deny
deny from all
</Files>


Perhaps Jim or someone else can advise you on this implementation. Personally, I don't see any benefit to using these lines when supporting lines (i.e., deny from or SetEnvIf) are not used.
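Some background on why the test URLs above usually return 403 Forbidden: stock Apache configurations already deny requests for .ht* files at the server level, so a per-directory block like the one quoted mainly matters if a host has removed that default. Note also that Order/Deny is Apache 2.2 syntax; on 2.4 it only works if mod_access_compat is loaded, and the native form is "Require all denied". A sketch that works on either version, using the common convention of testing for mod_authz_core (which exists only on 2.3/2.4 and later):

```apache
# Refuse HTTP requests for .htaccess and any other .ht* file (e.g. .htpasswd)
<FilesMatch "^\.ht">
    # Apache 2.4+: mod_authz_core provides "Require"
    <IfModule mod_authz_core.c>
        Require all denied
    </IfModule>
    # Apache 2.2: fall back to the older Order/Deny syntax
    <IfModule !mod_authz_core.c>
        Order allow,deny
        Deny from all
    </IfModule>
</FilesMatch>
```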

Additionally, the closing line (RewriteRule) may be changed to:

RewriteRule .* - [F]

Evil_Lama

6:42 pm on Jun 8, 2010 (gmt 0)

10+ Year Member



Hello,

Thanks for your reply. Hmm, that is too bad about most of this being out of date. Would you mind sharing the single line? I would be grateful.

I guess one of my main questions was how .htaccess files are formatted. Are they one continuous block of code, or is there some kind of spacing convention between the various instructions, e.g. “/”?

The .htaccess file was placed there by the hosting company (so I guess it is in the right place), and I have not moved it and am not adding another.

The server-created .htaccess disallows hotlinking, but I wanted to modify it a bit to keep those nasty spiders away from my site. Unfortunately, I don’t know whether the .htaccess file is viewable or not; that is why I wanted to add that block, to be sure it was hidden.

Thanks again for the feedback.

Evil_Lama

wilderness

7:07 pm on Jun 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]


The following line replaces ALL of the above:

# UA begins with Web
RewriteCond %{HTTP_USER_AGENT} ^Web [NC,OR]
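A RewriteCond does nothing on its own: it only gates the single RewriteRule that immediately follows it, and the last condition in a group must not carry [OR]. A minimal sketch of the consolidated line used by itself, assuming mod_rewrite is enabled for your account:

```apache
RewriteEngine On
# UA begins with "Web", in any letter case ([NC]); last condition, so no [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web [NC]
# "-" leaves the URL unchanged; [F] sends 403 Forbidden and implies [L]
RewriteRule .* - [F]
```

To match several patterns, add more RewriteCond lines ending in [NC,OR] above the final one; only the last condition drops the OR.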

You may modify the .htaccess either through cPanel or with an offline text editor (I use WordPad because it allows the "no wrap" option for text files), then upload the file via FTP or HTTP. (Make sure to confirm that your website still functions after each upload/modification.)

Generally speaking, the .htaccess modifications (e.g., hotlinking and others) offered via cPanel use badly configured expressions.

Evil_Lama

5:11 am on Jun 9, 2010 (gmt 0)

10+ Year Member



Hello,

Thanks for getting back to me. I know the code generated by the site might not be that great, but at this point I want to keep it simple as I know nothing about this.

Here is my current code to stop hotlinking (I have only changed the site address):

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://mysite.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://mysite.com$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mysite.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mysite.com$ [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]
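One thing worth knowing about this hotlink block: some visitors send no Referer header at all (direct image requests, some privacy tools and proxies). An empty referer satisfies all four negated conditions, so those visitors are refused the images too. A common variant adds an exception for an empty referer and folds the four site conditions into one; a sketch, keeping mysite.com as the placeholder:

```apache
RewriteEngine on
# Allow requests that carry no Referer header at all
RewriteCond %{HTTP_REFERER} !^$
# Allow your own pages, with or without www (mysite.com is a placeholder)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?mysite\.com(/|$) [NC]
# Any other request for an image gets 403 Forbidden
RewriteRule \.(jpe?g|gif|png|bmp)$ - [F,NC]
```

The (/|$) ending stops look-alike hosts such as mysite.com.example.net from slipping through.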

I don’t intend to fiddle with that part of the code, simply because, from what I understand, making changes to the .htaccess file is complex, and I want to take it slow and make sure it all works OK.

So what I would like to do is stop those evil bots from wrecking my site. So how would I add the code that you have suggested to the file? Is it just one giant block like:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://mysite.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://mysite.com$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mysite.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mysite.com$ [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]
RewriteCond %{HTTP_USER_AGENT} ^Web [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]

Or is there another line of code that I need to add or a space or what? Lastly how do I test it to make sure it works?

Please let me know. And thanks again.

Regards,

EL

Evil_Lama

5:27 am on Jun 9, 2010 (gmt 0)

10+ Year Member



By the way, I use TextWrangler to write the code (er, cut and paste it).

EL

wilderness

6:06 am on Jun 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The following is an EXAMPLE only, and incomplete (built from your previous lines):

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mysite\.com [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]
RewriteCond %{HTTP_USER_AGENT} crawl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Download [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web [NC]

_________________
This single line:
RewriteCond %{HTTP_USER_AGENT} ^Web [NC]

Replaces the fourteen lines that I copied from your initial post.
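For completeness, here is one way the example above could be finished so that the bot conditions actually block something: they need their own RewriteRule after them, and RewriteEngine only has to appear once per file. A sketch only; crawl, Download, Spider and ^Web are the example patterns from above, and patterns this broad can also catch crawlers you may want (Baidu's, for instance, identifies itself as Baiduspider).

```apache
RewriteEngine on

# Hotlink protection: refuse images requested from other sites' pages
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mysite\.com [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]

# Bot blocking: every condition except the last ends in OR
RewriteCond %{HTTP_USER_AGENT} crawl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Download [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web [NC]
RewriteRule .* - [F]
```

On the formatting question: Apache ignores blank lines and lines starting with #, so sections can be separated however you like. What matters is that each group of conditions ends in a RewriteRule.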

Copying and pasting these types of lines from someone else's file is a bad practice!
You should understand some simple techniques and procedures.
Failure to take the necessary time to understand these procedures and their effects could be devastating to your website(s): it could even stop your site from functioning at all and return a 500 error to every visitor.

Evil_Lama

6:54 pm on Jun 9, 2010 (gmt 0)

10+ Year Member



Hello Wilderness,

Thanks for getting back to me. I recognize that cutting and pasting is not the best plan. Unfortunately, I have no computer programming skills and know zero about code. I would like to learn, though.

Are there any good books out there to learn the basics of .htaccess?

After looking at this site I saw that people were sharing their code, and I was hoping I could get started by using someone's code to prevent site scraping and just test it out. The only other option for me is a program called Copy Defender [copydefender.com]

I just ran across it yesterday. It uses PHP code to defend against site copying. Do you know anything about it? The other option is to leave my site with nothing except the hotlink protection, which is not a good idea.

From what you have said, I gather that there are no line breaks between sections in .htaccess, and that it should look like this:

RewriteEngine on 
RewriteCond %{HTTP_REFERER} !^http://mysite.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://mysite.com$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mysite.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mysite.com$ [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]
RewriteCond %{HTTP_USER_AGENT} ^Web [NC]


So how can I test it to be sure that it works and is not overly aggressive or turning my server into a spam site? (I am joking about the last part, of course.)

But seriously, how would I test it? Thanks again for all your help.

Regards,

Evil Lama

Evil_Lama

9:30 pm on Jun 9, 2010 (gmt 0)

10+ Year Member



Hello again,

Also what would this line of code do?

RewriteCond %{HTTP_USER_AGENT} ^Web [NC]

Regards,

Evil Lama

jdMorgan

10:01 pm on Jun 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Take a look at the resources cited in our Apache Forum Charter. mod_rewrite code is not at all a cut-and-paste proposition, and it would be a mistake to use code on your server that you do not fully understand and cannot modify with confidence. This is server configuration code, not some simple application script...

The regular-expressions pattern "^Web" in a RewriteCond with an [NC] flag matches any User-agent string which *starts with* "Web" -- and this matching is case-insensitive because of the [NoCase] flag.
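To make the effect of the ^ anchor concrete: without it the pattern matches "web" anywhere in the string, and ordinary browsers would be caught, because Safari and Chrome send User-Agent strings containing "AppleWebKit". The two conditions below differ only in the anchor; the sample strings in the comments are illustrative.

```apache
RewriteEngine On
# Anchored: matches only UAs that START with "web" in any case,
# e.g. "WebZIP/4.0" or "webcopier"; a browser UA starting "Mozilla/5.0" does not match
RewriteCond %{HTTP_USER_AGENT} ^Web [NC]
RewriteRule .* - [F]

# Unanchored, shown for contrast only (left commented out): it matches "web"
# ANYWHERE, including "AppleWebKit" in Safari and Chrome, blocking real visitors
# RewriteCond %{HTTP_USER_AGENT} Web [NC]
```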

Jim

wilderness

10:03 pm on Jun 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Also what would this line of code do?


This was explained previously in my second reply to you in this same thread [webmasterworld.com].

Note the comment above the line, which explains its function.

But seriously, how would I test it?


How would you test what?
1) If the .htaccess is uploaded
2) If the .htaccess is causing a syntax error
3) If your site is working
4) If any of the individual lines in the .htaccess are working

wilderness

10:11 pm on Jun 9, 2010 (gmt 0)

Evil_Lama

11:53 pm on Jun 9, 2010 (gmt 0)

10+ Year Member



Hello,

Thank you both for your posts. Jim, thanks for explaining precisely what the code is looking for and how it works. Perhaps my attempted cut-and-paste approach is just a bad idea, as I know nothing about this stuff.

After looking around the various sites discussing .htaccess code, I thought it might be possible to set up something simple that would keep those nasty bots away. I lurked on your site for a bit before joining, so I know that .htaccess is not a toy and is extremely powerful.

Wilderness: What I meant by "how would I test it" is: if I create a “sandbox” and upload the code, how do I know for sure that it is turning away the bots, or whether there is some kind of syntax error? Also, with respect to “what does it do?”, what I wanted to know is what exactly will happen if someone tries to “scrape” my site. Will they just be turned away with “access denied”? I guess I wanted to know exactly what a user would see if I implemented your code.
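On the question of what a blocked visitor sees: a rule ending in [F] makes Apache answer with 403 Forbidden, so a scraper receives the server's standard "Forbidden" page instead of your content. If you want a custom page, ErrorDocument can point to one, but that page must be exempted from the block, or the blocked client is refused the error page too. A sketch; /forbidden.html is a hypothetical file you would create yourself:

```apache
# Serve a custom page with every 403 response (hypothetical file)
ErrorDocument 403 /forbidden.html

RewriteEngine On
# Conditions without [OR] are ANDed: not the error page AND UA starts with "Web"
RewriteCond %{REQUEST_URI} !^/forbidden\.html$
RewriteCond %{HTTP_USER_AGENT} ^Web [NC]
RewriteRule .* - [F]
```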

But I think at this stage I need to put this on the back burner and learn more about it before delving in. After a friend’s site was hacked I got a little concerned and was looking for a way to stop it.

If you guys have any ideas, let me know. In the meantime I will have a look at your links.

Best regards,

Evil_Lama

wilderness

4:16 am on Jun 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



if I create a “sandbox” and upload the code how do I know for sure that it is turning away the bots, or if there is some kind of syntax error.


what exactly will happen if someone tries to “scrape” my site? Will they just be turned away with “access denied”


In both of these instances, it seems that with your .htaccess efforts (and your inquiry here) you may be putting the "cart before the horse". That is, you're attempting to take action against bots that you're NOT even sure are visiting your site(s).

Determining who visits your website(s) is accomplished by reviewing your "raw visitor logs". For each request made to your server, those logs record your server's response, including a status code.
You need to begin by studying your visitor logs over a reasonable amount of time (1-3 months) to understand the traffic coming through your site(s).
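For reading those logs: many hosts write them in Apache's standard "combined" format, defined in the server configuration as below. The fields to watch are the final status code (%>s), which will be 403 for anything your rules block, and the User-Agent header, where bot names appear. LogFormat is a server-level directive, so on shared hosting you can usually read these logs (often via a "raw access logs" page in the control panel) but cannot change the format from .htaccess.

```apache
# Apache's standard "combined" log format (server config only, not .htaccess)
# %h client address, %l/%u identity and user, %t time, %r request line,
# %>s final status code, %b response size, then Referer and User-Agent headers
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
```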

jdMorgan

4:19 pm on Jun 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As for testing, take a look at the "Prefbar" and "User agent switcher" add-ons for Firefox and other Mozilla-based browsers. These add-ons allow you to change your own user-agent string to that of a "bad bot" so you can test your code.

The best way to get the "bad-bot" user agent strings is to copy them from your own raw server access logs, or from one of the many Web sites that lists robot user-agent strings.
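One low-risk way to follow this advice is to block a made-up User-Agent first, so a mistake cannot lock out real visitors. A sketch; "MyTestBot" is a hypothetical string chosen only for testing. Set the add-on to send it and request any page: you should get 403 Forbidden, while normal browsing keeps working. Once that behaves, swap in patterns taken from your own logs.

```apache
RewriteEngine On
# Test-only rule: blocks nothing except the made-up UA "MyTestBot"
RewriteCond %{HTTP_USER_AGENT} ^MyTestBot [NC]
RewriteRule .* - [F]
```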

Jim

Evil_Lama

5:51 am on Jun 11, 2010 (gmt 0)

10+ Year Member



Hello,

Thanks very much for all your helpful tips; I appreciate the testing tip, Jim. Very good to know. I will watch the logs and see what happens.

Best regards,

Evil_Lama