
Forum Moderators: Ocean10000 & phranque


.htaccess doesn't prevent htm/html files?

3:41 am on Dec 2, 2003 (gmt 0)

New User

10+ Year Member

joined:Oct 10, 2003
posts:27
votes: 0


I thought the .htaccess below was supposed to prevent ALL files from being grabbed by a particular website-stealing application, in this case WebStripper.

However, I just tested it, and it does not prevent HTM/HTML files from being grabbed. Why not?

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
RewriteRule ^.* - [F,L]

4:39 am on Dec 2, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


DKDiveDude,

Take the [OR] off. You never use [OR] on the final RewriteCond -- the one just ahead of the RewriteRule -- because there is no following condition to OR it with.
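A minimal corrected version of the snippet from the first post (same pattern and flags as posted, with only the [OR] removed) would look like this:

```apache
RewriteEngine on
# Final (and only) condition: no [OR] flag here
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC]
RewriteRule ^.* - [F,L]
```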

Jim

12:38 pm on Dec 2, 2003 (gmt 0)

New User

10+ Year Member

joined:Oct 10, 2003
posts:27
votes: 0


Hi JD,

Actually, the above was just part of my .htaccess; I did have several website grabbers listed, and the last one did NOT have the [OR].

But I also had another rewrite rule before that, so let me do a more complete example:

RewriteEngine on

RewriteCond %{HTTP_REFERER} ^$ [OR]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?MyWebsite\.com [NC]
RewriteRule \.(css|jpg|js|mpg|mov|wmv)$ - [F,NC]

RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC]
RewriteRule ^.* - [F,L]

The above example does NOT seem to stop website snatchers from taking my HTM/HTML files. Why not?

2:53 pm on Dec 2, 2003 (gmt 0)

New User

10+ Year Member

joined:Oct 10, 2003
posts:27
votes: 0


Another thing: would it be better to include all the unwanted site-snatcher names in one RewriteCond, like this:

RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla|WebStripper [NC]
RewriteRule ^.* - [F,L]

instead of this:

RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC]
RewriteRule ^.* - [F,L]

Which method is faster?

6:59 pm on Dec 2, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


DKDiveDude,

I don't see anything wrong with what you've got, other than a couple of 'style' issues that won't stop the code from working. Specifically, using a start anchor "^" with ".*" is not necessary, and [L] used with [F] is redundant. But neither of those is critical.

We have recently had one report of a 'weird non-printable character' somehow getting into some .htaccess code and messing things up, so you can cut and paste the following if you want to try that possibility.


RewriteEngine on

RewriteCond %{HTTP_REFERER} ^$ [OR]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?MyWebsite\.com [NC]
RewriteRule \.(css|jpg|js|mpg|mov|wmv)$ - [F,NC]

RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC]
RewriteRule .* - [F]

In a study [webmasterworld.com] posted here last year by WebmasterWorld member andreas_friedrich, it was found that the (agent1¦agent2) method was slightly faster in an .htaccess context, as would be expected because of the reduced overhead to parse the RewriteCond directive itself. However, it's sometimes better to let your code 'mature' before adopting this method, since it is a lot easier to edit single-entry RewriteConds than the ones that are all combined. This is a trade-off between performance and maintainability. In an httpd.conf context, the separate RewriteConds are faster, because httpd.conf code is compiled at Apache start-up, and the 'internal coding' favors the one-at-a-time approach.
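One caveat about the combined-pattern form (an observation about the regex itself, not something from the study): alternation binds loosely, so in the combined pattern above the "^" anchor applies only to Go!Zilla, and WebStripper would be matched anywhere in the agent string, not just at the start. To anchor both names, group the alternation in parentheses:

```apache
# Parentheses make the ^ anchor apply to both alternatives
RewriteCond %{HTTP_USER_AGENT} ^(Go!Zilla|WebStripper) [NC]
RewriteRule .* - [F]
```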

Also, when cutting and pasting code from WebmasterWorld that uses the 'locally-ORed' method, remember that the forum software displays the pipe as a broken-bar "¦" character; replace it with the solid pipe "|" character from your keyboard.

Short of the 'weird non-printing character' problem, this case is puzzling because your code should work. In that light, I guess I should ask - Have you confirmed the first set of rules is working to stop image hot-linking? In other words, are we looking at a case where the first rule set (for blocking image hotlinkers) works, but the second one (for blocking user agents) does not?

Jim

8:10 pm on Dec 2, 2003 (gmt 0)

New User

10+ Year Member

joined:Oct 10, 2003
posts:27
votes: 0


JD, thanks for your reply.

Yes the first rule works fine.

The second rule works somewhat; that is, it DOES prevent everything EXCEPT HTM/HTML files from being snatched.

Which is not bad; however, my site has about 6,000 files, and bandwidth stealing by these site snatchers is a big problem.

Have you heard of any legal cases against the developers of these programs?

I would be part of a class-action lawsuit as soon as I know of one.

The use of these site-snatching programs goes beyond regular use, because they grab everything, even files you probably won't ever read or use.

10:09 pm on Dec 2, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


> The second rule works somewhat, that is it DOES prevent everything EXCEPT HTM/HTML files from being snatched.

There's no reason, based on the code shown above, that html or htm files should be excluded from the rule. If there are any other rules ahead of these, however, they could cause this rule to be bypassed. For example, if you had a rewrite rule ahead of this that changed all requests for htm or html files to requests for php files (just as an example), and was written like this:


RewriteRule (.*)\.html? /$1.php [L]

this would match all htm and html file requests, change them to php file requests, and the [L] flag would stop any more RewriteRules from being processed.
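If such a rule does exist, one possible fix (a sketch only; the html-to-php rewrite here is the hypothetical example above, not something confirmed to be in the actual config) is to place the agent-blocking rules ahead of it, so the [L]-flagged rewrite can no longer short-circuit the block:

```apache
RewriteEngine on

# Deny unwanted agents first, before any [L]-flagged rewrites run
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC]
RewriteRule .* - [F]

# Hypothetical html-to-php rewrite, now reached only by allowed agents
RewriteRule (.*)\.html? /$1.php [L]
```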

If it's not something you are doing, then ask your host if there is anything in the httpd.conf server config file that would do it.

Jim