Blocking referring sites & troubleshooting code (my first .htaccess file)

site referrer

shultz

5:30 pm on Jul 20, 2011 (gmt 0)

10+ Year Member



Hello,

I just finished reading a lot on the net about the Apache server, but I'm still having problems with my .htaccess file. I noticed that the part I thought was supposed to block referring sites is not doing so.
And it's driving me crazy.

Also, I would be really grateful if someone could look over my code.

Order Deny,Allow
Deny from #*$!.#*$!.#*$!.#*$!
Deny from #*$!.#*$!.#*$!.#*$!
deny from #*$!.#*$!.#*$!.#*$!
<FilesMatch "^410[^.]*\.shtml$">
Allow from all
</FilesMatch>
Options All -Indexes
RewriteEngine on
RewriteCond %{HTTP:VIA} !^$ [NC,OR]
RewriteCond %{HTTP:FORWARDED} !^$ [NC,OR]
RewriteCond %{HTTP:USERAGENT_VIA} !^$ [NC,OR]
RewriteCond %{HTTP:X_FORWARDED_FOR} !^$ [NC,OR]
RewriteCond %{HTTP:PROXY_CONNECTION} !^$ [NC,OR]
RewriteCond %{HTTP:XPROXY_CONNECTION} !^$ [NC,OR]
RewriteCond %{HTTP:HTTP_PC_REMOTE_ADDR} !^$ [NC,OR]
RewriteCond %{HTTP:HTTP_CLIENT_IP} !^$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^#*$!#*$!xx [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^#*$!x\ Toolbar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^#*$!xxToolbar\ Toolbar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^#*$!x\ Toolbar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^#*$!\ Toolbar [NC,OR]
RewriteCond %{http_user_agent} ^#*$!x\ toolbar [NC,OR]
RewriteCond %{HTTP_REFERER} #*$!#*$!\.com [NC,OR]
RewriteCond %{HTTP_REFERER} #*$!#*$!x\.com [NC,OR]
RewriteCond %{HTTP_REFERER} #*$!#*$!x\.org [NC]
RewriteRule !410\.shtml$ - [G]
# Block image hotlinking
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://(www\.)?#*$!xx\.#*$! [NC]
RewriteRule \.(gif|jpe?g)$ - [NC,F]
ErrorDocument 410 /410.shtml

lucy24

7:39 pm on Jul 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Urk. Someone else is going to lay out the optimal order for items in an .htaccess file.

What is that great block of conditions in the middle intended to do? I'm pretty sure that by the time you've worked through all those [OR] lines, it's going to end up doing something you didn't intend.

One quick note: In the hotlinking routine you have

RewriteCond %{HTTP_REFERER} .


I assume this is intended to exclude null referrers. But these generally come through as "-" (look at your logs) so it is safer to use the form

RewriteCond %{HTTP_REFERER} !^-?$


The same goes for blank UAs, which you will probably want to block from the whole site because they are almost always up to no good.
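
A minimal sketch of that blank-UA block, assuming mod_rewrite is enabled and no custom 403 ErrorDocument is configured (if one is, that page would need its own exception, as the 410 page has in the file above). The access log prints a hyphen for a header that was not sent, and the `-?` in the pattern covers both an empty value and a literal hyphen:

```apache
# Sketch only: refuse any request whose User-Agent is empty or a lone hyphen.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]
```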

shultz

8:34 pm on Jul 20, 2011 (gmt 0)

10+ Year Member



Thank you! That's a start to fixing my problems.

Ok as for the rest,

This part is supposed to block people using various proxy services.

RewriteCond %{HTTP:VIA} !^$ [NC,OR]
RewriteCond %{HTTP:FORWARDED} !^$ [NC,OR]
RewriteCond %{HTTP:USERAGENT_VIA} !^$ [NC,OR]
RewriteCond %{HTTP:X_FORWARDED_FOR} !^$ [NC,OR]
RewriteCond %{HTTP:PROXY_CONNECTION} !^$ [NC,OR]
RewriteCond %{HTTP:XPROXY_CONNECTION} !^$ [NC,OR]
RewriteCond %{HTTP:HTTP_PC_REMOTE_ADDR} !^$ [NC,OR]
RewriteCond %{HTTP:HTTP_CLIENT_IP} !^$ [NC,OR]

---------

This part is supposed to stop various user agents based on toolbars

RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Alexa\ Toolbar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GoogleToolbar\ Toolbar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Google\ Toolbar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MSN\ Toolbar [NC,OR]
RewriteCond %{http_user_agent} ^alexa\ toolbar [NC,OR]

------------
And finally this part, where I'm having the most problems, is supposed to stop certain referring sites. But I can't get it to work.

RewriteCond %{HTTP_REFERER} truemiles\.com [NC,OR]
RewriteCond %{HTTP_REFERER} stockleaf\.com [NC,OR]
RewriteCond %{HTTP_REFERER} archive\.org [NC]
RewriteRule !410\.shtml$ - [G]

lucy24

9:29 pm on Jul 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hmmmm. So if any of these Unwanted People come around and request anything other than the custom 410 page, they will be told that the page they asked for is no longer there?

Uhmmm. It's generally not considered good form to lie to visitors. Besides, it could turn around and bite you. It would be perfectly legitimate to replace the [G] with an [F] meaning "Nuh-uh, we don't want your kind around here".

What exactly do you mean by "can't get it to work"? If you add yourself (by IP or UA) to the list of [OR] conditions, and then try to visit the site, what happens?
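
As a sketch of that self-test: 192.0.2.10 below is a placeholder from the documentation-only address range, not a real visitor, and example.com / example.org stand in for the referring sites. Swap in your own address, request any page, and the server should answer 403:

```apache
# Hypothetical test entry: replace 192.0.2.10 with your own IP address.
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.10$ [OR]
# Placeholder referrers; the last condition in the group carries no [OR].
RewriteCond %{HTTP_REFERER} example\.com [NC,OR]
RewriteCond %{HTTP_REFERER} example\.org [NC]
# [F] answers 403 Forbidden instead of the 410 Gone that [G] sends.
RewriteRule !410\.shtml$ - [F]
```

If your own address gets the 403 but a click from the referring site does not, the rule itself is live, and the question narrows to what Referer value actually arrives with those requests.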

shultz

10:09 pm on Jul 20, 2011 (gmt 0)

10+ Year Member



It's only a lie because I'm still playing with everything. When I get everything sorted out, I'll put everything in order as it should be.

------


Ok,
What it's not doing is blocking the referrer. Every time someone clicks on the link on the referring site, my site still loads normally for them.

And if I add my IP it blocks it.

lucy24

10:44 pm on Jul 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What it's not doing is blocking the referrer. Every time someone clicks on the link on the referring site, my site still loads normally for them.

Hm. Have you tried the basic stuff like going to one of the referring sites yourself in a different browser (so nothing is cached or cookied) and seeing what happens if you click on the link? Also have a look at your logs and confirm that the referers are being named in the form you've coded for.

Come to think of it, why are these folks linking to your site if you particularly don't want their traffic? Have you tried simply asking them to remove the link? Or is this part of the "still playing with everything" so eventually they'll get some kind of custom redirect? Some sites have an "I don't like your face" type of page for a few Very Special Visitors.

And if I add my IP it blocks it.

Do you mean 403, or the 410 that you want people to get? I just asked this because it's an easy way to verify that the overall code is working.

shultz

11:44 pm on Jul 20, 2011 (gmt 0)

10+ Year Member



Yes, I have used a different browser to verify, so nothing is cached or cookied.

And yes I'm still playing with everything at this time. So it's not a question of wanting or not wanting traffic. I want to get things in order before I make those decisions.

And I will be doing a 403 when it's all done.

shultz

3:15 pm on Jul 21, 2011 (gmt 0)

10+ Year Member



So much for getting a little bit of help. There used to be a time when a message on a forum like this would get a hundred responses. But now that Facebook is on the scene, all the forums are now dead.

g1smd

3:45 pm on Jul 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Some of us are a tad busy to provide instantaneous free help on a forum. I've answered more than 40 threads in various places today, as well as worked on three different sites in parallel, as well as worked on two other programming jobs, as well as building and reviewing multiple patches on two others, read about 40 emails and sent more than 20 emails, edited three articles, and reviewed analytics for three sites. I've done half what I intended today and it's already 5 p.m. so I'll be back when I can spare the time...

wilderness

4:25 pm on Jul 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



FWIW, the creation and history of this forum revolve around one person (an administrator) who, since approximately 2003, has provided tens of thousands of hours of useful and repetitive solutions.

This administrator has been absent since late-October and as a result this forum is fairly crippled.

You're a newcomer of only a few days and are without justification to criticize any participant here.

The WWW is a large space and answers (not free copy and paste solutions) may be located in many places.

forum charter [webmasterworld.com]

Forum Etiquette:

It is not appropriate to expect other members to write your code for you or to debug your entire project; Please don't expect other members to solve a problem you don't want to begin solving yourself.

Before posting a new thread, please try looking through the older posts in the forum index. Someone may have recently asked the same question, and you may benefit from the posted answers. Using the WebmasterWorld search function or the site-specific search feature of major search engines may help you find exactly what you are looking for on WebmasterWorld.

Please do not post specific details such as domain names, full IP addresses, or personally-identifiable information such as name, e-mail address, IM screen name, etc. Such specifics will be edited or removed in accordance with our Terms of Service [webmasterworld.com], which may render your post meaningless. Please replace all instances of your domain name with "example.com" before posting.

"Fix my code" and "Do my homework for me" threads:

This is a discussion forum, not a help desk or a free code-writing service; If you have a problem, please try to research it and then phrase your post in a manner conducive to general discussion of the issue. Rather than providing one-off solutions, we prefer to help people find resources to help themselves.

A general guideline for code-related problems is: Post your own code and describe what you hoped it would accomplish. Then describe how it fails and include all relevant information from your server error logs. Too-general posts in the form of "What code do I use to do this?" often go unanswered for a long time.

The following resources are often referenced in our Apache forum, and may help to answer or focus your questions.


forum library [webmasterworld.com]

search archives via google
site:www.webmasterworld.com HTTP_REFERER [google.com]

lucy24

7:18 pm on Jul 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There used to be a time

Back in the misty days of 20 July 2011? :P

Personally I am still scratching my head because I really don't see why the referer part wouldn't work as intended and we are both probably overlooking something embarrassingly obvious.

wilderness

7:23 pm on Jul 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



we are both probably overlooking something embarrassingly obvious.


RewriteCond %{HTTP_REFERER} stockleaf\.com [NC,OR]
RewriteCond %{HTTP_REFERER} archive\.org [NC]
RewriteRule !410\.shtml$ - [G]



The following line is an exception, not an action:

RewriteRule !410\.shtml$ - [G]
RewriteRule to what page?

The entire file and all those lines crammed together (rather than separated in organized module-fashion) is a nightmare looking for a place to happen.

wilderness

7:31 pm on Jul 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Furthermore, you may eliminate the domain extensions, which are basically redundant:

RewriteCond %{HTTP_REFERER} truemiles [NC,OR]
RewriteCond %{HTTP_REFERER} stockleaf [NC,OR]
RewriteCond %{HTTP_REFERER} archive [NC]

lucy24

10:31 pm on Jul 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule to what page?

That part's right, isn't it? If you have an [F] or a [G] you're not rewriting to anything except the 403 page or 410 page, which your server will take care of without any further help. Hence the bare - [httpd.apache.org]

you may eliminate the domain extensions, which are basically redundant

The perfectly nice people at domainname.org would hate to be lumped together with the evil crooks at domainname.com ;)
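
One way to keep the two apart is to anchor the pattern to the host part of the referrer instead of matching a bare word anywhere in the string. A rough sketch, with example.com as a placeholder domain and the optional port group as my own assumption about what a referrer may contain:

```apache
# Sketch only: matches example.com and its subdomains as the referring host,
# but not example.org, and not a URL that merely mentions example.com in its path.
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?example\.com(:[0-9]+)?(/|$) [NC]
RewriteRule !410\.shtml$ - [G]
```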

wilderness

11:57 pm on Jul 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The perfectly nice people at domainname.org would hate to be lumped together with the evil crooks at domainname.com


Those instances rely on an organized htaccess written by an experienced user, which does not apply in this instance.

In addition, referrals from same-named domains with different extensions are highly unlikely in this use.

wilderness

12:00 am on Jul 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That part's right, isn't it? If you have an [F] or a [G] you're not rewriting to anything except the 403 page or 410 page, which your server will take care of without any further help.


Once again, this last line is an "exception" to prevent a loop, as determined by the leading exclamation point and NOT the trailing G.

As a result, the referrer conditions are never applied.

lucy24

12:35 am on Jul 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Uh-oh. I'm reading the docs differently. This is under Rule, not Cond (the Cond version is shorter):

In mod_rewrite, the NOT character ('!') is also available as a possible pattern prefix. This enables you to negate a pattern; to say, for instance: "if the current URL does NOT match this pattern". This can be used for exceptional cases, where it is easier to match the negative pattern, or as a last default rule.

To me that meant "if the request is for anything other than the 410 page, the rule kicks in". Normally this kind of thing would go in a Cond, but it would only work in an AND series, while here we've got a bunch of ORs.
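
For what it's worth, mod_rewrite tests the RewriteRule pattern first and only then evaluates the RewriteCond lines that precede it, so the negated pattern behaves like one extra AND on top of the OR group. A sketch with placeholder domains:

```apache
# Step 1: the rule pattern is tested first; it passes for every path
#         that does NOT end in 410.shtml.
# Step 2: only then are the conditions checked; any one of them is enough.
RewriteCond %{HTTP_REFERER} example\.com [NC,OR]
RewriteCond %{HTTP_REFERER} example\.org [NC]
RewriteRule !410\.shtml$ - [G]
```

So the effective logic is "(not the 410 page) AND (referrer A OR referrer B)", without needing the exclusion as a separate condition.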

wilderness

12:39 am on Jul 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry you're unable to see the obvious.

The entire file is badly clumped together (a result of copying and pasting) and needs separation into modules with multiple closing lines.

One year or five years from now (and especially with the absence of comments), this person will be incapable of making any sense out of the entire htaccess.
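
A rough sketch of what such a modular layout might look like. The grouping and comments are my own, example.com / example.org / example.net are placeholders for the real domains, and only a few of the original conditions are repeated to keep it short:

```apache
ErrorDocument 410 /410.shtml
RewriteEngine On

# --- 1. Requests relayed through a proxy ---
RewriteCond %{HTTP:Via} !^$ [OR]
RewriteCond %{HTTP:Forwarded} !^$
RewriteRule !410\.shtml$ - [G]

# --- 2. Unwanted user agents ---
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Alexa\ Toolbar [NC]
RewriteRule !410\.shtml$ - [G]

# --- 3. Unwanted referrers ---
RewriteCond %{HTTP_REFERER} example\.com [NC,OR]
RewriteCond %{HTTP_REFERER} example\.org [NC]
RewriteRule !410\.shtml$ - [G]

# --- 4. Image hotlinking (blank referrers are let through) ---
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.net [NC]
RewriteRule \.(gif|jpe?g)$ - [NC,F]
```

Each group ends with its own RewriteRule, and the last condition in each group carries no [OR], so one group can be commented out or tested without touching the others.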

shultz

1:15 pm on Jul 22, 2011 (gmt 0)

10+ Year Member



@g1smd
There was a time when forums like this just buzzed. Today, I've noticed that if you're lucky enough to get help, it takes days.

@wilderness
I didn't ask anyone to do anything for me. I asked for help. When I want someone to do something for me, I'll pay them for their time. (And I'll admit that I'm getting to the point of looking for someone.)

OK,
I'm a hack and I admit it. But everyone has to start somewhere.
I understood this in much the same way as lucy24: that the "rule kicks in".

wilderness

1:28 pm on Jul 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"rule kicks in".

Then you simply omitted the line following the "exception" in your submission.

"I'm a hack and I admit it."

It was never intended as a put-down (almost all htaccess users begin with the same copy-and-paste method), but rather as an explanation for the disorganized file.

shultz

11:11 pm on Jul 22, 2011 (gmt 0)

10+ Year Member



I'm in over my head on this one. I've been reading and re-reading the materials on the Internet. It's probably simple, but it's all Chinese to me at this time.

So I'm going to post a help wanted ad on one of the freelancer sites.

lucy24

1:52 am on Jul 23, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can you please explain in words of two syllables why this is not a rule?

RewriteRule !410\.shtml$ - [G]


I tried the syntactically identical

RewriteRule !real_files - [G]


in a site I've got access to ("real_files" is the name of a directory) and it worked precisely as intended.