
Apache Web Server Forum

    
Want to block home page from fake self-referrals
aristotle

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



 
Msg#: 4661479 posted 6:14 pm on Apr 8, 2014 (gmt 0)

(I tried to ask a similar question in another thread, but I didn't explain it very well, and it was off-topic anyway, so I decided to start a separate thread about it.)

Suppose I want to block access to my home page from bots that use my home page URL (http://www.example.com/) as the referrer. Could either of the following two code samples work?

1.
# BLOCK HOME PAGE (/index.html) FROM FAKE SELF-REFERRALS
<Files /index.html>
RewriteCond %{HTTP_REFERER} example\.com/?$
RewriteRule .* - [F]
</Files>

2.
# BLOCK HOME PAGE (/) FROM FAKE SELF-REFERRALS
<Files />
RewriteCond %{HTTP_REFERER} example\.com/?$
RewriteRule .* - [F]
</Files>

The only difference is whether the home page is specified as / or as /index.html. I don't know which is correct, if either. And there could be other problems as well. So I would appreciate it very much if someone could look at it.

 

brotherhood of LAN

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4661479 posted 6:23 pm on Apr 8, 2014 (gmt 0)

Usually a site has a link to its home page on the logo, like this site. What if someone clicked on that? The user would be blocked on the subsequent page view.

aristotle




 
Msg#: 4661479 posted 6:38 pm on Apr 8, 2014 (gmt 0)

"Usually a site has a link to its home page on the logo, like this site. What if someone clicked on that? The user would be blocked on the subsequent page view."

That's a good question, but the home page on this particular website doesn't have any links to itself. Of course, if it did I would have to remove them.
Another question is how to handle real visitors that land on another page and then click an internal link to the home page. I need to make sure that they aren't blocked, so that's another complication.

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4661479 posted 7:29 pm on Apr 8, 2014 (gmt 0)

"Usually a site has a link to its home page on the logo, like this site"

Heh. And that's one reason to make sure your pages don't link to themselves. The other reason is that it annoys and confuses the user: "Isn't this the same page I was on before? I thought I clicked on a link." If you've got the same logo everywhere, suppress the link on your home page.

If the rule is intended to apply only to your home page, then say so:

RewriteRule ^$ - [F]

That's assuming htaccess, where a request for the title page comes through as [nothing].

RewriteCond %{HTTP_REFERER} example\.com/?$

Yes. The closing anchor is essential, because you're looking at a specific page, not your whole domain. But leave off the opening anchor, because auto-referers will often get your domain name wrong. In fact if you've got a domain-name-canonicalization redirect, you can add the "wrong" form to your referer blocks.
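
To make the anchoring concrete, here is a minimal sketch of the condition and the rule together. It assumes a per-directory .htaccess file in the document root with mod_rewrite available, and example.com is only a stand-in for the real hostname.

```apache
# Sketch only: assumes .htaccess in the document root and mod_rewrite enabled.
RewriteEngine On

# No opening anchor: matches "http://example.com/", "http://www.example.com",
# and any other prefix a robot puts in front of the domain name
# (including unrelated hosts that merely end in "example.com").
# Closing anchor: the referer must END at the domain root, so a referer such
# as "http://www.example.com/about.html" does not match.
RewriteCond %{HTTP_REFERER} example\.com/?$

# In .htaccess the leading slash is stripped before matching,
# so a request for the home page arrives as the empty path.
RewriteRule ^$ - [F]
```

Requests with an empty referer, or with a referer pointing at any inner page of the site, fall through untouched.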

:: shuffling papers because I think I've done this myself ::

Yeah, it's part of my referer-blocking package.

RewriteCond %{HTTP_REFERER} ^http://www\.example\.com [NC,OR]

On this site, the canonical form is "example.com" alone, so any referer giving "www.example.com" is bogus. Mine has [OR] at the end because it's followed by a short list of .ua and similar TLDs. (I do not have a lot of human Ukrainian readers.)
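
A chain of that kind might look roughly like the sketch below. Only the first condition comes from the post; the .ua pattern and the catch-all rule are assumptions added for illustration, and the real list is not shown in this thread.

```apache
# Sketch only: assumes the canonical hostname is "example.com" (no www).
RewriteEngine On

# A self-referer using the non-canonical "www" form is treated as bogus.
RewriteCond %{HTTP_REFERER} ^http://www\.example\.com [NC,OR]

# Hypothetical extra entry: any referer whose hostname ends in ".ua".
# The last condition in an OR chain carries no [OR] flag.
RewriteCond %{HTTP_REFERER} ^https?://[^/]+\.ua(/|$) [NC]

# Assumed rule: refuse every request that matched one of the conditions above.
RewriteRule ^ - [F]
```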

aristotle




 
Msg#: 4661479 posted 7:54 pm on Apr 8, 2014 (gmt 0)

Thanks Lucy
I'll need to spend some time studying everything you said. In some of your replies you seem to think that I know more than I actually do, but that's okay.

aristotle




 
Msg#: 4661479 posted 5:59 pm on Apr 9, 2014 (gmt 0)

I'm still trying to work out the proper way to do this, but I'm having trouble finding pertinent information through searching, so I would like to ask another question here.

Following what Lucy said about the RewriteRule, the two choices in my original post can be corrected as follows:
1.
# BLOCK HOME PAGE (/index.html) FROM FAKE SELF-REFERRALS
<Files /index.html>
RewriteCond %{HTTP_REFERER} example\.com/?$
RewriteRule ^$ - [F]
</Files>

2.
# BLOCK HOME PAGE (/) FROM FAKE SELF-REFERRALS
<Files />
RewriteCond %{HTTP_REFERER} example\.com/?$
RewriteRule ^$ - [F]
</Files>

(The only difference is whether the home page is specified as / or as /index.html.)

So my question is, which is the best choice?

Of course, if both choices are wrong, I would appreciate it if someone could point out the errors.

lucy24




 
Msg#: 4661479 posted 9:24 pm on Apr 9, 2014 (gmt 0)

Yikes. It is technically possible to put RewriteRules inside a <Files> or <FilesMatch> envelope. I do it myself, because I didn't know you weren't supposed to until after I'd got in the habit. But here the envelopes are both wrong and unnecessary.

Unnecessary because in mod_rewrite you can achieve the identical effect simply by putting the filename in the body of the rule.

Wrong for two reasons:
#1 a <Files> envelope looks ONLY at the filename, not at the path. The / slash is not part of the filename, so the envelope will always fail.
#2 by default, RewriteRules are NOT inherited. So the moment you've got an envelope with RewriteRules of its own, this will wipe out any other RewriteRules affecting the same files. Sure you can say RewriteOptions Inherit, but that just creates new issues. And this setting, itself, is not inherited.

If you do have RewriteRules inside a <Files> envelope, you need a fresh "RewriteEngine on" directive as well. It's exactly as if you had RewriteRules in different htaccess files.
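
As a rough sketch of "putting the filename in the body of the rule", the envelope version can be written without any <Files> section at all. This assumes the directives sit in the .htaccess file of the directory that holds index.html.

```apache
# Sketch only: no <Files> envelope, so other RewriteRules in this file
# keep working alongside this one.
RewriteEngine On

RewriteCond %{HTTP_REFERER} example\.com/?$

# The pattern itself limits the rule to one file. In .htaccess it is matched
# against the path relative to this directory, without a leading slash.
RewriteRule ^index\.html$ - [F]
```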

"index.html" is the name of your physical file. But it, ahem, isn't part of your URLs. Is it?

I think the correct form was posted earlier in this thread.

RewriteCond %{HTTP_REFERER} ^http://(www\.)?example\.com(/(index\.html)?)?$
RewriteRule ^(index\.html)?$ - [F]


Make sure your front page doesn't link to itself. I added the (index\.html)? in both lines because malign robots might ask for the pagename in that form, so you may as well block them at once. Same idea as the optional "www." which they may also get wrong. Closing anchor is crucial! You're looking only at the front page, not at anything else on your site.
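
For reference, a complete .htaccess fragment built around those two lines might look like the sketch below. The "https?" alternative and the [NC] flag are assumptions that go beyond the rule as posted; they only matter if the site is also reachable over HTTPS or robots send the hostname in mixed case.

```apache
# Sketch only: assumes .htaccess in the document root and mod_rewrite enabled.
RewriteEngine On

# Referer is the front page itself, in any of these forms:
#   http://example.com          http://www.example.com/
#   https://example.com/index.html   (https is an added assumption)
# The closing anchor keeps referers from inner pages out of the match.
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?example\.com(/(index\.html)?)?$ [NC]

# Request is for the front page, asked for either as "/" or as "/index.html".
RewriteRule ^(index\.html)?$ - [F]
```

Visitors arriving with no referer, from an outside site, or from an inner page of the same site do not meet the condition, so ordinary navigation to the home page is not blocked.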

You can do a lot more with blocks on bogus internal referers. But most of it is site-specific-- like getting the with/without www. wrong-- so this is no place for cut-and-paste.

aristotle




 
Msg#: 4661479 posted 4:25 pm on Apr 10, 2014 (gmt 0)

Thanks, Lucy. You've been extremely helpful. Your knowledge is amazing, and it's also amazing how you're willing to take so much of your time to help other people here.

I tried out the code you posted and it appears to be working, at least for the basic case of outside self-referral requests for http://www.example.com/. So far nothing has come along to test the variations that you included. But the basic case is the main problem I need to deal with for this site anyway.

Also, the site's internal navigation apparently hasn't been affected at all, so that part is okay too.

So thanks again

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved