Forum Moderators: DixonJones
What do I put into .htaccess to ban all visitors coming from one particular site?
Note: I haven't attempted the 'almost perfect ban list thing' (not sure of the WebmasterWorld URL), as I'm new to banning. I'll get around to studying the subject in future.
>> I haven't attempted the 'almost perfect ban list thing' (not sure of the WebmasterWorld url)
kapow - I haven't tried the perfect ban list thing either, but I remembered this link to the Updated Robots list from the current WebmasterWorld home page:
[webmasterworld.com...]
There's a forum (forum 11) dedicated to Spider Identification that might give you a better answer than here.
[webmasterworld.com...]
[webmasterworld.com...]
What probably happens is that people are interested in your domain names, or similar names. They can search for those on that site, and then click on a link to get to the respective site for each domain (if one exists).
I'm not quite sure why you'd want to block those visitors just because they went there first. It's trivial for them to circumvent your block, and I don't see how their visit can do you any harm.
The asterisk is a special character, and so is the dot; that's why both must be escaped with a backslash ("\").
The asterisk certainly shouldn't be in that RewriteCond pattern, as it isn't in the real domain either.
In addition to this, I'll have to emphasize that a ban is an individual decision, and you should always investigate to make sure you have a valid reason before you decide to ban. I believe I've tried to say the same a few times in the "close to perfect .htaccess" thread.
>> The asterisk certainly shouldn't be in that RewriteCond pattern
Sorry about that, I thought it was the verbatim referrer that was posted. I assumed it was a forged referrer string: the asterisk does not appear in real URLs, and ".sc" is not a common TLD, so I didn't even consider it to be a URL, just a string.
Still, the decision might be valid for kapow even if it would be invalid for everyone else in the world. It was a perfectly unambiguous question ("how do I...") and I don't feel it was wrong to answer it; I would even explain how to ban Googlebot if that were the issue.
/claus
These aren't really referrals: the system at that domain itself fetches the root page of each domain to see whether there's a web site online or not. You may or may not like that kind of statistics gathering, but personally I don't see any harm. After all, they *do* provide an extremely useful service to the general public.
Of course you're right, claus, the original question was of a purely technical nature. But I have seen enough people jump to conclusions and ban stuff just because someone else mentioned they'd ban it (often without giving any reasons), so I wanted to encourage a few second thoughts about the matter.
1.) Re. That site:
The WebmasterWorld system doesn't let me type the URL (for obvious reasons). For the record, I think that site is excellent! I use it sometimes myself. However, I think some people are using the facilities on that site for spam/harvesting reasons. I manage some domain names for company/internal use only, names that are not published. Why would there be 15+ referrals from that site for an 'internal-use' name? (The name is also very unusual; you would not mistake it or guess it.) Because this keeps happening with unpromoted domain names, I am suspicious of some users of that site. As you said, if someone wanted to visit the site and they already know the name, then they can easily do so.
2.) Re. Banning a site that I choose:
Thanks Claus - I can't think of a reason for doing so, but who knows, one day I might want to ban visitors from Google. I haven't decided whether I want to ban that site or not, but I do want to know how.
Suppose it is widget.com - how do I ban visitors from it?
It's a whois database that gathers site information by doing bulk whois checks, and there's no option to block it. To me it looks like a company that wants to grab domains that are pending deletion and still get good traffic, or to pass that information on to other people.
To me that sounds worse than a spambot. Or am I the only one here who thinks so?
All you have to do is to change the second line of the example in post #4:
RewriteCond %{HTTP_REFERER} widget\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} google\.com/ [NC]
The last of these two will not catch all Google referrals, as there are also all of Google's IP addresses, but I included it to illustrate the use of the [OR] flag. Without it, "AND" would be the default, and since a visitor can't really be referred from two places at the same time, that wouldn't work (1).
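RewriteCond lines only set conditions; on their own they don't block anything. A complete minimal .htaccess block built from the two conditions might look like the sketch below. It assumes mod_rewrite is loaded and that your host allows rewrite directives in .htaccess; the final RewriteRule that answers with 403 Forbidden is an illustrative assumption, since the full example from post #4 isn't reproduced here.

```apache
# Sketch only: assumes mod_rewrite is available and .htaccess overrides are allowed.
RewriteEngine On

# [OR] joins this condition to the next one: either referrer will trigger the rule.
RewriteCond %{HTTP_REFERER} widget\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} google\.com/ [NC]

# If the conditions matched, leave the URL unchanged ("-") and send 403 Forbidden ([F]).
RewriteRule .* - [F]
```

Note the header name: HTTP's Referer header is spelled with a single "r" in the middle, so the mod_rewrite variable is %{HTTP_REFERER}.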
Another way of banning visitors referred by these two sites would be this:
RewriteCond %{HTTP_REFERER} (widget|google)\.com/ [NC]
This reads: "widget OR google" followed by ".com/" (2)
Sooner or later, however, somebody will tell you that you should use this line instead if you want to ban widget.com:
RewriteCond %{HTTP_REFERER} ^http://www\.widget\.com/ [NC]
The caret (^) before "http" is an anchor; it means that the string must start like this (3). That way of doing it is valid as well, but it will not catch the domain without "www.". You'll have to make this subdomain optional with the "optional" operator, the question mark; in this example it makes the content of the parentheses optional:
RewriteCond %{HTTP_REFERER} ^http://(www\.)?widget\.com/ [NC]
As you see, it's already a bit harder to read. Plus, other subdomains, say "badbot.widget.com", will not be banned by this. For this reason I always include only the most significant part(s) of the string(s) I want to match. Everything else is likely to cause some kind of unexpected error at some point. So, could you just write, say, "widget" and not ".com/"? Of course you could, if you are sure you don't mind that a referral from a URL like this will be banned as well:
http://myownpage.com/greenwidgetstore/page.html
The pattern will try to match anywhere in the string if you do not specify an anchor. So just stating the minimum can have side effects too. ".com/" is normally not used in filenames, so including it narrows down the possibility of banning something you didn't intend to.
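If you do want an anchored pattern that also catches other subdomains such as badbot.widget.com, one option is to allow any number of host-name labels in front of the domain. This is a sketch under assumptions: widget.com is the placeholder domain from this thread, and the optional "s" in https? is an addition for referrers coming from secure pages.

```apache
# Sketch: anchored match for widget.com and any of its subdomains
# (www.widget.com, badbot.widget.com, ...), over http or https.
# ([a-z0-9-]+\.)* allows zero or more "label." prefixes; [NC] makes the match case-insensitive.
# It does not match mywidget.com, nor myownpage.com/greenwidgetstore/.
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://([a-z0-9-]+\.)*widget\.com/ [NC]
RewriteRule .* - [F]
```

The trade-off is the one described above: it is precise, but harder to read than a plain widget\.com/ substring match.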
---
I might as well answer that upcoming question straight away. To ban any referrer that is not your own domain, use the "not"-operator "!", like this:
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mypage\.com/ [NC]
I used the "strict" way of declaring the URL here, as I'm confident that you know the URL combinations of your own domain.
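One thing to watch with the negated condition: a request with no Referer header at all (a typed-in URL, a bookmark, or a browser or proxy that strips referrers) also fails to match ^http://(www\.)?mypage\.com/, so the "!" condition is true and those visitors get banned too. A common way around this is to let empty referrers through first. The sketch assumes mypage.com is your own domain; restricting the rule to image files (gif, jpg, png), as in typical hotlink protection, is an illustrative assumption.

```apache
# Sketch: refuse requests referred by any site other than mypage.com.
RewriteEngine On

# Let requests with an empty Referer through (direct visits, bookmarks, stripped headers).
RewriteCond %{HTTP_REFERER} !^$

# Implicit AND with the line above: referrer present AND not our own site.
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mypage\.com/ [NC]

# Illustrative assumption: only protect image files; use .* instead to cover every request.
RewriteRule \.(gif|jpe?g|png)$ - [NC,F]
```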
/claus
[edited by: claus at 1:31 pm (utc) on Oct. 14, 2003]
At least whois identifies their bot so those who don't want it on their site can easily ban it. The others don't because they want to hide what they're doing.
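Because that bot identifies itself, it can be refused by its User-Agent string rather than by referrer. The bot's name isn't given in this thread, so ExampleWhoisBot below is a hypothetical placeholder; substitute the string you actually see in your access logs.

```apache
# Sketch: refuse a self-identifying bot by its User-Agent header.
# "ExampleWhoisBot" is a hypothetical placeholder, not the real bot name.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ExampleWhoisBot [NC]
RewriteRule .* - [F]
```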