I have a bit of a problem.
Personally, I don't know how it works when gambling and money-loan sites send a bot to hit a certain page on your site.
But this is the situation: long ago I changed my site over to nice URLs, and I noticed from the 404 errors that one particular page was constantly being hit by those sites.
For whatever reason they had the URL wrong, because the page name they request has a capital T in it instead of a t.
But it never ends. They still look for that page, abusing my site to boost whatever rank they have.
So my question is: if I create the page they are hitting, does anyone have a content idea that would crash whatever they use?
Then move on. These bots have obviously recorded your old page in a database of pages they spam, which hasn't been updated in ages. But search engines, especially Google, also tend to hold on to old pages for ages. It's not worth your while to mess with them.
I'm not certain how referral spammers work, but it's probable that they just send out the page request without actually loading the page that is returned into a browser, so they can use more computing power on sending out the spam. So putting something malicious on your old page is likely to be useless and even counterproductive.
I'm also not sure if the bad sites are "boosting up whatever rank they have" by hitting yours -- or even if they could. Regardless, just block them and, like Rosalind said, move on. Here's how:
RewriteEngine on
RewriteCond %{REMOTE_HOST} \.badsite1\.com$ [NC,OR]
RewriteCond %{REMOTE_HOST} \.badsite2\.net$ [NC]
RewriteRule .* - [F]
Just swap in the hosts you want to block. And if it's only one host, simply remove the OR flag:
RewriteEngine on
RewriteCond %{REMOTE_HOST} \.badsite\.com$ [NC]
RewriteRule .* - [F]
That way, you block the hits you don't want and you don't get into trouble serving malicious content.
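If these really are referral spammers, the spammer's domain may show up in the Referer header rather than in the remote host name. Here is a rough variant for that case, assuming mod_rewrite is available; badsite1.com and badsite2.net are placeholders again, and since the Referer header is easy to fake or omit, treat this as a sketch rather than a complete defence:
RewriteEngine on
# Match the placeholder domains (and any subdomain) in the Referer header
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?badsite1\.com(/|$) [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?badsite2\.net(/|$) [NC]
# Answer matching requests with 403 Forbidden
RewriteRule .* - [F]
This tests a header the client sends anyway, so it needs no DNS lookup on your side.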
Since looking up %{REMOTE_HOST} requires a reverse-DNS lookup*, it is highly inefficient, and many hosting services won't allow it. If that is the case, then block that remote host by using its IP address or range of IP addresses.
* A reverse-DNS lookup requires your server to send a request to the DNS system, and await a response before serving the requested page, image, stylesheet, etc. If your host has no local DNS server, or if it does not cache recent DNS requests, using %{REMOTE_HOST} lookups on a busy site can have a horrible impact on performance. And that's why many hosts won't support it -- It can make them look really bad, performance-wise.
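As a rough sketch of the IP-based alternative: the addresses below are placeholders taken from the ranges reserved for documentation, not real offenders, so swap in whatever addresses your own logs show.
RewriteEngine on
# Block one single address (placeholder)
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.15$ [OR]
# Block every address starting with 203.0.113. (placeholder range)
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.
RewriteRule .* - [F]
%{REMOTE_ADDR} is already known when the request arrives, so no lookup is needed to test it.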
It is best to avoid these RDNS lookups unless absolutely necessary. If they *are* necessary, then try to constrain them. In other words, instead of doing
RewriteCond %{REMOTE_HOST} www\.xyz\.cn
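RewriteRule .* - [F]
do this, so that the hostname is only looked up for requests whose IP address already matches the first condition:
RewriteCond %{REMOTE_ADDR} ^2[01][0-9]\.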
RewriteCond %{REMOTE_HOST} www\.xyz\.cn
RewriteRule .* - [F]
Jim