Ban Referrer Spam from Blogspot

frontpage

5:07 pm on Feb 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a question for the .htaccess experts on banning pron spam from referrers coming from blogspot.com domains.

It seems pron sites are setting themselves up in the blogspot.com domains owned by Google.

The format of the referrers is keyword1-keyword2.blogspot.com

The blogspot.com domains resolve to the IP 66.102.15.101, which is Google.com.

I don't want to ban google from my sites with a

Deny from 66.102.15.101

Perhaps a wildcard format for the .htaccess in the form of *.blogspot.com, such as this? But is it proper?

RewriteCond %{HTTP_REFERER} ^http://(www.)?*.blogspot.com(/)?.*$ [OR]

Any help would be greatly appreciated.

frontpage

11:08 pm on Feb 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RewriteCond %{HTTP_REFERER} ^http://(www.)?*.blogspot.com(/)?.*$ [OR]

Well that did not work.

So, I tried

RewriteCond %{HTTP_REFERER} ^http://(www.)?.blogspot.com(/)?.*$ [OR]

Any ideas, anyone?

bird

11:32 pm on Feb 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What effect do you expect your "ban" to have?

Those log spam entries are left by an automatic bot. This bot will make its rounds whether you serve it a status 200, 403, or anything else. You can fry your brain about the right .htaccess syntax, but I don't think you'll achieve anything useful that way in such a case.

frontpage

12:21 am on Feb 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The purpose of my post was to request information on proper syntax for my .htaccess.

Many website owners have referrer pages to keep track of where incoming links originate.

I prefer not to go into detail and spread the technique of this particular method.

If you can help, I would appreciate it.

[edited by: jdMorgan at 1:12 am (utc) on Feb. 9, 2004]
[edit reason] speling [/edit]

Robert Thivierge

12:58 am on Feb 9, 2004 (gmt 0)

10+ Year Member



Here's my take. If you use it, please understand every character and test before relying on it.

RewriteCond %{HTTP_REFERER} ^http://(www\.)?([a-z0-9\-]+)\.blogspot\.com(/)? [NC]

1) The "OR" condition implies you have other conditions, and want any "true" condition to trigger failure. Without seeing surrounding lines, I can't see if it's appropriate.
2) You need to match the sub-domain. I make the assumption that a sub-domain consists only of letters, digits, and hyphens.
3) Put a backslash before periods, if you literally mean a period (not a wildcard).
4) Domain names can appear as upper or mixed case, so make no assumptions, and use NC (with or without OR).

Also, for it to work, you obviously have to have an appropriate RewriteRule.
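
A minimal sketch of how the condition and a rule might fit together, assuming mod_rewrite is available, that the directives live in a per-directory .htaccess file, and that a 403 Forbidden response is the desired outcome (the [F] flag is an assumption; the thread does not say what response is wanted):

```apache
# Turn on the rewrite engine (needed once per .htaccess file)
RewriteEngine On

# True when the referrer is any blogspot.com subdomain, in any letter case
RewriteCond %{HTTP_REFERER} ^http://(www\.)?([a-z0-9\-]+)\.blogspot\.com(/)? [NC]

# Match every requested URL, leave it unchanged ("-"), and answer 403 Forbidden
RewriteRule .* - [F]
```

Because this is the only condition, it carries no [OR] flag; [OR] belongs only on a condition that is followed by another alternative condition.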

WARNING: AltaVista's scooter spider (and maybe others) provides referer information in GET requests. You could block such a spider with the above code.

Also, you should track the IP of the "bad" visitors (not the domain, NOT GOOGLE). Then, if it's always the same, find out who owns it, and consider blocking that specific IP.
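
If the logs do show a single offending address, a sketch of blocking it with the same Deny syntax used earlier in this thread might look like the following. The address 192.0.2.10 is a placeholder from the documentation range, not an address taken from anyone's logs, and the Order/Allow/Deny directives assume the access-control module shipped with the Apache versions current at the time of this thread:

```apache
# Evaluate Allow directives first, then Deny; a Deny match overrides an Allow match
Order Allow,Deny
Allow from all
# Placeholder address (hypothetical): replace with the one found in your logs
Deny from 192.0.2.10
```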

jdMorgan

1:10 am on Feb 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can just simplify that to:
 RewriteCond %{HTTP_REFERER} ^http://.*blogspot\.com [NC,OR] 

including the [NC] recommended by Robert T. and also his recommendation to be careful with this. If you can add further RewriteConds, for example specifying REMOTE_ADDR IP address ranges, REQUEST_URI pages requested, or anything else to constrain or "narrow down" the application of the RewriteRule to specific cases, that would be a good idea.
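
As a sketch of that kind of narrowing, assuming a hypothetical referrer-display page at /referrers.html (the thread does not name the page) and a 403 response as the desired result, the referrer test can be combined with a REQUEST_URI test so that only requests for that page are refused:

```apache
RewriteEngine On

# Condition 1: the referrer is somewhere under blogspot.com (any letter case)
RewriteCond %{HTTP_REFERER} ^http://.*blogspot\.com [NC]
# Condition 2: the request is for the hypothetical referrer-display page
RewriteCond %{REQUEST_URI} ^/referrers\.html$
# Both conditions must hold (implicit AND) before the request is refused
RewriteRule .* - [F]
```

The [OR] flag is left off here on purpose: consecutive RewriteCond lines are ANDed by default, which is what narrowing requires.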

Jim

bird

2:51 am on Feb 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The purpose of my post was to request information on proper syntax for my .htaccess.

Proper syntax derives from proper semantics, which means it depends on the purpose of what you're trying to do. There's not really any proper blocking syntax for a problem that can't be solved by blocking.

Many website owners have referrer pages to keep track of where incoming links originate.

There are no special pages needed to keep track of incoming links. What you probably mean are pages where those referring links are displayed for the general public to see. Those typically work based on an SSI script, which will indeed be circumvented if you block the respective page from loading. But do you really want to block real visitors just because they happen to come from some (legitimate) blog link? I assume that the spammers are still a tiny minority among the blogspot users.

You can't reliably protect your autogenerated links. The spammy blogspot referrers of today will be replaced by a dozen other domains tomorrow, and a hundred more next week. Keeping track of them all is not worth the effort, just for the dubious benefit of those bragging referrer displays. The semantic answer therefore really is not to display your referring links on your site. They are of no interest to your visitors anyway. If you think that your visitors should know about some of those sites, place a normal link to them somewhere. It's a lot easier to manage a few positive examples than to weed out the spam.

If you can help, I would appreciate it.

jdMorgan gave the pattern that I consider the most effective technically.

jdMorgan

3:03 am on Feb 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Gentlemen,

We will endeavor here to directly answer the posted question.

Side issues should be addressed in an advisory manner only.

Thanks,
Jim