
Forum Moderators: Ocean10000 & incrediBILL & phranque


Block User Agent

block a specific user agent using htaccess

     

roblaw

3:47 pm on Oct 2, 2012 (gmt 0)

10+ Year Member



Hello all. We have a site that appears to be getting click bombed (Adsense). The 75%-90% CTR was a bit of a giveaway.

We pulled the ads and pulled a log file. The offender appears to be a bot on Amazon. There is a range of IPs, and we are considering blocking those, but obviously that could change over time.

The user-agent is as follows:
Mozilla/4.0 (compatible; Crawler; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)

I would like to block the user agent via htaccess. I believe that I would be safe in blocking any user agent that identifies as Mozilla/4.0 with the additional conditions of "compatible" and "crawler".

My htaccess skills are a bit limited and I could not locate anything with certainty in other forums.

Would the following Condition/Rule effectively block this bot? Would I be blocking potential 'wanted' visits?
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\ Crawler;\*$ [NC]
RewriteRule .* - [F,L]


Any feedback is greatly appreciated.

boblaw

wilderness

3:53 pm on Oct 2, 2012 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



This is sufficient, and will trap some other random bots as well.

RewriteEngine on
#UA "contains"
RewriteCond %{HTTP_USER_AGENT} Crawler [NC]
RewriteRule .* - [F]

There are many bots that derive from Amazon.
It's best to deny the Amazon IPs as well.
See this thread [webmasterworld.com]
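
For illustration, an IP-based block could be written in the same mod_rewrite style as the rule above. This is only a sketch: 192.0.2.0/24 is a documentation-only placeholder range, not an actual Amazon range, so the real ranges would have to come from your own logs or from Amazon's published lists.

```apache
RewriteEngine on
# Placeholder range (192.0.2.x) used purely as an example; substitute the ranges seen in your logs
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
# [F] returns 403 Forbidden for any request from a matching address
RewriteRule .* - [F]
```

Each additional range would need its own RewriteCond, joined with the [OR] flag on every condition except the last.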

roblaw

4:07 pm on Oct 2, 2012 (gmt 0)

10+ Year Member



Wilderness,

Thanks for the quick reply.

Would the rewrite that you provided trap some other crawlers that we might otherwise want at the site?

Seems like a lot of the stuff running on AWS is completely unwanted. However, I have to agree with some of the posters who mentioned the potential that you are blocking the "next big thing" to come along.

boblaw

phranque

4:10 pm on Oct 2, 2012 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



get rid of the '\*$' in the pattern: it requires a literal asterisk at the very end of the user agent string, so it will prevent the user agent string from matching.
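
With that trailing piece dropped, the original condition might look like the sketch below. It assumes the rest of the pattern is kept exactly as first posted, so it stays anchored to the start of the string and matches only user agents that begin with "Mozilla/4.0 (compatible; Crawler;".

```apache
RewriteEngine on
# Anchored at the start only; anything may follow "Crawler;"
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\ Crawler; [NC]
# [F] answers with 403 Forbidden
RewriteRule .* - [F,L]
```

This is narrower than the unanchored "contains" rule posted earlier in the thread, so it is less likely to catch other robots.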

wilderness

7:04 pm on Oct 2, 2012 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Would the rewrite that you provided trap some other crawlers that we might otherwise want at the site?


Only if the term "crawler" is contained in the User-Agent.

BTW, you may also add "spider" and trap a few more pests.

Change this line to:
RewriteCond %{HTTP_USER_AGENT} (Crawler|spider) [NC]

lucy24

9:34 pm on Oct 2, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Score another one for case sensitivity.

I took a quick stroll through recent logs. "Crawler", capitalized, seems to be the domain of low-budget robots. But "crawler", lower-case, will also lock out any robot whose UA string includes informational URLs such as ../crawler/ or ../crawlerinfo.html. I'd prefer to take a closer look at those. (Does not apply, of course, if you're a strict whitelister.)

To be safe, I'd leave out the [NC].
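
As a sketch of what that would mean in practice, the case-sensitive form of the earlier "contains" rule simply drops the flag. It assumes the same one-word pattern as before.

```apache
RewriteEngine on
# No [NC]: matches "Crawler" but not lower-case "crawler", e.g. inside an informational URL such as /crawler/
RewriteCond %{HTTP_USER_AGENT} Crawler
RewriteRule .* - [F]
```

The trade-off is that a robot identifying itself with a lower-case "crawler" gets through and has to be handled separately.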

wilderness

10:29 pm on Oct 2, 2012 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



To be safe, I'd leave out the [NC].


And I would NOT; however, lucy is certainly entitled to her own preferences.
 
