Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
Block User Agent
block a specific user agent using htaccess
roblaw

10+ Year Member



 
Msg#: 4503040 posted 3:47 pm on Oct 2, 2012 (gmt 0)

Hello all. We have a site that appears to be getting click bombed (AdSense). The 75%-90% CTR was a bit of a giveaway.

We pulled the ads and pulled a log file. The offender appears to be a bot hosted on Amazon. There is a range of IPs, and we are considering blocking those, but obviously that could change over time.

The user-agent is as follows:
Mozilla/4.0 (compatible; Crawler; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)

I would like to block the user agent via htaccess. I believe that I would be safe in blocking any user agent that identifies as Mozilla/4.0 with the additional conditions of "compatible" and "Crawler".

My htaccess skills are a bit limited and I could not locate anything with certainty in other forums.

Would the following Condition/Rule effectively block this bot? Would I be blocking potential 'wanted' visits?
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\ Crawler;\*$ [NC]
RewriteRule .* - [F,L]


Any feedback is greatly appreciated.

boblaw

 

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4503040 posted 3:53 pm on Oct 2, 2012 (gmt 0)

This is sufficient, and will trap some other random bots as well.

RewriteEngine on
#UA "contains"
RewriteCond %{HTTP_USER_AGENT} Crawler [NC]
RewriteRule .* - [F]

There are many bots that originate from Amazon.
It's best to deny the Amazon IP ranges as well.
See this thread [webmasterworld.com]
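
For reference, a minimal sketch of denying an address range in .htaccess. It assumes Apache 2.4 with mod_authz_core and an AllowOverride setting that permits these directives. The range 192.0.2.0/24 is a documentation placeholder, not an actual Amazon range; substitute the ranges seen in your own logs.

```apache
# Placeholder range (RFC 5737 documentation block), NOT a real Amazon range.
# Everyone is allowed except requests coming from the listed range.
<RequireAll>
    Require all granted
    Require not ip 192.0.2.0/24
</RequireAll>
```

On Apache 2.2 the rough equivalent is "Order Allow,Deny", "Allow from all", "Deny from 192.0.2.0/24" on three separate lines.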

roblaw

10+ Year Member



 
Msg#: 4503040 posted 4:07 pm on Oct 2, 2012 (gmt 0)

Wilderness,

Thanks for the quick reply.

Would the rewrite that you provided trap some other crawlers that we might otherwise want at the site?

Seems like a lot of the stuff running on AWS is completely unwanted. However, I have to agree with some of the posters who mentioned the potential that you are blocking the "next big thing" to come along.

boblaw

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4503040 posted 4:10 pm on Oct 2, 2012 (gmt 0)

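Get rid of the '\*$' in the pattern, as it will prevent the user agent string from matching: '\*' matches a literal asterisk and '$' anchors it to the end of the string, and this user agent does not end in "Crawler;*".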
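
A sketch of the rule from the first post with only that change applied (and [F] used on its own, since it already stops rule processing). This is an illustration of the suggestion, not something tested against the poster's site:

```apache
RewriteEngine on
# Match user agents that BEGIN with this prefix; no end anchor, so the
# rest of the string (MSIE 7.0; Windows NT 5.1; ...) can be anything.
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\ Crawler; [NC]
# Answer every matching request with 403 Forbidden.
RewriteRule .* - [F]
```

The spaces in the pattern stay backslash-escaped because an unescaped space would end the pattern argument of RewriteCond.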

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4503040 posted 7:04 pm on Oct 2, 2012 (gmt 0)

"Would the rewrite that you provided trap some other crawlers that we might otherwise want at the site?"


Only if the term "crawler" is contained in the User-Agent.

BTW, you may also add "spider" and trap a few more pests.

Change this line to:
RewriteCond %{HTTP_USER_AGENT} (Crawler|spider) [NC]

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4503040 posted 9:34 pm on Oct 2, 2012 (gmt 0)

Score another one for case sensitivity.

I took a quick stroll through recent logs. "Crawler", capitalized, seems to be the domain of low-budget robots. But "crawler", lower-case, will also lock out any robot whose UA string includes informational URLs such as ../crawler/ or ../crawlerinfo.html. I'd prefer to take a closer look at those. (Does not apply, of course, if you're a strict whitelister.)

To be safe, I'd leave out the [NC].
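
A sketch of the case-sensitive variant, assuming the same "contains" style of match used earlier in the thread. Without [NC], only the capitalized form is caught, so a user agent that merely carries a lower-case ../crawler/ info URL is left alone:

```apache
RewriteEngine on
# Case-sensitive "contains" match: hits "Crawler" but not "crawler".
RewriteCond %{HTTP_USER_AGENT} Crawler
RewriteRule .* - [F]
```

The user agent quoted in the first post contains "Crawler" with a capital C, so it would still be blocked by this version.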

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4503040 posted 10:29 pm on Oct 2, 2012 (gmt 0)

"To be safe, I'd leave out the [NC]."


And I would NOT; however, lucy is certainly entitled to her own preferences.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved