
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
Block User Agent
block a specific user agent using htaccess
roblaw




msg:4503042
 3:47 pm on Oct 2, 2012 (gmt 0)

Hello all. We have a site that appears to be getting click-bombed (AdSense). The 75%-90% CTR was a bit of a giveaway.

We pulled the ads and pulled a log file. The offender appears to be a bot on Amazon. There is a range of IPs, and we are considering blocking those, but obviously that could change over time.

The user-agent is as follows:
Mozilla/4.0 (compatible; Crawler; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)

I would like to block the user agent via htaccess. I believe that I would be safe in blocking any user agent that identifies as Mozilla/4.0 with the additional conditions of "compatible" and "crawler".

My htaccess skills are a bit limited and I could not locate anything with certainty in other forums.

Would the following Condition/Rule effectively block this bot? Would I be blocking potential 'wanted' visits?
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\ Crawler;\*$ [NC]
RewriteRule .* - [F,L]


Any feedback is greatly appreciated.

boblaw

 

wilderness




msg:4503044
 3:53 pm on Oct 2, 2012 (gmt 0)

This is sufficient, and will trap some other random bots as well.

RewriteEngine on
#UA "contains"
RewriteCond %{HTTP_USER_AGENT} Crawler [NC]
RewriteRule .* - [F]

There are many bots that derive from Amazon.
It's best to deny the Amazon IPs as well.
See this thread [webmasterworld.com]

roblaw




msg:4503047
 4:07 pm on Oct 2, 2012 (gmt 0)

Wilderness,

Thanks for the quick reply.

Would the rewrite that you provided trap some other crawlers that we might otherwise want at the site?

Seems like a lot of the stuff running on AWS is completely unwanted. However, I have to agree with some of the posters who mentioned the potential that you are blocking the "next big thing" to come along.

boblaw

phranque




msg:4503051
 4:10 pm on Oct 2, 2012 (gmt 0)

get rid of the '\*$' in the pattern as it will prevent the user agent string from matching.
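
A quick way to see why the trailing '\*$' matters is to try both patterns against the posted user agent outside of Apache. The sketch below uses Python's re module, which is not the regex engine Apache uses, but it should agree with it for a pattern this simple. The helper name matches is made up for illustration, and re.IGNORECASE stands in for the [NC] flag.

```python
import re

# User agent string quoted in the original post.
UA = ("Mozilla/4.0 (compatible; Crawler; MSIE 7.0; Windows NT 5.1; "
      "SV1; .NET CLR 2.0.50727)")

# Pattern as first posted: after "Crawler;" it demands a literal "*"
# and then the end of the string.
ORIGINAL = r"^Mozilla/4\.0\ \(compatible;\ Crawler;\*$"

# Same pattern with the trailing \*$ removed, so it only anchors the start.
TRIMMED = r"^Mozilla/4\.0\ \(compatible;\ Crawler;"


def matches(pattern, user_agent):
    """Return True if the pattern matches, ignoring case (like [NC])."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


print("original pattern matches:", matches(ORIGINAL, UA))
print("trimmed pattern matches: ", matches(TRIMMED, UA))
```

In the real string, "Crawler;" is followed by " MSIE 7.0; ..." rather than an asterisk at the end of the line, so the original pattern never matches and the rule would never fire.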

wilderness




msg:4503138
 7:04 pm on Oct 2, 2012 (gmt 0)

Would the rewrite that you provided trap some other crawlers that we might otherwise want at the site?


Only if the term "crawler" is contained in the User-Agent.

BTW, you may also add "spider" and trap a few more pests.

Change this line to:
RewriteCond %{HTTP_USER_AGENT} (Crawler|spider) [NC]
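
To get a feel for what the unanchored (Crawler|spider) condition would and would not catch, the same pattern can be run over a few sample strings. This is only a sketch in Python, not Apache itself. The first sample is the user agent from the original post; the other two are illustrative assumptions (a made-up bot name and a typical browser string), and is_blocked is a hypothetical helper.

```python
import re

# Same alternation as the RewriteCond; IGNORECASE stands in for [NC].
PATTERN = re.compile(r"(Crawler|spider)", re.IGNORECASE)

SAMPLES = [
    # From the original post.
    "Mozilla/4.0 (compatible; Crawler; MSIE 7.0; Windows NT 5.1; SV1; "
    ".NET CLR 2.0.50727)",
    # Hypothetical bot name, for illustration only.
    "ExampleSpider/1.0",
    # Illustrative ordinary browser string.
    "Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20100101 Firefox/15.0",
]


def is_blocked(user_agent):
    """Return True if the pattern occurs anywhere in the user agent."""
    return PATTERN.search(user_agent) is not None


for ua in SAMPLES:
    print(is_blocked(ua), ua)
```

Because the pattern has no anchors, it matches wherever either word occurs in the string, which is what the "contains" comment in the earlier rule refers to.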

lucy24




msg:4503191
 9:34 pm on Oct 2, 2012 (gmt 0)

Score another one for case sensitivity.

I took a quick stroll through recent logs. "Crawler", capitalized, seems to be the domain of low-budget robots. But "crawler", lower-case, will also lock out any robot whose UA string includes informational URLs such as ../crawler/ or ../crawlerinfo.html. I'd prefer to take a closer look at those. (Does not apply, of course, if you're a strict whitelister.)

To be safe, I'd leave out the [NC].
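
The difference the flag makes can be sketched with two sample strings. This is a Python approximation rather than Apache, the bot name and URL in INFO_URL_UA are hypothetical, and blocked is a made-up helper whose ignore_case argument plays the role of [NC].

```python
import re

# User agent from the original post: capitalized "Crawler" as a bare token.
LOW_BUDGET_UA = ("Mozilla/4.0 (compatible; Crawler; MSIE 7.0; "
                 "Windows NT 5.1; SV1; .NET CLR 2.0.50727)")

# Hypothetical robot that only mentions "crawler" inside an info URL.
INFO_URL_UA = "ExampleBot/2.0 (+http://www.example.com/crawler/)"


def blocked(user_agent, ignore_case):
    """Return True if 'Crawler' is found, optionally ignoring case."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(r"Crawler", user_agent, flags) is not None


for label, ua in (("low-budget", LOW_BUDGET_UA), ("info-url", INFO_URL_UA)):
    print(label,
          "| with NC:", blocked(ua, ignore_case=True),
          "| without NC:", blocked(ua, ignore_case=False))
```

With case ignored, both strings are caught; without it, only the capitalized "Crawler" token is. Which behavior is preferable depends on whether robots of the second kind are wanted.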

wilderness




msg:4503208
 10:29 pm on Oct 2, 2012 (gmt 0)

To be safe, I'd leave out the [NC].


And I would NOT; however, lucy is certainly entitled to her own preferences.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved