

Apache Web Server Forum

    
WARNING: Improper [NC] Usage Allows Bad Fake Bots to Crawl Sites
incrediBILL

Msg#: 4542167 posted 2:59 am on Feb 4, 2013 (gmt 0)

Just thought I'd post this as a thread of its own, as it deserves attention.

Whether or not you should use "[NC]" to ignore case depends on whether you're validating a good bot or blocking bad bots.

For instance, when trying to validate Googlebot, the syntax "Googlebot [NC]" will also let in the bad fakes spoofing with "googlebot". In this case you would not want "[NC]", and the same goes for any other major search engine or service you allow to crawl.
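
As a rough illustration, assuming the check is done with mod_rewrite's RewriteCond against %{HTTP_USER_AGENT}, the two conditions below refuse any request whose User-Agent contains "googlebot" in some casing but not the exact form "Googlebot". This is only a sketch: it filters spoofers by case alone and does nothing against a fake that copies the casing correctly.

```apache
RewriteEngine On
# Some casing of the name appears in the User-Agent...
RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
# ...but the exact, correctly cased form does not (deliberately no [NC] here).
RewriteCond %{HTTP_USER_AGENT} !Googlebot
# Consecutive RewriteCond lines are ANDed, so only wrong-case claims get a 403.
RewriteRule ^ - [F]
```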

When blocking bots, you want to catch any variation of a bad bot name, and that is where the case-insensitive "[NC]" option belongs. Using "Bad Bot [NC]" will match "bad bot", "BAD BOT", "BaD BoT" or any other mixed-case combination.

Just an FYI to make sure you're using "[NC]" in the right places for the right reasons and don't inadvertently let the bad guys in the door with such a simple mistake.

 

g1smd

Msg#: 4542167 posted 10:16 am on Apr 17, 2013 (gmt 0)

Good tip.

phranque

Msg#: 4542167 posted 11:48 am on Apr 17, 2013 (gmt 0)

unmentioned but assumed in the context of bots, incrediBILL is referring to the use of the mod_rewrite RewriteCond directive and HTTP_USER_AGENT as the TestString:
http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritecond

an example of blocking usage would be:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} badbot [NC]
RewriteRule .* - [F]

lucy24

Msg#: 4542167 posted 7:01 pm on Apr 17, 2013 (gmt 0)

Same goes for mod_setenvif: SetEnvIf(NoCase) and the common shorthand BrowserMatch(NoCase). If the correctly cased form is good, the incorrect form is probably an unwanted spoofer. But if the correctly cased form is already bad, the incorrect form may be even worse.
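
A minimal sketch of the same idea in mod_setenvif. The variable names bad_bot and fake_googlebot and the name "badbot" are made up for illustration, and the access block assumes Apache 2.4 with mod_authz_core; on 2.2 the equivalent would be Order/Allow/Deny with "Deny from env=".

```apache
# Already-bad name: match every casing.
BrowserMatchNoCase badbot bad_bot

# Good name: flag any casing first...
BrowserMatchNoCase googlebot fake_googlebot
# ...then clear the flag again when the correctly cased form is present.
BrowserMatch Googlebot !fake_googlebot

<Location "/">
    <RequireAll>
        Require all granted
        Require not env bad_bot
        Require not env fake_googlebot
    </RequireAll>
</Location>
```

The directives are evaluated in the order they appear, so the case-sensitive BrowserMatch has to come after the NoCase line for the "!" removal to take effect.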

Watch out for UAs that legitimately contain two forms, the correctly cased name plus a lowercase version in the URL, as in "PrettyBot {blahblah} +http:// www.prettybot.com/crawlers"
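
One possible way around that, sketched with mod_rewrite and the made-up PrettyBot name from above: match the lowercase spoof case-sensitively, but skip the occurrence that is part of the domain. The lookahead relies on Apache's PCRE regex support, and the assumption that the lowercase form only ever appears legitimately in front of ".com" is mine, not something the UA string guarantees.

```apache
RewriteEngine On
# Case-sensitive on purpose: "PrettyBot" never matches, and the
# "prettybot" inside "www.prettybot.com" is excluded by the lookahead.
RewriteCond %{HTTP_USER_AGENT} prettybot(?!\.com)
RewriteRule ^ - [F]
```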
