

Apache Web Server Forum

WARNING: Improper [NC] Usage Allows Bad Fake Bots to Crawl Sites

 2:59 am on Feb 4, 2013 (gmt 0)

Just thought I'd post this as a thread of its own, as it deserves attention.

Whether you're validating a bot or blocking bad bots determines whether you should use "[NC]" to ignore case.

For instance, when trying to validate Googlebot, the pattern "Googlebot [NC]" will also let in bad fakes spoofing as "googlebot". In this case you would not want "[NC]", nor would you want it for any other major search engine or service you allow to crawl.
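As a rough sketch of the validating case in mod_rewrite (an illustration, not a rule taken from this thread): the first condition uses "[NC]" to catch any casing of the name, and the second, case-sensitive condition exempts the exact official casing. Whatever is left is a wrongly cased spoof and gets a 403.

```apache
RewriteEngine On
# Any casing of the name: googlebot, GOOGLEBOT, GoogleBot ...
RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
# ... but not the exact official casing (no [NC], so this test is case-sensitive)
RewriteCond %{HTTP_USER_AGENT} !Googlebot
# Both conditions true means a wrongly cased spoof: forbid it
RewriteRule ^ - [F]
```

This only catches sloppy fakes. A spoofer who sends the correctly cased name still gets through a User-Agent test, so it is not a full validation of the crawler.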

When blocking bots, you want to catch every variation of a bad bot name, so that is when you would use the case-insensitive "[NC]" option. Using "Bad Bot [NC]" will match "bad bot", "BAD BOT", "BaD BoT" or any other mixed-case combination.

Just an FYI to make sure you're using "[NC]" in the right places for the right reasons and don't inadvertently let the bad guys in the door with such a simple mistake.



 10:16 am on Apr 17, 2013 (gmt 0)

Good tip.


 11:48 am on Apr 17, 2013 (gmt 0)

Unmentioned but assumed in the context of bots: incrediBILL is referring to the use of the mod_rewrite RewriteCond directive with HTTP_USER_AGENT as the TestString.

An example of blocking usage would be:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} badbot [NC]
RewriteRule .* - [F]


 7:01 pm on Apr 17, 2013 (gmt 0)

Same goes for mod_setenvif: SetEnvIf(NoCase) and the common shorthand BrowserMatch(NoCase). If the correctly cased form is good, the incorrect form is probably an unwanted spoofer. But if the correctly cased form is already bad, the incorrect form may be even worse.
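A minimal sketch of the same idea in mod_setenvif, assuming Apache 2.4 access control. The variable names fake_gbot and bad_bot and the directory path are made up for illustration.

```apache
# Flag any casing of the good bot's name ...
SetEnvIfNoCase User-Agent googlebot fake_gbot
# ... then clear the flag when the exact official casing is present
SetEnvIf User-Agent Googlebot !fake_gbot

# Known bad bot: case-insensitive shorthand catches every variant
BrowserMatchNoCase badbot bad_bot

# Hypothetical path; adjust to your own document root
<Directory "/var/www/html">
    <RequireAll>
        Require all granted
        Require not env fake_gbot
        Require not env bad_bot
    </RequireAll>
</Directory>
```

On Apache 2.2 the access part would instead use "Order Allow,Deny", "Allow from all" and "Deny from env=bad_bot". The SetEnvIf lines are evaluated in order, so the case-sensitive line has to come after the NoCase one.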

Watch out for UAs that legitimately contain two forms, as in "PrettyBot {blahblah} +http:// www.prettybot.com/crawlers", where the name appears once in mixed case and again in lowercase inside the URL.
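To illustrate with the made-up PrettyBot UA above: a case-sensitive rule that treats lowercase "prettybot" as a spoof would also hit the legitimate UA, because its URL contains the lowercase form. A hedged sketch that avoids this by blocking only when the correctly cased name is missing altogether:

```apache
RewriteEngine On
# Too broad: this would also match "www.prettybot.com" inside the legitimate UA
# RewriteCond %{HTTP_USER_AGENT} prettybot
# RewriteRule ^ - [F]

# Narrower: some casing of the name is present, but the official "PrettyBot" is not
RewriteCond %{HTTP_USER_AGENT} prettybot [NC]
RewriteCond %{HTTP_USER_AGENT} !PrettyBot
RewriteRule ^ - [F]
```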

