WARNING: Improper [NC] Usage Allows Bad Fake Bots to Crawl Sites
|
incrediBILL

msg:4542169 | 2:59 am on Feb 4, 2013 (gmt 0) | Just thought I'd post this as a thread of its own, as it deserves attention. Whether you should use "[NC]" to ignore case depends on whether you're validating a bot or blocking bad bots.

For instance, when trying to validate Googlebot, the syntax "Googlebot [NC]" will also allow the bad fakes spoofing with "googlebot". In this case you would not want to use "[NC]", and the same applies to any other major search engine or service you are allowing to crawl.

When blocking bots, you do want the case-insensitive "[NC]" option, so that you catch every variant of a bad bot's name. Using "Bad Bot [NC]" will match "bad bot", "BAD BOT", "BaD BoT" or any other mixed-case combination.

Just an FYI to make sure you're using "[NC]" in the right places for the right reasons, and don't inadvertently let the bad guys in the door with such a simple mistake.
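As a rough sketch of the validation case (not from the original post), the rules below refuse any request whose User-Agent mentions googlebot in some casing but does not contain the exact, correctly cased "Googlebot". RewriteCond conditions are ANDed by default, so the first one uses [NC] to catch every spelling and the second, negated and case-sensitive, exempts the real name. This only filters mis-cased fakes: a spoofer who copies the exact string still gets through, so full validation would need something more, such as a reverse-DNS check.

```apache
RewriteEngine on
# Case-insensitive: matches "googlebot", "GOOGLEBOT", "GoogleBot", ...
RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
# Case-sensitive and negated: true only if the exact "Googlebot" is absent
RewriteCond %{HTTP_USER_AGENT} !Googlebot
# Both conditions held, so this is a mis-cased fake: answer 403 Forbidden
RewriteRule .* - [F]
```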
|
g1smd

msg:4565578 | 10:16 am on Apr 17, 2013 (gmt 0) | Good tip.
|
phranque

msg:4565598 | 11:48 am on Apr 17, 2013 (gmt 0) | Unmentioned but assumed in the context of bots: incrediBILL is referring to the mod_rewrite RewriteCond directive, with HTTP_USER_AGENT as the TestString: http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritecond

An example of blocking usage would be:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} badbot [NC]
RewriteRule .* - [F]
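When there is more than one name to block, the conditions can be chained. A minimal sketch, where "badbot" and "evilscraper" are hypothetical names: [NC] keeps each match case-insensitive, and [OR] makes either condition sufficient (without it, consecutive RewriteCond lines are ANDed). Multiple flags go in one comma-separated bracket.

```apache
RewriteEngine on
# [NC]: any casing of the name matches; [OR]: this condition OR the next one
RewriteCond %{HTTP_USER_AGENT} badbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} evilscraper [NC]
# "-" leaves the URL unchanged; [F] returns 403 Forbidden
RewriteRule .* - [F]
```

A single regex alternation such as `(badbot|evilscraper)` with [NC] would do the same job on one line.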
|
lucy24

msg:4565731 | 7:01 pm on Apr 17, 2013 (gmt 0) | The same goes for mod_setenvif: SetEnvIf vs. SetEnvIfNoCase, and the common shorthand BrowserMatch vs. BrowserMatchNoCase. If the correctly cased form is good, the incorrectly cased form is probably an unwanted spoofer. But if the correctly cased form is already bad, the incorrectly cased form may be even worse. Watch out for UAs that legitimately contain both forms, as in "PrettyBot {blahblah} +http:// www.prettybot.com/crawlers".
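A sketch of the same ideas in mod_setenvif, assuming Apache 2.4 authorization syntax (mod_authz_core; 2.2 would use Order/Allow/Deny instead) and placement in .htaccess or a <Directory> section. The variable names bad_bot and fake_prettybot, and the name "badbot", are hypothetical. The PrettyBot lines deal with the two-forms case: flag any casing of the name first, then clear the flag when the correctly cased "PrettyBot" is present, so the genuine UA is not caught by the lowercase "prettybot" in its own URL. The directives are applied in the order they appear.

```apache
# Blocking: NoCase matches every casing of the bad name
BrowserMatchNoCase "badbot" bad_bot

# Spoof check: flag any UA mentioning prettybot in any casing...
SetEnvIfNoCase User-Agent "prettybot" fake_prettybot
# ...then unset the flag (the "!" prefix) if the exact "PrettyBot" appears
SetEnvIf User-Agent "PrettyBot" !fake_prettybot

# Allow everyone except requests carrying either flag
<RequireAll>
    Require all granted
    Require not env bad_bot
    Require not env fake_prettybot
</RequireAll>
```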