
Forum Moderators: Ocean10000 & incrediBILL & phranque

WARNING: Improper [NC] Usage Allows Bad Fake Bots to Crawl Sites

   
2:59 am on Feb 4, 2013 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Just thought I'd post it as a thread of its own, as it deserves attention.

Whether you should use "[NC]" to ignore case depends on whether you're validating a good bot or blocking bad bots.

For instance, when validating Googlebot, the pattern "Googlebot [NC]" will also let in bad fakes spoofing with "googlebot". In this case you do not want "[NC]", and the same goes for any other major search engine or service you allow to crawl.
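
A minimal sketch of that idea in mod_rewrite, assuming the rules live in .htaccess or a <Directory> block and that the only legitimate spelling is exactly "Googlebot". It rejects any request whose User-Agent carries the name in some other casing. It looks at the User-Agent string only, so treat it as an illustration rather than a complete validation setup:

```apache
RewriteEngine On
# Condition 1: the UA mentions the name in ANY casing (case-insensitive).
RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
# Condition 2: ...but NOT in the exact casing the real crawler uses (no [NC] here).
RewriteCond %{HTTP_USER_AGENT} !Googlebot
# Both true means a miscased spoof such as "googlebot" or "GOOGLEBOT": return 403.
RewriteRule ^ - [F]
```

A UA that contains the correctly cased name anywhere still passes, so this only filters out the spoofs that get the casing wrong.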

When blocking bots, you want to catch any variation of a bad bot's name, so that is where the case-insensitive "[NC]" option belongs. Using "Bad Bot [NC]" will match "bad bot", "BAD BOT", "BaD BoT" or any other mixed-case combination.

Just an FYI to make sure you're using "[NC]" in the right places for the right reasons and don't inadvertently let the bad guys in the door with such a simple mistake.
10:16 am on Apr 17, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Good tip.
11:48 am on Apr 17, 2013 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



unmentioned but assumed in the context of bots: incrediBILL is referring to the mod_rewrite RewriteCond directive with %{HTTP_USER_AGENT} as the TestString:
http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritecond

an example of blocking usage would be:

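RewriteEngine on
# [NC] matches "badbot" in any casing, anywhere in the User-Agent
RewriteCond %{HTTP_USER_AGENT} badbot [NC]
# "-" means no substitution; [F] returns 403 Forbidden
RewriteRule .* - [F]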
7:01 pm on Apr 17, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Same goes for mod_setenvif: SetEnvIf vs. SetEnvIfNoCase, and the common User-Agent shorthand BrowserMatch vs. BrowserMatchNoCase. If the correctly cased form is good, the incorrect form is probably an unwanted spoofer. But if the correctly cased form is already bad, the incorrect form may be even worse.

Watch out for UAs that legitimately contain both forms, as in "PrettyBot {blahblah} +http:// www.prettybot.com/crawlers". A rule that blocks on the lowercase form alone will also hit the real bot.
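
A rough mod_setenvif sketch of both cases, assuming Apache 2.4 authorization syntax inside .htaccess or a <Directory> block. "PrettyBot" is the made-up good bot from the example above, "badbot" is a placeholder for a bot you want gone, and the variable name bad_ua is arbitrary:

```apache
# Good bot: flag the name in ANY casing first...
BrowserMatchNoCase prettybot bad_ua
# ...then clear the flag if the exact casing appears somewhere in the UA.
# This keeps a real UA that holds both "PrettyBot" and "prettybot.com" unblocked.
BrowserMatch PrettyBot !bad_ua
# Bad bot: flag every casing. Placed last so the line above cannot clear it.
BrowserMatchNoCase badbot bad_ua

<RequireAll>
    Require all granted
    # Refuse any request that still carries the flag.
    Require not env bad_ua
</RequireAll>
```

The directives are evaluated in the order written, which is why the case-insensitive flag comes before the case-sensitive "!bad_ua" that removes it.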
 
