Welcome to WebmasterWorld Guest from 54.162.168.187

Forum Moderators: Ocean10000 & incrediBILL & phranque

Message Too Old, No Replies

WARNING: Improper [NC] Usage Allows Bad Fake Bots to Crawl Sites

     
2:59 am on Feb 4, 2013 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14663
votes: 99


Just thought I'd post it as a thread of it's own as it deserves attention.

Depending on whether you're validating a bot or blocking bad bots determines whether or not you should use "[NC]" to ignore the case.

For instance, when trying to validate Googlebot, the syntax "Googlebot [NC]" will also allow the bad fakes spoofing with "googlebot". In this case you would not want the "[NC]" used nor with any other major search engine or service being allowed to crawl.

When blocking bots and looking for any variation of a bad bot name is when you would use the case insensitive matching "[NC]" option to make sure you catch all variants. Using "Bad Bot [NC]" will match "bad bot", "BAD BOT", "BaD BoT" or any other mixed case combination.

Just an FYI to make sure you're using "[NC]" in the right places for the right reasons and don't inadvertently let the bad guys in the door with such a simple mistake.
10:16 am on Apr 17, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Good tip.
11:48 am on Apr 17, 2013 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10847
votes: 61


unmentioned but assumed in the context of bots, incrediBILL is referring to the use of the mod_rewrite RewriteCond directive and HTTP_USER_AGENT as the TestString:
http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritecond

an example of blocking usage would be:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} badbot [NC]
RewriteRule .* - [F]
7:01 pm on Apr 17, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13676
votes: 440


Same goes for mod_setenvif: SetEnvIf(NoCase) and the common shorthand BrowserMatch(NoCase). If the correctly cased form is good, the incorrect form is probably an unwanted spoofer. But if the correctly cased form is already bad, the incorrect form may be even worse.

Watch out for UA's that legitimately contain two forms, as in "PrettyBot {blahblah} +http:// www.prettybot.com/crawlers"