Forum Moderators: Ocean10000 & incrediBILL & keyplyr

Fake BingBot

1000s of hits from a Slicehost IP

9:10 am on Mar 18, 2013 (gmt 0)

New User

10+ Year Member

joined:Apr 4, 2005
posts: 13
votes: 0

I got 1000s of hits from an agent identifying itself as BingBot, but coming from a SliceHost IP. It is also ignoring robots.txt.

User-agent: *
Disallow: /wp-

(please excuse the lack of punctuation)

User-Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

User-Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

It followed through this loop over and over, at an aggressive rate. I spotted it thanks to New Relic/Loggly and its handy Chrome extension.

The IP belongs to youngshand.com, a marketing agency, so I wonder if they are running some "tests". Has anyone seen this fake bot before?

[edited by: incrediBILL at 3:14 am (utc) on Mar 19, 2013]
[edit reason] unlinked URL [/edit]

9:29 pm on Apr 3, 2013 (gmt 0)

lucy24

Senior Member from US

joined:Apr 9, 2011
votes: 347

wilderness said:

RewriteCond %{REMOTE_ADDR} ^131\.253\.(3[0-9]|4[0-7])\. [OR]
{snip, snip}
RewriteCond %{REMOTE_ADDR} ^207\.[67][0-9]\.
RewriteCond %{HTTP_USER_AGENT} !(bingbot|msnbot)
RewriteRule !^robots\.txt$ - [F]

keyplyr said:

RewriteCond %{HTTP_USER_AGENT} (Bingbot|Bing\ Mobile\ |msnbot|MSRBOT) [NC]
RewriteCond %{REMOTE_ADDR} !^65\.5[2-5]\.
{snip, snip}
RewriteCond %{REMOTE_ADDR} !^207\.[67][0-9]\.
RewriteRule !^(forbidden\.html|robots\.txt)$ - [F]
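
Putting the two quoted halves side by side, a sketch of how the mirrored pair might sit together in one .htaccess (the {snip, snip} ranges are not reproduced here, and the Microsoft ranges shown are only the ones quoted above — they may be incomplete or out of date):

```apache
RewriteEngine On

# Half 1: request comes from a known Bing/MSN range but the UA
# does NOT claim to be bingbot/msnbot -> forbid.
RewriteCond %{REMOTE_ADDR} ^131\.253\.(3[0-9]|4[0-7])\. [OR]
RewriteCond %{REMOTE_ADDR} ^207\.[67][0-9]\.
RewriteCond %{HTTP_USER_AGENT} !(bingbot|msnbot)
RewriteRule !^robots\.txt$ - [F]

# Half 2: UA claims to be bingbot/msnbot but the IP is NOT in a
# known range -> forbid.
RewriteCond %{HTTP_USER_AGENT} (Bingbot|Bing\ Mobile\ |msnbot|MSRBOT) [NC]
RewriteCond %{REMOTE_ADDR} !^65\.5[2-5]\.
RewriteCond %{REMOTE_ADDR} !^131\.253\.(3[0-9]|4[0-7])\.
RewriteCond %{REMOTE_ADDR} !^207\.[67][0-9]\.
RewriteRule !^(forbidden\.html|robots\.txt)$ - [F]
```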

Cut-and-pasters, note that this is a mirror-image pair of rules. The first says: "If the request comes from a known Bing/MSN range and does NOT identify itself as bingbot or msnbot..." The second says: "If it identifies itself as bingbot or msnbot and does NOT come from a known Bing/MSN range..."

The body of each rule gives the exceptions. In fact, each rule is itself an exception: it's rare to have a RewriteRule whose pattern starts with !, which here means "if they ask for anything other than..." The exception for "forbidden.html" (or any other custom 403 document) prevents the server from going into an infinite loop: the forbidden request triggers a subrequest for the error document, the [F] rule forbids that too, and so on until the server gives up with a 500-class error. The bad robot won't get in either way, but your server has done extra work.
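
Concretely, the loop the exception avoids comes from the custom error document itself. A sketch (assuming /forbidden.html is what your ErrorDocument 403 points at):

```apache
# Custom 403 page: every forbidden request triggers an internal
# subrequest for /forbidden.html.
ErrorDocument 403 /forbidden.html

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (bingbot|msnbot) [NC]
RewriteCond %{REMOTE_ADDR} !^207\.[67][0-9]\.
# The leading ! exempts the error document (and robots.txt) from the
# block; without it, the subrequest for /forbidden.html would be
# forbidden too, looping until a 500-class error.
RewriteRule !^(forbidden\.html|robots\.txt)$ - [F]
```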

An alternative is something like:

RewriteRule ^boilerplate/ - [L]

right at the top of your RewriteRules-- before all the [F] and [G] rules. (This is my version. All the error documents live in the /boilerplate directory along with most SSIs and similar files. It is no skin off my nose if the occasional robot asks for "forbidden.html" by name.)
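
In context, that pass-through rule might look like this (a sketch, assuming a /boilerplate/ directory holding the error documents, and using only the IP range quoted earlier in the thread):

```apache
RewriteEngine On

# Let requests for error documents, SSIs, etc. through before any
# blocking rules run; [L] stops further rewrite processing.
RewriteRule ^boilerplate/ - [L]

# Blocking rules follow; they never see /boilerplate/ requests, so
# no per-rule exception for the 403 document is needed.
RewriteCond %{HTTP_USER_AGENT} (bingbot|msnbot) [NC]
RewriteCond %{REMOTE_ADDR} !^207\.[67][0-9]\.
RewriteRule !^robots\.txt$ - [F]
```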

In my case I don't need a mod_rewrite exception for robots.txt because all rules are already constrained by filename or at least extension. And I don't have any other .txt files.

If you block with more than one module, you need a separate exception for each one -- for example, <Files "robots.txt"> if you use mod_auth-whatever-it-is-this-week for wholesale IP lockouts.
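
With 2013-era mod_authz_host (Order/Allow/Deny syntax), that carve-out might look like this sketch; the denied range (RFC 5737 documentation space) is purely illustrative:

```apache
# Wholesale IP lockout via mod_authz_host (Apache 2.2 syntax).
# With Order Allow,Deny, a matching Deny wins over Allow.
Order Allow,Deny
Allow from all
Deny from 192.0.2.0/24

# Carve-out: everyone may still fetch robots.txt, even denied IPs.
# <Files> sections merge after directory/.htaccess sections, so this
# overrides the lockout for that one file.
<Files "robots.txt">
    Order Deny,Allow
    Allow from all
</Files>
```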