
Search Engine Spider and User Agent Identification Forum

This 31-message thread spans 2 pages.
Fake BingBot
1000s of hits from a Slicehost IP
Gaia




msg:4556010
 9:10 am on Mar 18, 2013 (gmt 0)

I got 1000s of hits from an agent identifying itself as BingBot, but coming from a Slicehost IP. It is also ignoring robots.txt.

User-agent: *
Disallow: /wp-


Status: 200
Request: /wp-login.php
Host: mysite.com
Referer: -
RemoteIP: 50.57.148.171
Time: 2013-03-17T13:29:50+0000
UserAgent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Query: ?redirect_to=http%3A%2F%mysite.com%2Fwp-admin%2F&reauth=1
Method: GET

Status: 302
Request: /wp-admin/index.php
Host: mysite.com
Referer: -
RemoteIP: 50.57.148.171
Time: 2013-03-17T13:29:44+0000
UserAgent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Query:
Method: GET


It went through this loop over and over, at an aggressive rate. I spotted it thanks to NewRelic/Loggly and its handy Chrome extension.

The IP belongs to youngshand.com, which is a marketing agency, so I wonder whether they are running some kind of "tests". Has anyone seen this fake bot before?


 

lucy24




msg:4561141
 9:29 pm on Apr 3, 2013 (gmt 0)

wilderness said:

RewriteCond %{REMOTE_ADDR} ^131\.253\.(3[0-9]|4[0-7])\. [OR]
{snip, snip}
RewriteCond %{REMOTE_ADDR} ^207\.[67][0-9]\.
RewriteCond %{HTTP_USER_AGENT} !(bingbot|msnbot)
RewriteRule !^robots\.txt$ - [F]

keyplyr said:

RewriteCond %{HTTP_USER_AGENT} (Bingbot|Bing\ Mobile\ |msnbot|MSRBOT) [NC]
RewriteCond %{REMOTE_ADDR} !^65\.5[2-5]\.
{snip, snip}
RewriteCond %{REMOTE_ADDR} !^207\.[67][0-9]\.
RewriteRule !^(forbidden\.html|robots\.txt)$ - [F]

Cut-and-pasters, note that this is a mirror-image pair of rules. The first says: "If it comes from a known bing/msn range and DOES NOT call itself the bingbot or msnbot..." The second says: "If it calls itself the bingbot or msnbot and DOES NOT come from a known bing/msn range..."

The body of each rule gives the exceptions. In fact the rule is itself an exception: it's rare to have a RewriteRule whose pattern starts with "!". Here it means "If they ask for anything other than..." The exception for "forbidden.html" (or any other custom 403 document) is there to prevent the server from going into an infinite loop ending in a 500-class error: without it, the request for the 403 page is itself forbidden, which calls for the 403 page again, and so on. The bad robot still won't get in, but your server has done some extra work.
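
As a minimal sketch of that exception (not anyone's actual ruleset from this thread): the user agent "ExampleBadBot" and the location /forbidden.html are made-up placeholders, and the rules are assumed to sit in the site's root .htaccess, where patterns carry no leading slash.

```apache
# Assumption: the custom 403 page is /forbidden.html in the site root.
ErrorDocument 403 /forbidden.html

RewriteEngine On

# Hypothetical condition: a robot calling itself "ExampleBadBot".
RewriteCond %{HTTP_USER_AGENT} ExampleBadBot [NC]
# Forbid everything EXCEPT the 403 page and robots.txt.
# Without the forbidden\.html exception, the internal request for the
# error page would itself be forbidden, and so on around the loop.
RewriteRule !^(forbidden\.html|robots\.txt)$ - [F]
```

The robot still receives a 403 status for whatever it asked for; the exception only lets the server deliver the error page once instead of chasing its own tail.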

An alternative is something like:

RewriteRule ^boilerplate/ - [L]

right at the top of your RewriteRules, before all the [F] and [G] rules. (This is my version. All the error documents live in the /boilerplate directory along with most SSIs and similar files. It is no skin off my nose if the occasional robot asks for "forbidden.html" by name.)
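
Roughly how that ordering might look in a root .htaccess. This is a sketch under assumptions: the ErrorDocument path, the "ExampleBadBot" user agent and the .html constraint are illustrative placeholders, not a real configuration.

```apache
# Assumption: error documents live under /boilerplate/.
ErrorDocument 403 /boilerplate/forbidden.html

RewriteEngine On

# First rule: anything under /boilerplate/ is left alone, and [L] stops
# rule processing for this request, so the blocks below never see it.
RewriteRule ^boilerplate/ - [L]

# Hypothetical block rule further down, constrained to .html requests.
# No per-rule exception for the error page is needed any more.
RewriteCond %{HTTP_USER_AGENT} ExampleBadBot [NC]
RewriteRule \.html$ - [F]
```

The trade-off is that one early rule replaces an exception repeated in every blocking rule, at the price of letting anyone fetch files in that directory by name.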

In my case I don't need a mod_rewrite exception for robots.txt because all rules are already constrained by filename or at least extension. And I don't have any other .txt files.

If you block with more than one mod, you need a separate exception for each one. For example <Files "robots.txt"> if you use mod_auth-whatever-it-is-this-week for wholesale IP lockouts.
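
A hedged sketch of such a second exception, assuming Apache 2.4-style Require directives (mod_authz_core with mod_authz_host); older 2.2 setups would use Order/Allow/Deny instead. The range 192.0.2.0/24 is a documentation-only block standing in for whatever you actually lock out.

```apache
# Assumption: Apache 2.4 (mod_authz_core + mod_authz_host).
# 192.0.2.0/24 is a placeholder range, not a real lockout.
<RequireAll>
    Require all granted
    Require not ip 192.0.2.0/24
</RequireAll>

# Separate exception at the access-control level: robots.txt stays
# readable by everyone, including the locked-out range.
<Files "robots.txt">
    Require all granted
</Files>
```

The mod_rewrite exception for robots.txt does nothing here: a request from the locked-out range is refused by the access-control module regardless of what the RewriteRules say, which is why each blocking mechanism needs its own carve-out.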
