Bing ignoring robots.txt?


revrob - 7:26 pm on Sep 28, 2012 (gmt 0)


I've virtually given up on bingbot, having tried a whole variety of methods via robots.txt and .htaccess. Even when I had all the bingbot IP ranges supposedly banned, I found that bingbot was occasionally accessing bulky media files in disallowed folders, even from an IP address that should have been totally banned and that was getting a 403 response everywhere else on my site. It seemed able to evade the RewriteRule [F] directives when requesting a minority of pdf and jpg files (which were also disallowed in robots.txt, but Bing didn't care about that either).
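
For readers unfamiliar with the approach described above, an IP-based ban in .htaccess typically looks something like the sketch below. This is a hypothetical illustration only, not the rules actually used on this site (those were not posted); the range shown is borrowed from the rules later in this post, and it assumes mod_rewrite is already switched on with RewriteEngine On.

```apache
# Hypothetical sketch: answer 403 Forbidden to one Microsoft range.
# Matches 157.54.0.0 - 157.60.255.255
RewriteCond %{REMOTE_ADDR} ^157\.(5[4-9]|60)\.
RewriteRule .* - [F]
```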

My current experiment is a rewrite rule that sends every MS IP range I can identify to robots.txt, whatever they actually request, where they can chew on the disallow directive for bingbot that they are so keen to ignore:

User-agent: bingbot
Disallow: /

Here is what I put up this afternoon in .htaccess:

# 157.54.0.0 - 157.60.255.255
RewriteCond %{REMOTE_ADDR} ^157\.(5[4-9]|60)\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))$
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule .* http://www.example.com/robots.txt [L]
# 131.253.21.0 - 131.253.47.255
RewriteCond %{REMOTE_ADDR} ^131\.253\.(2[1-9]|3[0-9]|4[0-7])\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))$
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule .* http://www.example.com/robots.txt [L]
# 65.52.0.0 - 65.52.55.255
RewriteCond %{REMOTE_ADDR} ^65\.52\.([0-9]|[1-4][0-9]|5[0-5])\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))$
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule .* http://www.example.com/robots.txt [L]
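
For comparison, the three blocks above could probably be collapsed into a single rule. The following is a minimal sketch rather than the code running on the site. It assumes REMOTE_ADDR always holds a well-formed IPv4 address, so the trailing octets need no validation, and that RewriteEngine On is already set earlier in .htaccess. The IP ranges and the target URL are copied from the rules above.

```apache
# Sketch only: one rule covering the same three ranges.
# Conditions chained with [OR] form one group, which is ANDed
# with the robots.txt exclusion on the first line.
RewriteCond %{REQUEST_URI} !^/robots\.txt$
# 157.54.0.0 - 157.60.255.255
RewriteCond %{REMOTE_ADDR} ^157\.(5[4-9]|60)\. [OR]
# 131.253.21.0 - 131.253.47.255
RewriteCond %{REMOTE_ADDR} ^131\.253\.(2[1-9]|3[0-9]|4[0-7])\. [OR]
# 65.52.0.0 - 65.52.55.255
RewriteCond %{REMOTE_ADDR} ^65\.52\.([0-9]|[1-4][0-9]|5[0-5])\.
RewriteRule .* http://www.example.com/robots.txt [L]
```

Note that with an absolute URL as the substitution, mod_rewrite rewrites internally when the hostname matches the current host and issues an external redirect otherwise, so the behaviour can differ depending on which hostname the bot requested.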

I have robots.txt listed in the "don't rewrite" section near the beginning of .htaccess:
RewriteCond %{REQUEST_URI} !/robots\.txt$
RewriteCond %{REQUEST_URI} !^/robots\.txt$
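
The full "don't rewrite" section is not shown here. As a point of comparison, one common idiom for exempting a file from all later rules is an early pass-through rule. The sketch below is an assumption about how that could look, not the section from this .htaccess.

```apache
# Hypothetical alternative: stop rewriting as soon as robots.txt is requested.
# In .htaccess the pattern is matched without the leading slash.
RewriteRule ^robots\.txt$ - [L]
```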

I'm now waiting to see if that works or if some of the bingbot visits will continue to somehow evade it.

I have had a couple of bingbot visits since putting that code up, and they redirected nicely to robots.txt.

If MS are not prepared to observe robots.txt, then I am not prepared to let them read anything EXCEPT robots.txt.

The only other legit bot I have trouble with is Yahoo! Slurp, which also has a habit of ignoring robots.txt directives, but I have managed to tame that one via .htaccess.


Thread source: http://www.webmasterworld.com/msn_microsoft_search/4441555.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com