MSN's Stealth Missions


lucy24 - 4:48 am on Feb 20, 2012 (gmt 0)


Continuing the theme of wtf-ness:

I'd assumed that in the course of January [webmasterworld.com] I got to know all the major robot players. Today during a routine check of Bing/MS IPs, which normally results in dead silence*, I ran smack into a pile of msnbots.

Nothing new about msnbot/2.0b-- and that's just the point. Its owners [onlinehelp.microsoft.com] say it's been put out to pasture, replaced by the bingbot.** The specialized msnbot-media is still on the job, but I had to go all the way back to May of 2011 for the last vanilla msnbot. What's up? Does the msnbot know something about the Social Security system that it's not telling? Was the MSN retirement package not all that it expected?

In the middle of the msnbots-- did it think it could hide?-- was a whole slew of msnbot-NewsBlogs (their plural). They too have been around for years; they're mentioned in assorted WebmasterWorld threads. I have never met one before. (Never = since April 2011 when I started saving raw logs.)

They made a total of 16 successful requests. Half were for robots.txt, always taken in pairs. The other half were for...

Let me backtrack here. For a long time I had one unusually fat file that was inordinately popular with the wrong kind of robots. It also got the occasional search-engine hit, most of them from humans who were clearly looking for something else. Wasted time and bandwidth on all sides. A couple weeks back I cut off the first 5% of the file and saved it under the name of the original fat version. The old one got tucked away behind a new name, a nofollow link and a noindex meta tag. If humans want to read the whole thing they're welcome. Robots can jolly well go on a diet.

The newly arrived blogbot read this slimmed-down file eight separate times.

The newly pulled-from-retirement msnbot puttered around here and there-- including a single serving of robots.txt-- presumably hoping I wouldn't notice when it, too, read the slim file twice... followed by the fat file.
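For anyone who wants to run the same kind of head count on their own logs, here is a rough sketch in Python. It assumes Apache combined-format log lines, which is my guess at a common setup and not necessarily what produced the numbers above. The function name `tally_msnbots` and the log pattern are made up for illustration. It counts successful (2xx) requests per bot name and path for any user agent containing "msnbot".

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" log format. Adjust if your server logs differently.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def tally_msnbots(lines):
    """Count successful requests per (bot name, path) for msnbot-family agents."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # not a combined-format line; skip it
        agent = m.group("agent")
        if "msnbot" not in agent.lower():
            continue  # bingbot and everything else is ignored here
        if not m.group("status").startswith("2"):
            continue  # only successful requests are tallied
        # The bot name is whatever precedes the first slash,
        # e.g. "msnbot" or "msnbot-NewsBlogs".
        bot = agent.split("/")[0]
        counts[(bot, m.group("path"))] += 1
    return counts
```

Feed it the lines of a log file (for example `tally_msnbots(open("access.log"))`, with your own file name) and the paired robots.txt requests and repeat visits to a single page show up as counts greater than one.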

Well, hey. It's not Google. It doesn't have to pay attention to the "nofollow" directive. And the file's already indexed, so it's not like it's seeing anything it hasn't seen dozens of times before.


* Figure of speech. It's really the computer's "Bzzt!" sound meaning "Nope, nothing here." The bingbot and the msnbot-media have already been filtered out; the plainclothes bot is blocked.
** They also say, quote, "Bing does not share IP addresses for our crawlers." I'll trade you a 65\.5[2-5]\. for a 157\.(5[4-9]|60). Anyone got a spare 207\.46\.?
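
The three patterns in the second footnote can be rolled into a quick check. This is only a sketch: the patterns are the ones quoted above, but anchoring them to the start of the address, adding a trailing `\.` to the 157 pattern, and the function name `looks_like_ms_range` are my own additions. A match says the address falls in one of those prefixes, not that the visitor is a genuine Microsoft crawler.

```python
import re

# Prefixes taken from the footnote. The leading ^ and the trailing \. on the
# 157 pattern are assumptions added so that only whole octets match.
MS_PREFIXES = re.compile(r"^(65\.5[2-5]\.|157\.(5[4-9]|60)\.|207\.46\.)")

def looks_like_ms_range(ip: str) -> bool:
    """Return True if the dotted-quad string starts with one of the listed prefixes."""
    return MS_PREFIXES.match(ip) is not None
```

For example, `looks_like_ms_range("157.55.0.1")` is true, while `looks_like_ms_range("157.61.0.1")` is false.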


Thread source: http://www.webmasterworld.com/search_engine_spiders/4372254.htm