Forum Moderators: mack

MSN Crawling Hard

Deindexed site drama...

8:46 pm on Nov 22, 2008 (gmt 0)

billys (Senior Member, Top Contributor of All Time, 10+ Year Member)

joined:June 1, 2004
votes: 0

I was just looking through my logs and noticed that msnbot was crawling our site pretty hard, grabbing about 10% of the site in the last half hour or so.

I just checked the site: command on Live and we've only got about 100 pages in their index now - which is fewer than the number of pages mentioned above.

Anyway, we keep thinking about blocking msn altogether and stopping them from wasting bandwidth. I know no one really cares about Live anymore, but I was wondering if anyone else noticed the same - especially if you think you're under some kind of penalty.

2:29 am on Nov 30, 2008 (gmt 0)

Junior Member from AU 

10+ Year Member

joined:Oct 20, 2001
votes: 8

Gidday BillyS

I use a couple of standard robots.txt instructions (can't remember where I got them from - probably here at WebmasterWorld, and then confirmed with the SEs' robots pages).

For Google, Yahoo, MSN (presumably also Live Search), Teoma and another obscure SE:

User-agent: botname
Crawl-delay: 10

For MSN specifically:

User-agent: msnbot
Crawl-delay: 10

And a general catch-all for anyone else who decides to start being nice:

User-agent: *
Crawl-delay: 15

That's 10 seconds and 15 seconds. Obviously you can make it shorter or even longer; just double-check each SE's protocol.

I don't know if this creates a conflict with the SEs that recognise the robots crawl delay; I would presume not.
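If you want to sanity-check how a parser reads those blocks, here is a minimal sketch using Python's standard urllib.robotparser. It only shows how that one library resolves the msnbot entry against the catch-all; whether a real crawler honours Crawl-delay at all is up to that crawler. The helper names crawl_delay_for and max_fetches_per_day, and the user-agent string "SomeOtherBot", are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two blocks from the post above: one for msnbot, one catch-all.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Crawl-delay: 15
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (seconds) this parser applies to user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


def max_fetches_per_day(delay_seconds):
    """Upper bound on fetches per day if a bot waits delay_seconds between requests."""
    return 86400 // delay_seconds


if __name__ == "__main__":
    for agent in ("msnbot", "SomeOtherBot"):
        delay = crawl_delay_for(ROBOTS_TXT, agent)
        print(agent, delay, max_fetches_per_day(delay))
```

In this parser the specific msnbot block wins for msnbot and everything else falls through to the catch-all, so the two entries do not clash. The per-day figure is just 86400 seconds divided by the delay, a ceiling for a bot that obeys the directive, not a prediction of what it will actually fetch.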

Since I started using it, I notice that the SEs don't seem to be so "grabby" when they come through on a large sweep after an algo update; a sweep is now usually 20-30 pages at a time on my primary site.

Previously I had seen up to 100 pages grabbed in a single pass, and the bandwidth spikes were ... large ...

Now I mainly get multiple daily visits of 1, 2, up to 10 pages at a time from the majors and their data centres, so it would appear that if you want a "steady drip" rather than a "sudden flood", it works.

Hope this is useful.