
Forum Moderators: mack


MSN Crawling Hard

Deindexed site drama...

8:46 pm on Nov 22, 2008 (gmt 0)

billys, Senior Member (joined June 1, 2004)

I was just looking through my logs and noticed that msnbot was crawling our site pretty hard, grabbing about 10% of the site in the last half hour or so.

I just checked the site: command on Live and we've only got about 100 pages in their index now, which is fewer than the number of pages msnbot grabbed in that half hour.

Anyway, we keep thinking about blocking msn altogether and stopping them from wasting bandwidth. I know no one really cares about Live anymore, but I was wondering if anyone else noticed the same - especially if you think you're under some kind of penalty.

2:29 am on Nov 30, 2008 (gmt 0)

Junior Member from AU (joined Oct 20, 2001)

Gidday BillyS

I use a couple of standard robots.txt instructions (can't remember where I got them from, probably here at WebmasterWorld, and then confirmed against the SEs' robots pages).

For Google, Yahoo, MSN (presumably also Live Search), Teoma and another obscure SE:

User-agent: botname
Crawl-delay: 10

For MSN specifically:

User-agent: msnbot
Crawl-delay: 10

and general catch-all for anyone else who decides to start being nice:

User-agent: *
Crawl-delay: 15

That's 10 seconds and 15 seconds respectively. Obviously you can make it shorter or longer; just double-check each SE's protocol.
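For anyone who wants to sanity-check what a robots.txt like this actually declares, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The names `ROBOTS_TXT`, `declared_delays`, `max_fetches_per_day` and the `SomeOtherBot` agent are made up for illustration. It only reports what the file says and the rough ceiling that implies; whether a given crawler honours Crawl-delay is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# The MSN-specific rule and the catch-all rule from the post above.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Crawl-delay: 15
"""

SECONDS_PER_DAY = 24 * 60 * 60


def declared_delays(robots_txt, agents):
    """Return the Crawl-delay (in seconds) the file declares for each agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.crawl_delay(agent) for agent in agents}


def max_fetches_per_day(delay_seconds):
    """Upper bound on fetches per day for one crawler that waits
    delay_seconds between requests (assumes it honours the delay)."""
    return SECONDS_PER_DAY // delay_seconds


# "SomeOtherBot" is a hypothetical agent that falls through to the catch-all.
delays = declared_delays(ROBOTS_TXT, ["msnbot", "SomeOtherBot"])
for agent, delay in delays.items():
    print(agent, delay, max_fetches_per_day(delay))
```

With a 10 second delay, a single well-behaved crawler tops out at 8640 fetches a day, and at 15 seconds it tops out at 5760, so the delay caps the rate of a sweep rather than the total amount crawled.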

I don't know if this creates a conflict with the SEs who recognise the robots crawl delay; I would presume not.

Since I started using it, I've noticed that the SEs don't seem to be so "grabby" when they come through on a large sweep after an algo update; that's usually 20-30 pages at a time on my primary site now.

Previously I had seen up to 100 pages grabbed in a single pass, and the bandwidth spikes were ... large ...

Now I mainly get multiple daily visits of 1, 2, up to 10 pages at a time from the majors and their data centres, so it would appear that if you want a "steady drip" rather than a "sudden flood", it works.

Hope this is useful.

