Home / Forums Index / Microsoft / Bing Search Engine News
Forum Library, Charter, Moderators: mack

Bing Search Engine News Forum

    
MSN Crawling Hard
Deindexed site drama...
BillyS




msg:3792184
 8:46 pm on Nov 22, 2008 (gmt 0)

I was just looking through my logs and noticed that msnbot was crawling our site pretty hard, grabbing about 10% of the site in the last half hour or so.

I just checked the site: command on Live and we've only got about 100 pages in their index now - which is fewer than the number of pages msnbot just crawled.

Anyway, we keep thinking about blocking msn altogether and stopping them from wasting bandwidth. I know no one really cares about Live anymore, but I was wondering if anyone else noticed the same - especially if you think you're under some kind of penalty.

 

jpalmer




msg:3796818
 2:29 am on Nov 30, 2008 (gmt 0)

Gidday BillyS

I use a couple of standard robots.txt instructions (can't remember where I got them from - probably here at WebmasterWorld, and then confirmed against the SEs' robots pages).

For Google, Yahoo, MSN (presumably also Livesearch), teoma and another obscure SE, substituting each bot's name for "botname":

User-agent: botname
Crawl-delay: 10

For MSN specifically:

User-agent: msnbot
Crawl-delay: 10

And a general catch-all for anyone else who decides to start being nice:

User-agent: *
Crawl-delay: 15

That's 10 seconds and 15 seconds. Obviously you can make it shorter or even longer; just double-check each SE's protocol.
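To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a crawler treats Crawl-delay as the minimum gap in seconds between successive requests from a single crawler instance; individual SEs may interpret the value differently, so treat the result as an upper bound under that assumption, not a documented limit. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_fetches_per_day(crawl_delay_seconds: int) -> int:
    """Upper bound on daily requests for one crawler that waits
    crawl_delay_seconds between fetches (assumed interpretation)."""
    return SECONDS_PER_DAY // crawl_delay_seconds


for delay in (10, 15):
    print(f"Crawl-delay: {delay} -> at most {max_fetches_per_day(delay)} fetches/day")
```

So even a 10 second delay still leaves room for several thousand fetches a day; what it limits is how fast a single sweep can hit the server.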

I don't know whether this creates a conflict for the SEs that recognise the robots crawl delay; I would presume not.
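One way to get a feel for how a bot-specific block and the catch-all interact is to feed the rules to a robots.txt parser. The sketch below uses Python's standard-library urllib.robotparser; it only shows how that particular parser resolves the two blocks, and the SEs' own crawlers may behave differently. "SomeOtherBot" is a made-up user agent for illustration.

```python
from urllib.robotparser import RobotFileParser

# The MSN-specific block and the catch-all block from the post above.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Crawl-delay: 15
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A bot named in its own block gets that block's delay;
# any other bot falls through to the "*" entry.
msn_delay = parser.crawl_delay("msnbot")
other_delay = parser.crawl_delay("SomeOtherBot")  # hypothetical bot name

print("msnbot:", msn_delay)
print("SomeOtherBot:", other_delay)
```

With this parser the specific block takes precedence for msnbot and the catch-all applies only to bots without their own block, which is consistent with the presumption that the two do not conflict.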

Since I started using it, I've noticed that the SEs don't seem to be so "grabby" when they come through on a large sweep after an algo update; that is now usually 20-30 pages at a time on my primary site.

Previously I had seen up to 100 pages grabbed in a single pass, and the bandwidth spikes were ... large ...

Now I mainly get multiple daily visits of 1, 2, up to 10 pages at a time from the majors and their data centres, so it would appear that if you want a "steady drip" rather than a "sudden flood", it works.

Hope this is useful.
Hooroo
JP


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved