
Sitemaps, Meta Data, and robots.txt Forum

Yandex bot
Should I block it?

maximillianos
msg:4077040, 4:48 pm on Feb 9, 2010 (gmt 0)

I've been working on a monitoring script that reports on the top bots crawling my site at any given moment. During testing I noticed a few bots that are legitimate but come from other countries.
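The monitoring script itself isn't posted. A minimal sketch of the same idea, assuming an Apache/nginx access log in Combined Log Format (User-Agent as the last quoted field) and a hypothetical `BOT_SIGNATURES` table of User-Agent substrings, could count requests per crawler like this:

```python
import re
from collections import Counter

# Hypothetical mapping of User-Agent substrings to bot names; extend as needed.
BOT_SIGNATURES = {
    "Googlebot": "Google",
    "Yahoo! Slurp": "Yahoo",
    "msnbot": "MSN",
    "Yandex": "Yandex",
}

# In Combined Log Format the User-Agent is the last double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def classify(user_agent):
    """Return the bot name for a User-Agent string, or None for non-bot traffic."""
    ua = user_agent.lower()
    for signature, name in BOT_SIGNATURES.items():
        if signature.lower() in ua:
            return name
    return None


def top_bots(log_lines, n=5):
    """Count requests per known bot and return the n busiest as (name, hits) pairs."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that are not in Combined Log Format
        bot = classify(match.group(1))
        if bot:
            counts[bot] += 1
    return counts.most_common(n)


# Illustrative sample lines (documentation IP range, made-up requests).
YANDEX_UA = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
GOOGLE_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SLURP_UA = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
BROWSER_UA = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/3.6"
SAMPLE_LOG = [
    f'192.0.2.{i} - - [09/Feb/2010:16:48:00 +0000] "GET / HTTP/1.1" 200 512 "-" "{ua}"'
    for i, ua in enumerate([YANDEX_UA] * 3 + [GOOGLE_UA] * 2 + [SLURP_UA, BROWSER_UA])
]

if __name__ == "__main__":
    for name, hits in top_bots(SAMPLE_LOG):
        print(f"{name}: {hits}")
```

Matching on the User-Agent is only a first pass, since any client can claim to be a crawler.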

My question: should I block these bots? Yandex, for instance, is the crawler for a Russian search engine/portal. My site is US-based, and my traffic is 99% from the US, so I'm not getting much benefit from Russian traffic. Does that mean I should just block them and save the processor time and bandwidth they eat up?
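If the answer turns out to be "block", the usual mechanism is robots.txt. A sketch, assuming `Yandex` is the User-agent token Yandex honors (confirm against Yandex's own help pages), that shuts out Yandex while leaving every other bot alone, checked offline with Python's `urllib.robotparser`. Note that `robotparser` matches agents by substring, which approximates but is not identical to how each crawler parses the file; `build_parser` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Block Yandex from the whole site; an empty Disallow lets all other bots in.
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("YandexBot/3.0", "Googlebot"):
        print(agent, rp.can_fetch(agent, "https://www.example.com/some/page.html"))
```

robots.txt only stops crawlers that choose to obey it; a bot that ignores it has to be blocked at the server or firewall instead.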

Currently they rank third in how much they crawl my site, behind only Google and Yahoo. We are talking about tens of thousands of pages a day.

What advice would you give me? At first I thought, why not just leave it; it is more exposure for my site. But then I started wondering whether it was pointless.

I guess the same question applies to all those prototype search bots that are trying to make a name for themselves. They typically don't crawl many pages, but they are always on the site.

Thanks for any tips.

dstiles
msg:4077169, 8:01 pm on Feb 9, 2010 (gmt 0)

I allow Yandex to crawl the sites on my server here in the UK. I see the occasional visit from their search engine, but not enough to worry about. I originally allowed it for a client with a Russian-facing site, which is no longer up.

Looking at just two sites' stats for the past six weeks, the top crawlers on both are msn, slurp and yandex, followed by "unknown" on one site and vagabondo on the other. Google comes in very low on hits (48 of 600 total and 125 of 1500 total, about 8% on each).

I recently ran an exercise matching Yandex crawl IPs against their bot's User-Agent. It was very messy: far fewer actual IPs than MS, but fragmented across more Class C (/24) blocks.
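The post doesn't say how the matching was done. One common way to confirm that an IP sending a Yandex User-Agent really belongs to Yandex is forward-confirmed reverse DNS: look up the IP's hostname, check the domain, then resolve that hostname and make sure it points back to the same IP. This sketch assumes Yandex crawler hostnames end in `yandex.ru`, `yandex.net` or `yandex.com` (check Yandex's current documentation), and also buckets IPs by /24 to count the Class C spread. The function names are hypothetical.

```python
import ipaddress
import socket
from collections import defaultdict

# Assumed Yandex crawler domains; verify against Yandex's current documentation.
YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")


def is_yandex_host(hostname):
    """True if a reverse-DNS hostname falls under one of the assumed Yandex domains."""
    return hostname.lower().rstrip(".").endswith(YANDEX_SUFFIXES)


def verify_yandex_ip(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: IP -> hostname -> IPs must include the original IP.

    The resolver functions are parameters so the check can be exercised offline.
    """
    try:
        hostname = reverse(ip)[0]
        if not is_yandex_host(hostname):
            return False  # PTR record points outside Yandex: likely a spoofed User-Agent
        return ip in forward(hostname)[2]
    except (socket.herror, socket.gaierror):
        return False  # no PTR record, or the hostname does not resolve


def group_by_class_c(ips):
    """Bucket IPv4 addresses by their /24 network (the old "Class C")."""
    groups = defaultdict(list)
    for ip in ips:
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        groups[str(net)].append(ip)
    return dict(groups)
```

With a list of verified IPs, `len(group_by_class_c(verified_ips))` gives the number of distinct Class C blocks. Doing live DNS lookups on every request is slow, so results are usually cached per IP.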

I'm working through several other bots at the moment, especially those used by meta search engines, to make sure that when everyone drops Google there is still something left to send traffic to my customers. :)

maximillianos
msg:4077369, 2:00 am on Feb 10, 2010 (gmt 0)

Yeah, I was thinking: why not just let them crawl? You never know what traffic, links, or something else they may bring down the road.

If everyone drops Google, we may have bigger problems on our hands... Microsoft will rule the world! ;-)

dstiles
msg:4077410, 3:48 am on Feb 10, 2010 (gmt 0)

MS? Not on my machines. I'm using a meta search engine for most things now and am about to recommend it to customers. :)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved