How do you mean, "IM client"?
User-agent: Googlebot
Disallow: /images/
Allow: /images/adwords-bkgrnd
Disallow: /cgi-bin
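The record above depends on an Allow line carving an exception out of a broader Disallow. Google-style parsers are documented to resolve this by the most specific (longest) matching path, with Allow preferred on a tie, but not every bot evaluates rules that way. A minimal Python sketch of that longest-match precedence, using only the three rules above (no wildcard or percent-encoding handling):

# Minimal sketch of longest-match Allow/Disallow precedence for the
# Googlebot record above. Assumes Google-style evaluation: the longest
# matching path wins, and Allow wins a length tie. Real parsers also
# handle wildcards, $ anchors, and percent-encoding.
rules = [
    ("disallow", "/images/"),
    ("allow", "/images/adwords-bkgrnd"),
    ("disallow", "/cgi-bin"),
]

def allowed(path):
    # (length, is_allow) for every rule whose path is a prefix of the URL path
    matches = [(len(p), kind == "allow") for kind, p in rules if path.startswith(p)]
    if not matches:
        return True                       # no rule matches: crawling is allowed
    return max(matches)[1]                # longest rule wins; Allow wins a tie

print(allowed("/images/photo.jpg"))       # False - blocked by Disallow: /images/
print(allowed("/images/adwords-bkgrnd"))  # True  - the longer Allow rule wins
print(allowed("/index.html"))             # True  - nothing matches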
User-agent: Slurp
User-agent: msnbot
User-agent: Teoma
Disallow: /cgi-bin
Disallow: /images/
Crawl-delay: 15
Sitemap: http://www.example.com/sitemap.xml
User-agent: *
Disallow: /
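For quick testing, records like the ones above can be fed straight into Python's standard-library parser. It groups consecutive User-agent lines into one record, so the Slurp/msnbot/Teoma block is read the way the original standard intends; note, though, that its Allow/Disallow precedence is first match in file order rather than Google's longest match, that crawl_delay() needs Python 3.6+, and that site_maps() needs 3.8+. A small sketch using the example rules from this thread:

# Feed the example records above into urllib.robotparser and query them.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /images/
Allow: /images/adwords-bkgrnd
Disallow: /cgi-bin

User-agent: Slurp
User-agent: msnbot
User-agent: Teoma
Disallow: /cgi-bin
Disallow: /images/
Crawl-delay: 15
Sitemap: http://www.example.com/sitemap.xml

User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("Slurp", "http://www.example.com/about.html"))  # True
print(rp.can_fetch("Slurp", "http://www.example.com/cgi-bin/x"))   # False
print(rp.can_fetch("SomeOtherBot", "http://www.example.com/"))     # False - caught by User-agent: *
print(rp.crawl_delay("Slurp"))   # 15 (Python 3.6+)
print(rp.site_maps())            # ['http://www.example.com/sitemap.xml'] (Python 3.8+)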
Since our crawler is distributed, it may take up to 3 hours for this change to take effect. After you add the disallow for our bot, we may check the robots.txt file occasionally, at most every 3 hours, but it's highly unlikely that rate would be sustained for long periods of time.
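That quote describes a common crawler-side pattern: cache each site's robots.txt and only re-fetch it after a fixed interval, so a new Disallow can take up to one interval to be honored. A hypothetical sketch of that approach (the 3-hour TTL comes from the quote; the function and agent names are made up for illustration, not that crawler's actual code):

# Hypothetical TTL cache for robots.txt, mirroring the "at most every
# 3 hours" re-check policy described above. Names here are illustrative.
import time
import urllib.robotparser

TTL_SECONDS = 3 * 60 * 60        # re-read robots.txt at most every 3 hours
_cache = {}                      # host -> (fetched_at, RobotFileParser)

def rules_for(host):
    """Return cached rules for a host, re-fetching robots.txt once the TTL expires."""
    now = time.time()
    cached = _cache.get(host)
    if cached and now - cached[0] < TTL_SECONDS:
        return cached[1]
    rp = urllib.robotparser.RobotFileParser("http://%s/robots.txt" % host)
    rp.read()                    # fetch and parse the live file
    _cache[host] = (now, rp)
    return rp

def may_crawl(host, url, agent="examplebot"):
    return rules_for(host).can_fetch(agent, url)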
Host: 206.193.198.90
/robots.txt
Http Code: 200 Date: May 11 14:25:58 Http Version: HTTP/1.1 Size in Bytes: 2592
Referer: -
Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 ( .NET CLR 3.5.30729)
/favicon.ico
Http Code: 200 Date: May 11 14:25:59 Http Version: HTTP/1.1 Size in Bytes: 3638
Referer: -
Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 ( .NET CLR 3.5.30729)
Your robots.txt file lists several User-agents per Disallow statement. We don't currently support that format (we will add it to our parser).
Many robots cannot handle multiple-user-agent policy records. That's a pity, since it's part of the original Standard for Robot Exclusion [robotstxt.org].
FYI, the reason the robots.txt block didn't work is that we currently don't support the format you provided in your robots.txt. The standard format is a single entry per user-agent, rather than grouping them as you had done.
User-agent: 008
User-agent: AboutUsBot
User-agent: Baiduspider+
User-agent: Becomebot
User-agent: FollowSite Bot
User-agent: sitecheck.internetseer.com
User-agent: OmniExplorer_Bot
User-agent: RB2B-bot
User-agent: SBIder
User-agent: StackRambler
User-agent: TurnitinBot
User-agent: Yandex
Disallow: /
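Until parsers that choke on grouped records catch up, one workaround is to expand a block like the one above into one single-agent record per bot before publishing it. A rough sketch of that expansion (plain string handling, no attempt at full robots.txt edge cases):

# Rough sketch: expand grouped User-agent records into one record per agent,
# for parsers that only accept a single User-agent line per entry.
def expand_grouped_records(robots_txt):
    records, agents, rules = [], [], []
    for line in robots_txt.splitlines():
        stripped = line.strip()
        field = stripped.split(":", 1)[0].lower()
        if field == "user-agent":
            if rules:                        # previous record is complete
                records.append((agents, rules))
                agents, rules = [], []
            agents.append(stripped)
        elif stripped:                       # Disallow, Allow, Crawl-delay, ...
            rules.append(stripped)
    if agents or rules:
        records.append((agents, rules))

    out = []
    for agents, rules in records:
        for agent in agents:                 # duplicate the rules per agent
            out.append("\n".join([agent] + rules))
    return "\n\n".join(out) + "\n"

Run over the block above, this emits one record per listed bot, each a single User-agent line followed by Disallow: /.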
We webmasters need to see what tangible benefit, as in traffic driven to our sites, your bot produces. Show that and you'll have a passel of supporters. Fail, and there's no love, and some of these folks can get downright cantankerous. Seriously, show the benefit to us, the webmasters, first, not to your clients who do nothing but scrape our content and run up our bandwidth. Do that and you'll have friends in the biz.
The non-paying customers are severely limited in the amount of crawling they can do with us.
Since our efforts have been a disappointment, we'll be disengaging from this community.
Most of them make it very easy to ignore robots.txt, spoof IPs and so on.