Forum Moderators: Robert Charlton & goodroi
<meta name="robots" content="noarchive">
User-agent: ia_archiver
Disallow: /

User-agent: Googlebot
User-agent: Mediapartners-Google
User-agent: Slurp
User-agent: msnbot
User-agent: Teoma
Disallow: /cgi-bin

User-agent: *
Disallow: /
I am quite popular in India and Pakistan, and I don't want to be. I feel that showing my pages over there is costing me money, creating additional competition, and lowering my rankings.
archive.org respects a noarchive meta tag, so you do not have to block the robot as well - although blocking it might save you a bit of bandwidth.
I am surprised that scrapers rank well enough to be competition.
Anyone this obsessed with controlling who may see their website should just open a drive-through window and dispense their information that way.
A better method might be to block using data/services from a major geolocation data provider, such as Quova.
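For illustration, IP-based country blocking usually comes down to matching the client IP against CIDR ranges supplied by the geolocation provider. This is a minimal sketch using only the standard library - the country-to-range table here is a toy stand-in for a real data feed such as Quova's, and the range values are illustrative, not authoritative:

```python
import ipaddress

# Hypothetical country -> CIDR table. A real deployment would load many
# thousands of ranges from a commercial geolocation feed, not hard-code them.
BLOCKED_RANGES = {
    "IN": [ipaddress.ip_network("117.192.0.0/10")],
    "PK": [ipaddress.ip_network("39.32.0.0/11")],
}

def is_blocked(client_ip: str) -> bool:
    """Return True if the client IP falls inside any blocked country's ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net
               for nets in BLOCKED_RANGES.values()
               for net in nets)
```

In practice this check would sit in front of the application (in the web server or a middleware layer), and the range table would be refreshed on the provider's update schedule, since IP allocations move between networks over time.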
Banning based on language settings might be attractive because it is easy to implement, but it's hardly a good idea.
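To show what "easy to implement" means here: a language filter only has to parse the Accept-Language request header. This is a minimal sketch - the BLOCKED set and both function names are made up for the example:

```python
def primary_language(accept_language: str) -> str:
    """Return the highest-q language tag from an Accept-Language header."""
    if not accept_language:
        return ""
    best, best_q = "", -1.0
    for part in accept_language.split(","):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        q = 1.0  # per HTTP, a missing q parameter defaults to 1
        for param in pieces[1:]:
            param = param.strip()
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if q > best_q:
            best, best_q = tag, q
    return best

# Illustrative policy: block requests whose primary language is Hindi or Urdu.
BLOCKED = {"hi", "ur"}

def should_block(accept_language: str) -> bool:
    return primary_language(accept_language).split("-")[0] in BLOCKED
```

The catch, and likely part of why it's a bad idea, is that the header reflects browser settings, not location: many visitors in the targeted regions browse with en-US defaults and slip through, while travellers and expats elsewhere get blocked by mistake.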
[edited by: tedster at 7:42 am (utc) on Feb 20, 2010]
Why is language filtering a bad idea?
In my experience, archive.org did NOT honor the tag, nor the robots.txt, and I had to contact them directly to get my site out of archive.org.
So newly scraped snippets mixed in with other scraped snippets pass as new content, and with some of the other tricks they play, the scrapers manage to overrun your long-tail keywords for a period of time.
[edited by: tedster at 8:30 pm (utc) on Feb 23, 2010]