SirTox - 1:10 am on Aug 7, 2013 (gmt 0)
You have to be proactive and block bots so they don't scrape the site in the first place instead of wailing about it when your content is spread all over the web like cheese on a cracker.
1. How can you block all of the bots? The bad ones don't honor robots.txt.
2. It's impossible to block all IP addresses from scraping.
3. Not all scrapers use bots.
I start with a politely worded copyright violation warning that I email to the owner of the site. Most of the time, this works.
If that fails, I file DMCA takedown notices with every party I can find. The copy has to be really close to your content, or an exact duplicate, for many hosts and Google to consider a takedown.
The danger here is that if you go after some jerk in India or Nigeria, sometimes they have tons of time to make your life hell. I've had retaliation DMCAs filed on my Adsense account as well as to Google. There's little you can do unless you want to spend thousands of dollars battling somebody in the courts who is thousands of miles away.
If the site owner has attitude, I find content on their site which has been stolen from sites other than my own. I can often find content lifted from big-name blogs like Mashable or Gigaom. I then shoot a message to those sites pointing out that their content has been stolen. These bigger sites usually take action immediately, and Google listens to them too.
I took this approach with one person who was making hell for me. Eventually it forced him into switching hosts, and he promptly removed the content he stole from me as well.
Another thing: block Nigeria and India too via htaccess. In my experience, doing so cuts out around 80% of scraping issues, and the lost traffic isn't worth keeping. It's true there are ways around it, but it makes things too difficult for most content thieves to bother with your site when they have so many other targets.
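For anyone who hasn't done a country block before, here's a rough sketch of what it looks like in an Apache 2.2-era .htaccess. The CIDR ranges below are placeholders from the documentation-reserved blocks, NOT real country ranges — you'd pull the actual per-country ranges from a GeoIP or regional registry (APNIC/AfriNIC) list:

```apache
# Sketch of a country block via .htaccess (Apache 2.2 mod_authz_host syntax).
# Replace the placeholder ranges with real per-country CIDR lists
# exported from a GeoIP/registry database.
order allow,deny
deny from 203.0.113.0/24
deny from 198.51.100.0/24
allow from all
```

Country lists run to hundreds of ranges, so in practice you generate this file from a database dump rather than maintain it by hand. On Apache 2.4 the equivalent is `Require not ip ...` inside a `<RequireAll>` block.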