incrediBILL - 3:27 am on Apr 11, 2011 (gmt 0)
If you want to find out where your articles go and who took them, put some poison pills in your text.
You can hide some stuff in plain sight with a few fun CSS tricks: bots see it but humans don't. I cloak that stuff, serving it only when I know the visitor isn't Google, Yahoo, Bing, etc., so it's supposedly a human doing the crawling. However, if you don't have those skills, just throw a gibberish word like "aardvarkapalooza" onto each page, hidden from viewers with CSS, and it'll show up all over the place on scrapers' pages.
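Here's a minimal Python sketch of the cloaked version. It assumes the pill is a CSS-hidden span and that search-engine crawlers are recognized by a user-agent substring alone. The names `poison_pill`, `is_search_engine`, and `SEARCH_ENGINE_MARKERS` are hypothetical. User agents are trivial to fake, so treat this as a rough filter, not a verified crawler check.

```python
import html

# Hypothetical substrings for Google, Bing and Yahoo crawlers.
# User-agent strings can be spoofed; this sketch does NOT verify them
# (e.g. via reverse DNS), it only matches text.
SEARCH_ENGINE_MARKERS = ("googlebot", "bingbot", "slurp")

def is_search_engine(user_agent: str) -> bool:
    """True if the user agent contains a known crawler marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in SEARCH_ENGINE_MARKERS)

def poison_pill(code: str, user_agent: str) -> str:
    """Return a CSS-hidden span holding the code, or "" for search engines."""
    if is_search_engine(user_agent):
        return ""  # never show the pill to the engines themselves
    return '<span style="display:none">' + html.escape(code) + "</span>"
```

For a browser-like agent, `poison_pill("aardvarkapalooza", "Mozilla/5.0 (Windows NT 6.1)")` yields the hidden span; for Googlebot it yields an empty string.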
Now, if you've got any coding skills, append the visitor's IP address in integer form as well, so the code in your text looks like "zzxxyyqqzz-2130706433", which kind of looks like a product part number or something. I convert the IP to a single number because the scripts that grind up your content will treat the periods in an IP as breaks and split it into 4 parts. I want to track these idiots down, so I make sure it survives in 1 part.
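A rough Python equivalent of building that code, using the standard `ipaddress` module rather than PHP. The helper names `ip_to_int` and `make_tracking_code` are made up for this sketch, and "zzxxyyqqzz" is just the example prefix from above.

```python
import ipaddress

def ip_to_int(ip: str) -> int:
    # An IPv4 address is a 32-bit number; int() returns it without the dots.
    return int(ipaddress.IPv4Address(ip))

def make_tracking_code(prefix: str, visitor_ip: str) -> str:
    # prefix-digits stays one token, unlike a dotted quad that gets split up.
    return f"{prefix}-{ip_to_int(visitor_ip)}"
```

So `make_tracking_code("zzxxyyqqzz", "127.0.0.1")` produces `zzxxyyqqzz-2130706433`, the same code shown in the example.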
Once they scrape it and their page gets indexed, you simply search for "zzxxyyqqzz" (or whatever your unique code is that didn't exist in Google before) and VOILA! They pop up like radioactively tagged rats in a sewer.
The integer IP 2130706433 decodes to 127.0.0.1 (simple math, really: 127 × 256^3 + 0 × 256^2 + 0 × 256 + 1), and PHP provides the ip2long() and long2ip() functions [php.net] to speed you on your way.
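To go the other way in Python, here's a sketch that peels the four octets back out with bit shifts (doing roughly what long2ip() does) and pulls every tagged code out of a scraped page. `int_to_ip` and `decode_tracking_codes` are hypothetical names, and the pattern assumes the "prefix-digits" layout used above.

```python
import re

def int_to_ip(n: int) -> str:
    # 2130706433 = 127*256**3 + 0*256**2 + 0*256 + 1 -> "127.0.0.1"
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit IPv4 value")
    octets = [(n >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(o) for o in octets)

def decode_tracking_codes(text: str, prefix: str) -> list:
    # Find every "<prefix>-<digits>" token in scraped text and decode the IP.
    pattern = re.escape(prefix) + r"-(\d+)"
    return [int_to_ip(int(digits)) for digits in re.findall(pattern, text)]
```

Feeding it a scraper's page text such as "Part no. zzxxyyqqzz-2130706433 in stock" gives back `["127.0.0.1"]`.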
Then I scan my logs for that IP and grab the user agent as well.
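If your server writes the common Apache/NCSA "combined" log format (an assumption; other formats need a different pattern), a small Python sketch like this can pull out every hit from the decoded IP along with its user agent. `visits_from_ip` and `COMBINED` are illustrative names.

```python
import re

# Assumes Apache/NCSA "combined" format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def visits_from_ip(lines, ip):
    """Yield (time, request, user_agent) for every log line from the given IP."""
    for line in lines:
        m = COMBINED.match(line)
        if m and m.group("ip") == ip:
            yield m.group("time"), m.group("request"), m.group("agent")
```

In practice you'd pass an open log file as `lines`, e.g. `visits_from_ip(open("access.log"), "127.0.0.1")`, where the file name is just a placeholder.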
Now I have full trip details proving the idiot scraper scraped my site: "Hello, ISP, you have an AUP violator. Here are my log files, here are the poison pills on his page, hurt him please."