Our main site allows business owners to post inventory for sale. This inventory, including internal links, is getting scraped and showing up on crappy sites.
How can I block the scrapers or make it impossible for them to scrape our content?
6:11 pm on May 28, 2014 (gmt 0)
The way I deal with scrapers is to monitor my access logs, find "visitors" that are automated, look up the IP information, and block that server. It is time-intensive to build up your information, but it gets much easier after a few months. Every IP address belongs to some network: some belong to ISPs and others to hosting companies. Hosts don't send human visitors, only robots.
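For anyone who wants a starting point, here is a rough sketch of that log check in Python. It assumes the access log is in Common Log Format, where the first field on each line is the client IP. The function names, the request threshold, and the list of hosting ranges are all made up for illustration, and the addresses are from the reserved documentation ranges. You would still build your own list of hosting ranges from the IP lookups.

```python
from collections import Counter
from ipaddress import ip_address, ip_network


def requests_per_ip(lines):
    """Count requests per client IP (assumes the IP is the first field)."""
    counts = Counter()
    for line in lines:
        fields = line.split(maxsplit=1)
        if fields:
            counts[fields[0]] += 1
    return counts


def suspects(counts, threshold, hosting_ranges=()):
    """Return (ip, hits, in_hosting_range) for IPs worth a closer look.

    An IP is flagged if it made at least `threshold` requests or falls
    inside one of the CIDR ranges you have identified as hosting companies.
    """
    networks = [ip_network(r) for r in hosting_ranges]
    flagged = []
    for ip, hits in counts.most_common():
        try:
            addr = ip_address(ip)
        except ValueError:
            continue  # first field was a hostname or garbage, skip it
        in_hosting = any(addr in net for net in networks)
        if hits >= threshold or in_hosting:
            flagged.append((ip, hits, in_hosting))
    return flagged


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [28/May/2014:18:11:00 +0000] "GET /item/1 HTTP/1.1" 200 512',
        '203.0.113.5 - - [28/May/2014:18:11:01 +0000] "GET /item/2 HTTP/1.1" 200 498',
        '203.0.113.5 - - [28/May/2014:18:11:02 +0000] "GET /item/3 HTTP/1.1" 200 530',
        '198.51.100.7 - - [28/May/2014:18:12:10 +0000] "GET /item/1 HTTP/1.1" 200 512',
        '192.0.2.9 - - [28/May/2014:18:15:42 +0000] "GET / HTTP/1.1" 200 2048',
    ]
    # Hypothetical hosting range and threshold, for illustration only.
    for row in suspects(requests_per_ip(sample), 3, ["198.51.100.0/24"]):
        print(row)
```

A raw request count on its own will also catch legitimate crawlers and busy office networks, so treat the output as a list of candidates to look up, not an automatic block list.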
There is a forum here dedicated to information about search engines, spiders, and the hosts they come from. At the top of that forum: [webmasterworld.com...] there is information about how to tell robot/automated activity on your site from the activity of human visitors. It is a lot of information, and it takes time to read, understand, and act on, but it solves the problem better than anything else I know.
2:51 pm on May 29, 2014 (gmt 0)
We've been blocking IP addresses, but my boss now wants to code the content so that it is visible on the page but not in the page's source code, so it can't be scraped. By hiding content like this, we are completely screwing ourselves. How can these pages rank if all the content is hidden?
4:22 pm on May 29, 2014 (gmt 0)
Have you thought of using dynamic delivery for those pages, so that the actual pages are not static files? That is far better than the cloaking scheme, which can be a real problem because it specifically goes against Google's Webmaster Guidelines: [support.google.com...]