How to prevent scraper sites . . .


jdMorgan - 2:59 am on May 18, 2005 (gmt 0)


The really greedy scrapers can be easy to stop. Some common methods come to mind:

  • Manually block well-known download tools by user-agent and/or IP address (black-list).
  • Allow only known-good user-agents to access your site (white-list).
  • Automatically block all-at-once page requests (block based on access rate).
  • Automatically block visitors that disobey robots.txt (traps, honeypots).
  • Block well-known open proxies; detect and screen proxied requests.
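As a rough illustration of three of the methods above (user-agent black-listing, rate-based blocking, and a robots.txt trap), here is a minimal Python sketch. It is not taken from any particular server setup: the class name `AccessFilter`, the `/trap/` path, the thresholds, and the short list of user-agent substrings are all assumptions made up for this example, and a real site would more likely do this in its server configuration or a script hooked into it.

```python
import time
from collections import defaultdict, deque

# Illustrative values only (assumptions, not recommendations).
BLOCKED_AGENT_SUBSTRINGS = ("wget", "httrack", "webcopier")
TRAP_PATH = "/trap/"      # hypothetical path, disallowed in robots.txt
MAX_REQUESTS = 20         # requests allowed per window
WINDOW_SECONDS = 10.0     # length of the sliding window


class AccessFilter:
    """Decide per request whether a client should be served."""

    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        self.recent = defaultdict(deque)  # ip -> timestamps of recent hits
        self.banned = set()               # ips blocked until cleared by hand

    def allow(self, ip, user_agent, path, now=None):
        now = time.monotonic() if now is None else now
        if ip in self.banned:
            return False

        # Black-list: refuse known download tools by user-agent substring.
        agent = user_agent.lower()
        if any(s in agent for s in BLOCKED_AGENT_SUBSTRINGS):
            return False

        # Trap: well-behaved robots never fetch a path robots.txt disallows.
        if path.startswith(TRAP_PATH):
            self.banned.add(ip)
            return False

        # Access rate: drop timestamps older than the window, then count.
        hits = self.recent[ip]
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_requests:
            self.banned.add(ip)
            return False
        return True
```

A caller would run `AccessFilter().allow(ip, user_agent, path)` once per request and answer with a 403 when it returns `False`. User-agent strings are trivially forged, so the black-list only stops the laziest tools; the trap and the rate check do not depend on what the client claims to be.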

    In addition, some Webmasters block entire class A networks (/8 address ranges) because their sites will not benefit from allowing access from those IP addresses. For example, some sites may not benefit from out-of-country traffic at all, but might suffer site-scraping from other countries. I'm neither recommending nor condemning such a wide-reaching access restriction, just mentioning it; the choice is up to the individual Webmaster.
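For anyone who does go the range-blocking route, the check itself is simple. The sketch below uses Python's standard `ipaddress` module; the function name `is_blocked` and the two ranges listed are placeholders picked for illustration (a private /8 and a documentation-only /24), not ranges anyone should actually block, and treating unparseable addresses as blocked is an assumption of this example.

```python
import ipaddress

# Placeholder ranges for illustration only: a private /8 (the size of an
# old class A network) and a /24 reserved for documentation.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_blocked(ip_text, networks=BLOCKED_NETWORKS):
    """Return True if the address falls inside any blocked range."""
    try:
        addr = ipaddress.ip_address(ip_text)
    except ValueError:
        # Assumption for this sketch: reject anything that is not a valid IP.
        return True
    return any(addr in net for net in networks)
```

With the placeholder list, `is_blocked("10.1.2.3")` is true and `is_blocked("192.0.2.1")` is false. The same list could just as well be expressed as deny rules in the Web server's own configuration.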

    Many of these solutions have been discussed in the technical/scripting forums here. There are also many Web sites about IP address and open-proxy blacklisting. These methods can help to reduce the number of successful scrapes, and thereby reduce your legal costs, time spent on DMCA filings, worry, etc. None are foolproof, but they can discourage those who might take your site for easy prey.

    Jim


    Thread source: http://www.webmasterworld.com/content_copywriting/1341.htm