Forum Moderators: phranque


Doing as much as possible to prevent site scraping


esllou

1:30 pm on Apr 9, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I am soon going to be opening a site which will have a lot of useful address/geocode database data. It's exactly the sort of site which will become the target of site scrapers.

What is the best way of tackling this problem and making it as hard as possible for our site to be scraped?

With User-Agent strings so easily spoofed, the old "anti-bot .htaccess list" is looking ever more out of date.

Is there a simple way of throttling page requests from each individual IP address in a way that wouldn't impact human traffic but would make bot scraping more trouble than it's worth?
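[The kind of per-IP throttling asked about here can be sketched roughly as follows. This is a minimal illustration, not one of the scripts discussed in this thread: it keeps a sliding window of request timestamps per IP in memory, and the class name, limits, and window size are all made up for the example.]

```python
import time
from collections import defaultdict, deque

class IPThrottle:
    """Allow at most `limit` requests per `window` seconds from each IP."""

    def __init__(self, limit=30, window=60):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        q = self.hits[ip]
        # Discard timestamps that have fallen outside the sliding window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: serve a 503 or a CAPTCHA instead
        q.append(now)
        return True
```

A human reader stays well under a sensible limit, while a scraper walking thousands of address pages trips it quickly. In production this state would live somewhere shared (a database or cache) rather than in one process's memory.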

mikea

2:47 pm on Apr 9, 2007 (gmt 0)

10+ Year Member



You might want to look at <snip> when it launches. Disclaimer: I run the site.

Cheers,
Mike

[edited by: jdMorgan at 4:34 pm (utc) on April 9, 2007]
[edit reason] No URL-drops, please. See TOS. [/edit]

jdMorgan

4:38 pm on Apr 9, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might want to take a look at Key_Master's bad-bot script [webmasterworld.com], or xlcus's and AlexK's badly-behaved-bot scripts [webmasterworld.com], in our Perl and PHP forum libraries, respectively.
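[The general "bot trap" idea behind scripts of this kind can be sketched as follows. This is an illustrative outline only, not the code from the linked threads: a trap URL is disallowed in robots.txt and linked invisibly to humans, so any client that fetches it is presumed to be a misbehaving bot and is blocked. The path and function names here are invented for the example.]

```python
# Clients already caught by the trap; in practice this would be
# persisted to a file, database, or firewall/.htaccess deny list.
BLOCKED = set()

TRAP_PREFIX = "/trap/"  # disallowed in robots.txt, hidden from human visitors

def handle_request(ip, path):
    """Return an HTTP status code for a request, blocking trapped IPs."""
    if ip in BLOCKED:
        return 403  # previously identified as a bad bot
    if path.startswith(TRAP_PREFIX):
        BLOCKED.add(ip)  # it ignored robots.txt: block it from now on
        return 403
    return 200
```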

Jim