Forum Library, Charter, Moderators: phranque

Webmaster General Forum

Preventing data mining whilst remaining Googlebot friendly

WebmasterWorld Senior Member 10+ Year Member

Msg#: 3422586 posted 9:34 am on Aug 15, 2007 (gmt 0)

Is this the impossible dream? I am nearly ready to launch a site that I know will be like honey to the swarm of data miners out there. There will be pages full of lists of data, mostly geo/geocode data relating to towns, zips, schools, counties, etc.

Is there anything I can do to make the site data miner UNfriendly while still keeping the doors open for Googlebot, MSNbot and the like? I know it's a nigh-on impossible feat, but I was thinking along the lines of spider throttling, which the major bots seem to go along with. Even if I throttle back to one page every 10 seconds, I suppose the patient data miners will still get the goodies.
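For illustration, the throttling idea could be sketched roughly like this in Python. It is only a minimal sketch, assuming clients are keyed by IP address and that the web layer calls `allow()` before serving each page; the `Throttle` class and its parameter names are made up for the example and do not come from any particular server or framework.

```python
import time


class Throttle:
    """Allow at most one request per `min_interval` seconds per client."""

    def __init__(self, min_interval=10.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock        # injectable so the logic can be tested without sleeping
        self.last_allowed = {}    # client key (assumed: IP address) -> time of last allowed request

    def allow(self, client):
        now = self.clock()
        last = self.last_allowed.get(client)
        if last is not None and now - last < self.min_interval:
            # Too soon: the caller would typically answer 429 or 503 with Retry-After.
            return False
        self.last_allowed[client] = now
        return True
```

As the post already suspects, this only slows a scraper down: a patient client that waits 10 seconds between requests, or one that spreads requests over many IP addresses, still gets every page.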

A couple of other ideas I had:

1. Whitelist of user agents. This won't stop the hardcore guys but should be enough to filter out the script kiddies.

2. A few trap pages blocked in robots.txt and strewn around the place to entice the data miners, presuming I don't end up with Googlebot banning itself, which I've heard can happen.
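A rough sketch of how these two ideas might fit together, in Python. Everything named here is an assumption made for illustration: the `/trap/` path (which would also be listed under `Disallow` in robots.txt), the user-agent pattern, the crawler hostname suffixes, and the in-memory ban set. To reduce the risk of a real search engine crawler banning itself, the sketch exempts IPs that pass a reverse-then-forward DNS check from the ban.

```python
import re
import socket

# Hypothetical whitelist: major crawlers plus ordinary browsers (which send "Mozilla").
ALLOWED_AGENTS = re.compile(r"googlebot|msnbot|mozilla", re.IGNORECASE)
# Hypothetical trap location, also disallowed in robots.txt.
TRAP_PREFIX = "/trap/"
# Assumed hostname suffixes for genuine crawlers; check each engine's own documentation.
CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".search.msn.com")

banned_ips = set()  # in-memory only; a real site would persist this


def is_verified_crawler(ip, reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then confirm with a forward lookup (IPv4)."""
    try:
        host = reverse(ip)[0]
        if not host.endswith(CRAWLER_DOMAINS):
            return False
        return ip in forward(host)[2]
    except OSError:  # lookup failures are treated as "not verified"
        return False


def check_request(ip, user_agent, path, verified=is_verified_crawler):
    """Return "allow" or "deny" for one request."""
    if ip in banned_ips:
        return "deny"
    if path.startswith(TRAP_PREFIX):
        # Anything fetching a trap page ignored robots.txt; ban it unless it is a verified crawler.
        if not verified(ip):
            banned_ips.add(ip)
        return "deny"
    if not ALLOWED_AGENTS.search(user_agent or ""):
        return "deny"
    return "allow"
```

The user-agent check only filters out clients that do not bother to fake the header, which matches the "script kiddies" caveat in item 1. The trap catches clients that ignore robots.txt regardless of what user agent they claim.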

Any other ways to block the database eaters?


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved