Stopping scrapers from the get-go


SevenCubed - 3:52 am on Feb 16, 2011 (gmt 0)


From my recent experiences I can offer some suggestions on points 1 and 2.

The Spamhaus blacklist won't be very effective against scrapers. It is primarily intended to be used on your mail server for blocking incoming spam -- for that it is very effective. I use it and can say I like it; spam only trickles in.

I had been stuffing my Linux iptables, at the kernel level, with bunches of nasty IP ranges, much like you want to do in htaccess. The rule set finally built up to the point where it began causing serious performance issues. Those same issues would be even worse with htaccess, because the file is constantly being opened and read.

The slowdown resulted in Google reducing its crawl frequency, which in turn led to sites hosted on the server losing some positions in the SERPs. It was a tough decision and a trade-off, but I purged the IPs and left everything wide open again. Server performance skyrocketed, and exactly 60 days after opening everything back up, Google's crawl rate and frequency went back up and lifted the sites to where they had been previously. In fact, one of them went from spot #10 to #2. Coincidence? Up to you to decide, but I say slow-loading pages became a ranking factor.

All that said, I can send you a very effective and comprehensive list of IPs that you can use in your htaccess. It will be effective in keeping scrapers away, but I know it will also slow down your site.

If you want the list, PM me, but I won't be able to send it until sometime tomorrow because I don't have it handy right now.

The slowdown might not be as bad for you, depending on how much RAM and CPU you have available. I operate with minimal resources because I'm just starting out on my own, and I will scale up as it becomes necessary.
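Since the cost of a block list grows with the number of entries, one way to soften the hit is to merge overlapping and adjacent ranges before loading them. Below is a minimal sketch in Python, not something I have run on the server described above. The function names are made up for illustration, the sample addresses are reserved documentation ranges rather than real scrapers, and the output assumes Apache 2.2-style Order/Allow/Deny directives (newer Apache versions use different access directives).

```python
import ipaddress


def collapse_ranges(cidrs):
    """Merge overlapping or adjacent CIDR blocks into the fewest equivalent networks."""
    networks = [ipaddress.ip_network(c, strict=False) for c in cidrs]
    # collapse_addresses() needs a single IP version per call, so split first.
    v4 = [n for n in networks if n.version == 4]
    v6 = [n for n in networks if n.version == 6]
    merged = list(ipaddress.collapse_addresses(v4))
    merged += list(ipaddress.collapse_addresses(v6))
    return [str(n) for n in merged]


def htaccess_deny_rules(cidrs):
    """Build Apache 2.2-style deny rules (assumed syntax) for the given ranges."""
    lines = ["Order Allow,Deny", "Allow from all"]
    lines += [f"Deny from {c}" for c in cidrs]
    return "\n".join(lines) + "\n"


# Hypothetical sample entries, not an actual scraper list.
blocklist = ["192.0.2.0/25", "192.0.2.128/25", "192.0.2.64/26", "198.51.100.7"]

merged = collapse_ranges(blocklist)
print(htaccess_deny_rules(merged))
```

Here the four sample entries shrink to two rules, because the two /25 blocks join into a single /24 and the /26 is already covered by it. How much this helps depends entirely on how much overlap the real list contains.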


Thread source: http://www.webmasterworld.com/search_engine_spiders/4267704.htm