I've been having problems with a few people in a foreign country visiting my website. They search for the products of only one of my manufacturers, then submit many fake requests for product quotes, which ties up my company's resources.
I've concluded that this person (not a bot or spider) is either stealing my website content or my manufacturer's designs. My logs show that they stay on a page an average of 25 seconds and up to a minute, enough to download the entire page content.
I've denied the infringing block of IP addresses from the country in .htaccess and have had no hits from this person since. We've never had an order from this country, so blocking the IP addresses was no problem.
However, this person has since gone to Google's cached pages to get my content. I know this from my logs for the period when the IP ban was in place.
My pages are created on the fly and I can block the IP addresses at the program level. But when Google caches the pages, they are static pages and the program can't deny the page in the Google cache.
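For anyone wondering what the program-level block mentioned above might look like, here is a minimal sketch in Python. It is not the poster's actual code: the names `BLOCKED_NETWORKS`, `is_blocked`, and `render_page` are hypothetical, and the range shown is a reserved documentation range standing in for the real one. As the post says, a check like this only runs on the live server, so it has no effect on a cached copy.

```python
import ipaddress
from typing import Tuple

# Hypothetical blocked range (a reserved documentation range, not a real one).
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(remote_addr: str) -> bool:
    """Return True if the visitor's address falls inside a blocked range."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Unparseable address: this sketch chooses not to block it.
        return False
    return any(addr in net for net in BLOCKED_NETWORKS)


def render_page(remote_addr: str, body: str) -> Tuple[int, str]:
    """Return (status, content): 403 for blocked visitors, else the page."""
    if is_blocked(remote_addr):
        return 403, "Forbidden"
    return 200, body
```

The page generator would call `render_page` with the visitor's address before building any output, so blocked visitors never receive the content.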
I could write the block in JavaScript, which should work in Google's cache, but they can just disable JavaScript.
Question: is there anything I can put in the HTML page itself (the copy that gets cached by search engines) to deny particular IP addresses access to the page, without removing my site from the search engines altogether?
Thanks for your thoughts.
Add this to the head section of every page: <meta name="robots" content="noarchive"> The "Cache" link will disappear gradually as the pages are reindexed. The same meta element will work for Yahoo Search and MSN Live Search too.
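Since the pages are generated on the fly, the element can be added once in the page generator instead of by hand. A rough sketch in Python, assuming the generator holds each page as an HTML string before sending it; the function name `add_noarchive` is made up for illustration:

```python
import re

NOARCHIVE_META = '<meta name="robots" content="noarchive">'

# Matches an opening <head> tag (with or without attributes), but not <header>.
HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def add_noarchive(html: str) -> str:
    """Insert the noarchive meta element right after the opening <head> tag."""
    if NOARCHIVE_META in html:
        return html  # already present, leave the page unchanged
    return HEAD_OPEN.sub(
        lambda m: m.group(0) + "\n" + NOARCHIVE_META, html, count=1
    )
```

Pages with no head section come back unchanged, so it is worth checking the output of one or two real pages after wiring this in.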
"allowing people to see your site even when it's down"
An edge case, at best - if your site is down often enough to need the Google cache then you need to get better hosting. You can't buy products through the cache. You can't log in to a site through the cache.
You can, however, copy content through a cache, which bypasses any IP bans on the site's server. You can also find removed content long after the site owner has changed it. And by allowing caching, you are letting a third party republish your content with their logo at the top.