europeforvisitors - 2:46 pm on May 4, 2006 (gmt 0)
But don't they need to index the duplicate content to know it's duplicate content? They get rid of duplicate content by identifying and filtering the duplicated pages from search results--not by removing the data from their hard drives.
So, Google has a storage problem, eh? Could it be because they are out indexing every single thing they can get their hands on? Could it be that 30-40% of their index is duplicate content in one form or another? How about cleaning up the index first and then worrying about increased storage?