Our company has a fairly complex problem with pages indexed by Google. At least 100,000 of our pages have been indexed by Google but are listed with no title or description (I believe this is because they were seen as duplicate pages). All of these 100,000+ pages were linked to from within the website.
Recently, I changed our navigational structure so that a search engine spider traversing our website's links can no longer reach these 100,000 pages. But because of our technology, the pages still reside on the server, so if Google were to request them directly it would still find them.
The problem that I am seeing is twofold:
1. Google is not dropping these 100,000 pages that cannot be found by a traversing spider.
2. Out of the 100,000+ pages, there are a few thousand that are being indexed and listed with full results. I actually want to remove these pages from the Google index. I've tried to do this by removing the navigational links, but Google seems to visit these pages directly, because they have recent cache dates.
With the pages not having navigational links to them, will Google eventually drop these pages? Or should I completely remove the pages from the server so that Google can't find them at all?
Our company has created a Google SiteMap for all valid HTML pages, but we just can't seem to: 1) remove the pages that are being visited and indexed directly, and 2) get rid of the pages that have been indexed but have no title and description.
Any thoughts and/or guidance would be greatly appreciated.
If you take them down, requests for them will return a 404 error, which can mean either a temporary situation (e.g. a server error) or a permanent one. The pages will eventually be dropped, but they will keep being requested for a period of time before they go anywhere.
If you would like them gone sooner, you can use the removal tool from Google (if they are in a reasonable number of directories) or you can return a 410 status, which means Gone, through your .htaccess.
A 410 only happens through deliberate configuration, so SEs usually drop the pages faster: they are being told the situation is not temporary and the pages have been removed on purpose.
RewriteEngine On
RewriteRule ^yourdirectory/ - [G]
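Once the rule is in place, it is worth confirming that the removed pages really answer with a 410 rather than a 200 or 404. Here is a minimal sketch of such a check, assuming Python 3 is available. The function names and the example.com URL are placeholders I made up for illustration, not anything from your site, and the wording of the descriptions is my own summary of the points above.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code the server sends for url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is on the exception.
        return err.code


def describe_status(code):
    """Hypothetical helper: summarize what a status code tells a crawler."""
    if code == 410:
        return "gone: removed on purpose"
    if code == 404:
        return "not found: may be temporary or permanent"
    if 200 <= code < 300:
        return "still served: page can stay indexed"
    return "other: check the server configuration"


if __name__ == "__main__":
    # Placeholder URL; substitute a page under the directory you removed.
    url = "http://www.example.com/yourdirectory/page.html"
    code = fetch_status(url)
    print(url, code, describe_status(code))
```

Running it against a handful of the removed URLs should be enough to tell whether the rule matches the directory you intended.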
For more on mod_rewrite, see the Apache Forum [webmasterworld.com]
Hope this helps.