We have since fixed the problem and also updated our robots.txt to tell Google not to index certain URLs.
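For context, the lines we added look roughly like this (the paths are placeholders, not our real URLs):

User-agent: *
Disallow: /old-section/
Disallow: /session-pages/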
Will this addition to our robots.txt cause the currently indexed pages that we don't want indexed to be removed by Google automatically?
There are at least 500 pages that we need to get removed, and if we have to do them one by one using the URL removal tool in Webmaster Tools it will take FOREVER.
I figured adding those rules to our robots.txt would cause the currently indexed pages to be removed, but it's been 3 days and nothing yet. Google spiders our site 1,000s of times per day, so I figured they would be removed by now ...
IMO, there are two ways to achieve it.
In some special cases (for example, pages with similar content generated by session IDs), you may be able to use a rel="canonical" tag in the head of the page you want removed, to tell search engines which version is the authoritative page that should be indexed, provided the two pages have similar content.
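A minimal sketch of what that looks like (example.com and the path are just placeholders):

<!-- in the <head> of the duplicate/session-ID page -->
<link rel="canonical" href="https://www.example.com/preferred-page/">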
The second one is to use a noindex meta tag. It's simple!
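Something like this in the <head> of each page you want dropped (a generic sketch, not tied to any particular CMS):

<meta name="robots" content="noindex">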
The two ways solve the problem from different angles. Just remember that you can't use robots.txt to remove the duplicate pages now that they have already been indexed; blocking them only stops Googlebot from crawling, so it never sees any changes you make to those pages.
The problem with either of your solutions above is that it would be an extremely labor-intensive process, as we would have to do that manually for over 500 URLs. Surely there's an easier way ...
In that case you should not block these URLs in robots.txt, and after some time Google will drop the incorrect URLs from its index.
Leave the solution in place for many months, if not years.
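In other words, the combination that works is roughly this (paths are placeholders): keep the URLs crawlable in robots.txt and serve the noindex on the pages themselves, so Googlebot can actually see the directive.

# robots.txt -- do NOT disallow the URLs you want de-indexed
User-agent: *
Disallow:

<!-- on each page to be dropped from the index -->
<meta name="robots" content="noindex">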