| 11:26 am on Aug 31, 2009 (gmt 0)|
You need Google to re-crawl those pages. Google may be visiting you each day but looking at different parts of your site each time. I would not worry too much about this; just wait until it cleans itself up naturally.
| 12:26 pm on Sep 1, 2009 (gmt 0)|
You are going about it the wrong way, because those pages have already been indexed. robots.txt only stops search engine bots from crawling the pages, so it is not a valid way to remove a page that is already in the index. The correct approach is to let the search engine bots access the pages and then use a tag that tells them: "Hi, I am the webmaster of this site. I don't want you to index these pages! Please remove them even though they have already been indexed."
IMO, there are two ways to achieve this.
The first applies in some special cases (for example, pages with similar content generated by a session ID): you may be able to put a rel="canonical" link element in the head of the page you want removed, to tell search engines which of the two similar pages is the authoritative one they should index.
The second is to use a noindex meta tag. It's simple!
The two ways tackle the problem from different angles. Just remember that you can't use robots.txt to remove the duplicate pages, as they have already been indexed.
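To make the point above concrete, here is a rough Python sketch of why the robots.txt block gets in the way: a bot that obeys robots.txt never fetches a blocked page, so it never reads a noindex or canonical tag placed on it. The robots.txt contents, the /dup/ path, and the example.com URLs are made up for illustration, and real crawlers may behave differently from the standard library parser used here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt files: one blocks the duplicate pages, one blocks nothing.
ROBOTS_BLOCKED = "User-agent: *\nDisallow: /dup/\n"
ROBOTS_OPEN = "User-agent: *\nDisallow:\n"

# The two tags discussed above, as they would appear in the page's <head>.
NOINDEX_TAG = '<meta name="robots" content="noindex">'


def canonical_tag(preferred_url: str) -> str:
    """Build a rel="canonical" link element pointing at the preferred page."""
    return f'<link rel="canonical" href="{preferred_url}">'


def bot_can_read_tags(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """A compliant bot only sees tags on pages that robots.txt lets it fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


duplicate = "http://www.example.com/dup/page.html"  # hypothetical duplicate URL
print(bot_can_read_tags(ROBOTS_BLOCKED, duplicate))  # blocked: tags are never read
print(bot_can_read_tags(ROBOTS_OPEN, duplicate))     # allowed: tags can be read
print(canonical_tag("http://www.example.com/page.html"))
```

So the robots.txt block would have to come off before either tag can do anything.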
| 6:14 am on Sep 2, 2009 (gmt 0)|
Thanks for the replies. I've read conflicting advice Blan - many say that blocking the URLs in robots.txt WILL also get the already indexed pages removed.
The problem with either of your solutions above is that it would be an extremely labor-intensive process, as we would have to do that manually for over 500 URLs. Surely there's an easier way ...
| 12:15 am on Oct 28, 2009 (gmt 0)|
Why can't you use a 301 redirect to the correct URL? That way you could perhaps use pattern matching.
In that case you should not block these URLs in robots.txt, and after some time Google will drop the incorrect URLs from its index.
| 12:20 am on Oct 28, 2009 (gmt 0)|
301 redirect to the correct URLs using a pattern-matching directive such as RedirectMatch or RewriteRule (Apache mod_alias and mod_rewrite, respectively) or a script, or...
Return a 410-Gone status, again using either of those pattern-matching directives or a script.
Leave the solution in place for many months, if not years.
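For the "or a script" route, the pattern-matching idea might look something like the Python sketch below. The URL patterns and target paths are invented for illustration (the thread never gives the real ones), and the function only decides which status to return; wiring it into an actual server is left out.

```python
import re
from typing import Optional, Tuple

# Hypothetical rules: (pattern, status, redirect target or None).
# One pattern can cover hundreds of URLs, so nothing is edited page by page.
RULES = [
    (re.compile(r"^/old-catalog/(\d+)\.html$"), 301, r"/catalog/\1"),
    (re.compile(r"^/print/.*$"), 410, None),
]


def resolve(path: str) -> Optional[Tuple[int, Optional[str]]]:
    """Return (status, location) for a matching path, or None to serve it normally."""
    for pattern, status, target in RULES:
        match = pattern.match(path)
        if match:
            # Fill backreferences such as \1 into the redirect target.
            location = match.expand(target) if target else None
            return status, location
    return None


print(resolve("/old-catalog/42.html"))  # 301 to the matching new URL
print(resolve("/print/anything"))       # 410 Gone, no location
print(resolve("/about.html"))           # no rule matches
```

The same two rules could just as well be written as RedirectMatch or RewriteRule lines; the script version simply keeps the logic in one place if the server is not Apache.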