phranque

msg:4381456 | 1:18 am on Oct 31, 2011 (gmt 0) |
robots.txt will exclude the bot from crawling the content but won't prevent google from including those urls in the index (a blocked url that google finds linked elsewhere can still be listed, google just can't see what's on it).
depending on the details of your situation, the better solution may be one of the following:
- redirect those requests to the canonical urls
- meta robots noindex the documents served from non-canonical urls (this only works if those urls are not also blocked in robots.txt, since google has to crawl the page to see the tag)
- use a link rel canonical element
- use the ignore parameters feature in GWT if appropriate

then you should look for where google discovered those non-canonical urls and, if that situation is under your control, fix it at the source.
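to make the first three options concrete, here's a minimal sketch in Python, assuming the non-canonical urls differ from the canonical ones only by query parameters that don't change the page content. the parameter names in IGNORED_PARAMS and the respond() helper are hypothetical, not part of any framework; respond() just returns (status, headers, head_html) so you can see which response each option would produce. (the GWT parameter setting is configured on google's side and has no code equivalent here.)

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical: query parameters that never change the page content on this site.
IGNORED_PARAMS = {"sessionid", "ref", "utm_source"}


def canonical_url(url: str) -> str:
    """Drop ignored query parameters, lowercase the host, default the path to '/'."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in IGNORED_PARAMS]
    # Note: urlencode re-encodes the remaining parameters, so differently
    # escaped but equivalent query strings also count as non-canonical.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path or "/",
                       urlencode(kept), ""))


def respond(url: str, strategy: str = "redirect"):
    """Hypothetical handler: return (status, headers, head_html) for a request url."""
    canon = canonical_url(url)
    if canon == url:
        # Canonical request: serve normally, with a self-referencing canonical link.
        return 200, {}, f'<link rel="canonical" href="{canon}">'
    if strategy == "redirect":
        # Option 1: permanently redirect the non-canonical url.
        return 301, {"Location": canon}, ""
    if strategy == "noindex":
        # Option 2: serve the page but ask search engines not to index it.
        return 200, {}, '<meta name="robots" content="noindex">'
    if strategy == "canonical":
        # Option 3: serve the page and point at the canonical url.
        return 200, {}, f'<link rel="canonical" href="{canon}">'
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    print(respond("http://example.com/page?id=3&sessionid=abc"))
    print(respond("http://example.com/page?id=3"))
```

a redirect is usually the cleanest of the three because it consolidates the urls outright; noindex and rel canonical leave both versions reachable and rely on google honouring the hint.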
joyjesters

msg:4381625 | 2:59 pm on Oct 31, 2011 (gmt 0) |
Thanks. I also submitted a removal request via Webmaster Tools. Looks like they've already been removed :)
dstiles

msg:4382337 | 11:12 pm on Nov 1, 2011 (gmt 0) |
My advice would be to look at the Search Engine Spider and User Agent Identification forum and the Google SEO News and Discussion forum hereabouts. The former has a lot of info about killing scrapers and hackers, the latter will tell you google has basically lost the plot. :)