| 8:54 pm on Sep 8, 2011 (gmt 0)|
I've got a very parallel situation on one site and I did nothing to try to control indexing (and I'm happy with the results so far - 5 years on.)
If I were going to do something, I wouldn't bother with the canonical link. I'd use a meta noindex, and open the content area of the questions page with a single big old link that says: "These questions refer to the information at [link]." That's going to circulate link equity in a natural and very effective way.
| 9:38 pm on Sep 8, 2011 (gmt 0)|
As you may have seen in my later post, I'm having some problems with Google and noindex.
A month ago I added <META NAME="robots" content="noindex,follow,noarchive">, and Google still indexes pages that have duplicate content. Sometimes the site: command shows me 1040 pages and sometimes 374.
My site is now also penalized (keywords sometimes rank 40 and sometimes 72). I think this is the reason, so I plan to delete the directory that contains these pages and create a new one, blocked and with the noindex tag.
What do you think?
Is <META NAME="robots" content="noindex,follow,noarchive"> correct, or does it need spaces between noindex and follow?
Thanks for your time.
| 10:23 pm on Sep 8, 2011 (gmt 0)|
You don't need spaces.
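To illustrate why, here is a minimal sketch of how a tolerant reader of the robots meta value might treat the content string: split on commas and trim whitespace. This is an assumption about typical parsing, not Google's actual code, and the function name `parse_robots_directives` is hypothetical.

```python
def parse_robots_directives(content):
    """Split a robots meta content value into a set of lowercase directives.

    Hypothetical helper: whitespace around each comma-separated part is dropped.
    """
    return {part.strip().lower() for part in content.split(",") if part.strip()}


no_spaces = parse_robots_directives("noindex,follow,noarchive")
with_spaces = parse_robots_directives("noindex, follow, noarchive")

# Both spellings produce the same set of directives.
print(no_spaces == with_spaces)
```

Under that assumption, the two spellings are equivalent, so the tag as posted should be fine.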
| 12:20 am on Sep 9, 2011 (gmt 0)|
Thanks, so I don't understand what is happening... I will delete the directory and copy the pages to a new directory, blocked by robots.txt and with the noindex tag.
| 11:19 pm on Sep 9, 2011 (gmt 0)|
If you block with robots.txt Google may still list the pages as URL-only entries.
If you block by robots.txt Google will never see the meta noindex tag, because you blocked them from reading the page.
You need ONLY the meta noindex tag.
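The ordering problem can be sketched with Python's standard library. This is only a rough model of a well-behaved crawler, not Google's real pipeline: it checks robots.txt first and reads the page's meta tag only if fetching is allowed. The names `RobotsMetaFinder` and `crawler_sees_noindex`, the example.com URL, and the /questions/ directory are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def crawler_sees_noindex(robots_txt, url, page_html, agent="Googlebot"):
    """Return True only if the page may be fetched AND it carries noindex."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # Blocked: the page body is never downloaded, so the tag is never read.
        return False
    finder = RobotsMetaFinder()
    finder.feed(page_html)
    return "noindex" in finder.directives


page = '<html><head><META NAME="robots" content="noindex,follow,noarchive"></head></html>'
url = "http://example.com/questions/page1.html"  # placeholder URL

blocked = "User-agent: *\nDisallow: /questions/\n"
unblocked = "User-agent: *\nDisallow:\n"

print(crawler_sees_noindex(blocked, url, page))    # False: tag is never seen
print(crawler_sees_noindex(unblocked, url, page))  # True: tag is read
```

The sketch does not model the URL-only listings mentioned above; it only shows that a blocked page's noindex tag cannot take effect because the crawler never reads it.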
| 5:45 am on Sep 22, 2011 (gmt 0)|
You could also add nocache in robots.txt.
| 3:46 pm on Sep 22, 2011 (gmt 0)|
Mike, as far as I know there is nothing in the robots.txt standard that allows for any kind of cache control... only crawl control.