Hi,
So, I'm working on a project of about 100K pages.
Of course, when I started, Google showed numerous versions of these pages: with and without www, with and without https, and with and without extra (testing-type) subdomains. All told, about half a million pages were indexed, so that's pretty much a five-times loser.
What would you call that? A Quintuple Duncecap?
In all, it was a total canonical mess. Or, as it came to be known, a "dog's breakfast of hairballs within hairballs".
So: rel=canonical pointing to www on all pages, all .htaccess redirects in place (hooray!), a new www-flavored XML sitemap in place, all the right signals submitted to GWMT, and the leading indicators are all slowly trending in the right direction.
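One way to sanity-check redirect rules like these is to write down the mapping they are supposed to implement and run every known URL variant through it. Below is a minimal Python sketch. It assumes the canonical form is `http://www.example.com` (the domain, the constants and the `canonical_url`/`needs_301` helpers are hypothetical placeholders) and that the bare domain and any testing subdomain should all collapse onto www.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical scheme and host; substitute the site's real ones.
CANONICAL_SCHEME = "http"
CANONICAL_HOST = "www.example.com"
BASE_DOMAIN = "example.com"


def canonical_url(url: str) -> str:
    """Map any http/https, www/non-www or subdomain variant to the canonical URL.

    Assumption: every host under BASE_DOMAIN (bare, www, or a testing
    subdomain) serves the same page, which should live on CANONICAL_HOST.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != BASE_DOMAIN and not host.endswith("." + BASE_DOMAIN):
        return url  # not one of our hosts: leave it alone
    path = parts.path or "/"
    # The fragment is dropped because browsers never send it to the server.
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, parts.query, ""))


def needs_301(url: str) -> bool:
    """True when a request for this URL should be redirected."""
    return canonical_url(url) != url
```

In .htaccess the same logic is usually expressed as mod_rewrite conditions on `%{HTTP_HOST}` and `%{HTTPS}`. The Python version is only meant for checking URL lists offline, for example a crawl export or the URLs GWMT reports.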
We recently went into the backend and did a search and replace of all non-www URLs to www (along with a few other sitewide changes).
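A sitewide search and replace like this is easy to get slightly wrong, for example by also rewriting `example.com.au` or by missing the https links. The sketch below shows one regex-based way to do it in Python. It assumes a hypothetical canonical origin of `http://www.example.com`, and it normalizes https and non-www absolute links together. The `rewrite_links` helper is illustrative and is not the backend tool that was actually used.

```python
import re

# Hypothetical canonical origin; substitute the real one.
CANONICAL_ORIGIN = "http://www.example.com"

# Absolute links to example.com over http or https, with or without "www.".
# The negative lookahead stops matches inside longer hosts such as
# example.com.au or example.company, while still allowing a trailing
# sentence period.
SITE_LINK = re.compile(r"https?://(?:www\.)?example\.com(?![\w-]|\.\w)", re.IGNORECASE)


def rewrite_links(html: str) -> str:
    """Point every absolute link to the site at the canonical origin."""
    return SITE_LINK.sub(CANONICAL_ORIGIN, html)
```

Because already-canonical links are replaced with themselves, running it twice does no harm. Relative links are left untouched.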
Right now, GWMT is only reporting 30,000 canonical pages indexed (this is good, as it was only reporting 20K a month ago).
According to GWMT, Googlebot is only crawling about that many pages per day. I've always wondered whether it is looking at the same pages or different ones, and I may dig into the log files to answer that question. I'll let you know what I find after I come up for air.
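For the log-file question, a rough first pass is to group Googlebot requests by day and count how many paths on each day were already crawled earlier. The Python sketch below makes several assumptions. It expects Apache/Nginx "combined" log format and identifies Googlebot by user-agent string alone, which can be spoofed (a reverse DNS check would be needed to confirm real Googlebot traffic). It also counts 301 responses per day as a hint of how often the old URLs are being revisited. The helper names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Combined Log Format, e.g.
# 66.249.66.1 - - [10/Oct/2013:13:55:36 -0700] "GET /page HTTP/1.1" 200 2326 "-" "...Googlebot/2.1..."
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def crawl_stats(lines):
    """Return ({date: set of paths}, {date: number of 301s}) for Googlebot hits.

    Only the user-agent string is checked; no reverse DNS verification.
    """
    paths = defaultdict(set)
    redirects = defaultdict(int)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "Googlebot" not in m.group("agent"):
            continue
        day = datetime.strptime(m.group("day"), "%d/%b/%Y").date()
        paths[day].add(m.group("path"))
        if m.group("status") == "301":
            redirects[day] += 1
    return paths, redirects


def recrawl_summary(paths_by_day):
    """List (day, unique paths, paths already seen on an earlier day)."""
    seen = set()
    summary = []
    for day in sorted(paths_by_day):
        paths = paths_by_day[day]
        summary.append((day, len(paths), len(paths & seen)))
        seen |= paths
    return summary
```

A high "already seen" count relative to unique paths would suggest Googlebot is mostly recrawling the same pages. Note that the combined format does not record the requested host, so www and non-www hits to the same path look identical unless the log format is extended.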
But the https and non-www nonsense is still predominant in the index and still driving traffic.
Hundreds of thousands of them clearly remain in Google's index, and there is no indication that this number is falling.
What else can I do to give Google a clear message to send out the robots and de-indexing tentacles to these hundreds of thousands of now-301'd pages?
The site has strong backlinks, but traffic has trended down (by about a third) year over year for the last few years, despite lots of good new content.
With luck, the canonical fixes will turn this mess around.
(If you need a soundtrack, you can add this one: [youtube.com...])
Come on Google - why don't you dance with me?
Are there any additional best practices for de-indexing all those non-canonical pages?
I'm tempted to submit the www-ified sitemap to the non-www instance of GWMT. Good or bad idea?
I could do the same for the https instance as well.
Or is it a better idea to submit a sitemap of all the 301'd pages to the big G?
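If you try the sitemap-of-301s route, generating the files is the easy part. The Python sketch below builds sitemaps.org-format files from a list of old URLs, splits them at the protocol's 50,000-URL-per-file limit, and builds a sitemap index to tie them together. The URLs, the `lastmod` date (assumed here to be the day the redirects went live) and the helper names are hypothetical. Because sitemaps are normally meant to list canonical URLs, a file like this would presumably only be a temporary nudge to get the old URLs recrawled.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_sitemaps(urls, lastmod):
    """Split URLs into <urlset> documents of at most 50,000 entries each."""
    docs = []
    for start in range(0, len(urls), MAX_URLS_PER_SITEMAP):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + MAX_URLS_PER_SITEMAP]:
            entry = ET.SubElement(urlset, "url")
            ET.SubElement(entry, "loc").text = url  # ElementTree escapes & etc.
            ET.SubElement(entry, "lastmod").text = lastmod
        docs.append(XML_DECL + ET.tostring(urlset, encoding="unicode"))
    return docs


def build_index(sitemap_locations, lastmod):
    """Build a <sitemapindex> that points at each generated sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locations:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
        ET.SubElement(entry, "lastmod").text = lastmod
    return XML_DECL + ET.tostring(index, encoding="unicode")
```

With a few hundred thousand old URLs this produces a handful of files plus one index. Which GWMT property to submit them under is the same open question as above.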
Always grateful for any WebmasterWorld insights!