Forum Moderators: Robert Charlton & goodroi


Fixed way too many non-canonicals - what next?

         

chewy

2:04 pm on Jul 17, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi,

So working on a project of about 100K pages.

Of course, when I started, Google showed numerous forms of these pages: with and without www, with and without https, with and without extra (testing-type) subdomains. All told, there were about half a million pages indexed, so that would be pretty much a 5-time loser.

What would you call that? A Quintuple Duncecap?

In all, it was a total canonical mess. Or, as it came to be known, a "dog's breakfast of hairballs within hairballs".

So, rel=canonical issued for www on all pages, all htaccess redirections in place (hooray!), new www flavored xml sitemap in place, all good signals submitted to GWMT, and leading-indicator signals are all slowly trending in the right direction.
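For anyone following along, here is a rough sketch of the mapping that the redirects and rel=canonical tags are meant to enforce: every variant URL should collapse to one www form. The host names below (example.com and the test subdomains) are placeholders, not the real site, and the function name is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the canonical form is http://www.<domain>; swap in the real values.
CANONICAL_SCHEME = "http"
CANONICAL_HOST = "www.example.com"

# Hypothetical variant hosts (bare domain, testing subdomains) that should
# all collapse to the canonical host.
VARIANT_HOSTS = {
    "example.com",
    "www.example.com",
    "test.example.com",
    "dev.example.com",
}


def canonical_url(url: str) -> str:
    """Map any known variant of a page URL to its single canonical form."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in VARIANT_HOSTS:
        return url  # not one of our hosts; leave it untouched
    path = parts.path or "/"
    # Keep path and query, force scheme and host, drop any fragment.
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, parts.query, ""))
```

A function like this is handy for spot checks: feed it a URL pulled from the index and compare the result with where the 301 actually lands.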

We recently went into the backend and did a search-and-replace of all non-www links to www (and made a few other sitewide changes).

Right now, GWMT is only reporting 30,000 canonical pages indexed (this is good as it was only reporting 20K a month ago).

The GWT spider is only spidering about that many pages per day. I've always wondered whether it was looking at the same pages or different pages, and I may go into the logfiles to answer that question. I'll let you know what I find after I come up for air.
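The logfile question (same pages or different pages each day?) could be answered with something like the sketch below. It assumes the server writes the common "combined" access log format and simply trusts the user-agent string, which can be spoofed; the function names are made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: Apache/nginx "combined" log format.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:\]]+):[^\]]*\] '
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_paths_by_day(lines):
    """Collect the set of paths requested by Googlebot on each day."""
    by_day = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        # User-agent match only; a reverse DNS check would be stricter.
        if m and "Googlebot" in m.group("agent"):
            by_day[m.group("day")].add(m.group("path"))
    return by_day


def new_vs_repeat(by_day):
    """Per day, count paths never crawled before vs. paths seen on an earlier day."""
    seen = set()
    report = []
    # Log dates look like 17/Jul/2015, so sort them as real dates.
    for day in sorted(by_day, key=lambda d: datetime.strptime(d, "%d/%b/%Y")):
        paths = by_day[day]
        fresh = paths - seen
        report.append((day, len(fresh), len(paths) - len(fresh)))
        seen |= paths
    return report
```

Run it over a few weeks of logs: if the "new" count stays high, the spider is working through different pages; if it drops toward zero, it keeps revisiting the same set.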

But the https and non-www nonsense is still predominantly in the index and still driving traffic.

There are hundreds of thousands of them clearly remaining in the Google index, and there is no indication that this number is falling.

What else can I do to give Google the clear message to send out the robots and de-indexing tentacles to these hundreds of thousands now 301'd pages?

Site has strong backlinks but traffic has been trending down (by about a third) year to year over the last few years, despite lots of new and good content.

With luck, the canonical fixes will turn this mess around.

(If you need a soundtrack, you can add this one: [youtube.com...])

Come on Google - why don't you dance with me?

Are there any additional best practices for de-indexing all those non-canonical pages?

I'm tempted to submit the www-ified sitemap to the non-www instance of GWMT. Good or bad idea?

I could do the same for the https instance as well.

Or is it a better idea to submit a sitemap of all the 301'd pages to the big G?

Always grateful for any WebmasterWorld insights!

aakk9999

8:36 pm on Jul 19, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How long ago did you make these fixes?

I am asking because with around half a million pages to sort out, it could take quite some time for Google to recrawl them all.
I also noticed that sometimes Google needs to crawl the page several times before it takes the hint and updates its index.

I think that jumping from 20K to 30K of canonical indexed pages is a good sign.

I'm tempted to submit the www-ified sitemap to the non-www instance of GWMT. Good or bad idea?

If you mean submitting the http://www versions of the URLs in a sitemap to the non-www instance and the https instance of the site in GWMT, then yes, you could do this. I think it would do no harm and it may help reinforce the message.

tangor

10:08 pm on Jul 19, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just follow best practice for the size (number of URLs) of each sitemap submitted. That would be up to 50,000 URLs per sitemap.
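As a rough illustration of that limit, a list of URLs can be split into files of at most 50,000 entries each. This is only a sketch: the helper names are made up, and it emits bare `<loc>` entries with no lastmod or sitemap index file.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit mentioned above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of the URL list, each at most `size` long."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def render_sitemap(urls):
    """Render one sitemap file as a string, XML-escaping each URL."""
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )
```

For a site of about 100K pages this works out to two files, each passed through `render_sitemap` and submitted separately.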