Forum Moderators: Robert Charlton & goodroi


Finally realized it was Panda


ffctas

4:39 pm on Mar 23, 2015 (gmt 0)

10+ Year Member



We have finally realized that our slow drop in traffic over the past months can be attributed to poor canonical techniques and bad parameter handling. This led to thousands of near-duplicate and thin-content pages (Panda).

We have now implemented proper techniques for handling this content going forward, but are unsure how to handle all of the content already crawled and indexed.

Is there anything we can do to retroactively fix the old content? How will Google handle it?

Thank You

Itanium

5:28 pm on Mar 23, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



You can submit a sitemap with the no-index pages in it. But as far as I can tell, it takes quite some time for Google to drop no-index pages - despite them being crawled (multiple times!).
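For reference, that sitemap can be a bare-bones XML file listing just those URLs - the example.com addresses below are placeholders, and <lastmod> is only a hint to the crawler, not a guarantee of a fast recrawl:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- placeholder URLs: list the pages that now carry the noindex tag -->
      <url>
        <loc>http://www.example.com/thin-page-1</loc>
        <lastmod>2015-03-20</lastmod>
      </url>
      <url>
        <loc>http://www.example.com/thin-page-2</loc>
        <lastmod>2015-03-20</lastmod>
      </url>
    </urlset>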

I no-indexed nearly 4000 pages a few weeks ago and they are still showing up in the search results. I submitted all the links multiple times in the first few days.

So long story short: You need to be patient, no matter what you do.

not2easy

5:54 pm on Mar 23, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



In GWT you can submit URLs to be removed from the SERPs. Depending on how many there are, it may be worth the time. I haven't used this for some time, but I believe you can also specify a folder, so if all the URLs share a path like /category/ or /tags/, one entry handles a lot of pages. That will temporarily drop them out of the search results, but any traffic they might be generating will be gone.

At the end of the removal term (I forget whether it is 90 days?), only URLs that are not noindexed will return. In the meantime, Google will continue to crawl them and hopefully begin re-evaluating your position. It is a slow process whether you do this or just wait. A new sitemap can help, as Itanium mentioned.

Robert Charlton

6:51 pm on Mar 23, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You can submit a sitemap with the no-index pages in it.

Not a good idea. A sitemap should contain only currently valid URLs.

If you've used mod_rewrite to redirect improperly canonicalized content to canonical form, the incorrect URLs will ultimately be removed as Google works its way through its index. It may take a while. Make sure that incorrect URLs are also removed from your navigation.
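A minimal .htaccess sketch of the parameter-stripping case (the host name is a placeholder, and blindly dropping every query string is too aggressive for most sites - any parameters you actually need have to be excluded from the rule):

    # hypothetical example: 301 any URL carrying a query string to its clean form
    RewriteEngine On
    RewriteCond %{QUERY_STRING} .
    RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]

The trailing "?" in the substitution is what drops the query string from the redirect target.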

Itanium

8:40 pm on Mar 23, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



A no-index page is still valid, and Google has stated various times that those pages need to be crawlable and not blocked by robots.txt, so the bot can catch the index change.
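To spell that out, the tag sits in the page's <head>, and the same URL must stay crawlable - the /thin-section/ path here is just an example:

    <!-- on the page you want dropped from the index -->
    <meta name="robots" content="noindex, follow">

    # robots.txt - do NOT block these pages while Google processes the noindex,
    # or the bot never gets to read the tag:
    # Disallow: /thin-section/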

Sending them to Google via sitemaps will make this process a little faster, instead of waiting for Google to crawl some thin, unimportant and possibly weakly linked pages (which could take weeks or even months!).

Robert Charlton

9:06 pm on Mar 23, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I was writing this PS to my post above as Itanium posted... My thought was that we were referring to different types of pages and probably talking past each other... and it turns out that was the case.

I was referring to the changes described in the original post, canonicalizing URLs and fixing parameters... and these would have been handled by mod_rewrite. You definitely would not want to include the non-canonical URLs in the sitemap.

Itanium's first response, though, was talking about the use of the meta robots noindex tag (no hyphen), often added to pages to keep Google from displaying those pages in its index. The meta noindex is often used as a Panda fix to handle certain kinds of duplicate content, and it does, paradoxically, require that Google see and spider the pages (so Googlebot will read the meta robots tag and know the pages are to be noindexed). It does make sense that you'd want Google to see those noindex tags quickly. I was not talking about those particular pages.

The use of noindex on this particular site may be a whole other issue. I don't know whether noindex was used or not. I'm often not a fan of using noindex to control many kinds of Panda problems involving thin or shallow or repetitious content. That's a whole other discussion.

But yes, the use of the Sitemap for those pages might speed up Google seeing the noindex tag.

ACFinLA

10:42 pm on Mar 24, 2015 (gmt 0)

10+ Year Member



I recently worked on a site that had a similar problem. Incorrect parameter handling, due to a lack of canonical tags on many pages, led to over 300K duplicate pages being indexed. This was over 50% of the site's overall page count. We just added a canonical tag to each page to negate parameters (except on paginated pages). Within 10 days, our Google-indexed page count dropped by 50% and we got an immediate 30%+ increase in traffic. If you can, try to go back and fix your old content, not just pages moving forward.
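Roughly, it looked like this - example.com and the parameter names are placeholders for what the site actually used:

    <!-- on http://www.example.com/widgets?color=red&sort=price -->
    <link rel="canonical" href="http://www.example.com/widgets">

    <!-- on http://www.example.com/widgets?page=3 - pagination kept a self-referencing canonical -->
    <link rel="canonical" href="http://www.example.com/widgets?page=3">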

ffctas

11:12 pm on Mar 24, 2015 (gmt 0)

10+ Year Member



ACFinLA,
Can you be more specific about fixing old content?

ACFinLA

11:29 pm on Mar 24, 2015 (gmt 0)

10+ Year Member



ffctas: You should make sure that all of your old pages have the correct canonical tag implementation, not just pages moving forward. If you have a lot of old pages with parameters and have not added proper canonical tags to all of them, you should do that, either dynamically or manually.

The site I referred to has only a few key page templates, so it was easy for our developers to apply the canonical rules at the template level, and the change carried across all pages on each template. I believe Google must have noticed this widespread change on our site and crawled/reprocessed the entire site in a short period of time. We didn't do anything to speed up crawling, such as adding old pages to the XML sitemap.
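As a rough sketch of what "at the template level" can mean - PHP is used here only as an example, the host name is a placeholder, and this is not our exact code (pagination and any parameters you want to keep need their own handling):

    <?php
    // build the canonical from the request path, dropping any query string
    $canonical = 'http://www.example.com' . strtok($_SERVER['REQUEST_URI'], '?');
    ?>
    <link rel="canonical" href="<?php echo htmlspecialchars($canonical); ?>">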

Itanium

1:44 am on Mar 25, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



The fast recrawling might have something to do with the size of the site.

My current experience: I noindexed 4000 (mostly user-generated) pages. Exactly one month later, the deindexation (briefly) showed up in Google search results (via "site:") for the first time. However, it seems not all datacenters have caught up yet: the indexed page count is still jumping between 24k and 28k every now and then, though it sits at 28k most of the time. Additionally, Google has been crawling the site for 4 days straight at 300 percent of the normal rate.