
Forum Moderators: Robert Charlton & goodroi


SEO Impact of Duplicate Content From Tracking IDs in URLS for LARGE SITE

     
10:34 pm on Jun 18, 2016 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 14, 2008
posts: 80
votes: 4


We have a site that currently lists a bunch of unique widgets: www.example.com/directory/001
*Note: we are not e-commerce, but more of an encyclopedia-type site with millions of unique pages. Content quality varies from excellent to thin.

Each widget page has a unique URL, which we reference in our sitemaps and in the page's rel=canonical. Example:
www.example.com/directory/001
www.example.com/directory/002
...etc.

On various pages we add tracking code (e.g. "?tid=trackingID") so that we can track user behavior. For example, we might link to www.example.com/directory/001 on various pages throughout the website as:
www.example.com/directory/001?tid=trackingA
www.example.com/directory/001?tid=trackingB
etc.

We have verified that our rel=canonical is set up correctly on the www.example.com/directory/001 page, and that our sitemaps refer to the appropriate URL (e.g. www.example.com/directory/001).
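
As a rough illustration of the mapping described above (tracked link to canonical URL), here is a minimal Python sketch. It is not our production code: the function name canonical_url, the https scheme, and the assumption that "tid" is the only tracking parameter are all hypothetical and used only for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "tid" is the only tracking parameter in use on the site.
TRACKING_PARAMS = {"tid"}


def canonical_url(url: str) -> str:
    """Return the URL with tracking parameters removed from its query string."""
    parts = urlsplit(url)
    # Keep every query parameter except the tracking ones.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    # Rebuild the URL; the fragment is dropped because it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    for tracked in (
        "https://www.example.com/directory/001?tid=trackingA",
        "https://www.example.com/directory/001?tid=trackingB",
    ):
        # Both variants should map to the same clean URL used in the sitemap.
        print(tracked, "->", canonical_url(tracked))
```

Both tracked variants resolve to the same clean URL, which is the value we expect to find in the rel=canonical and in the sitemap entry for that page.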

Recently we noticed that Google has indexed upwards of three variations of the same URL, ignoring the rel=canonical and our sitemap. For example, via a site: operator query we've identified that the following are all indexed in Google:
www.example.com/directory/001
www.example.com/directory/001?tid=trackingA
www.example.com/directory/001?tid=trackingB

Now, I've read that this isn't a big problem with smaller sites. But for large scale sites (millions of pages), I have reason to believe this is creating problems:

- We've seen a significant, steady drop in the total number of indexed pages reported in Search Console (about 30% so far, and it is continuing). This might be because our sitemaps list only the www.example.com/directory/001 URLs, so Search Console would show a large de-indexing of URLs since the www.example.com/directory/001?tid=trackingA URLs don't exist in our sitemaps. However, we've also seen a steady decrease in traffic that correlates directly with the decrease in indexed pages reported in Google WMT/Search Console.

- My assumption is that Google has a limited number of pages it will index for a large website with millions of pages, and that having multiple variants of the same page just wastes slots that could go to other unique content.

I've done a sizable amount of research and have found a number of articles (a quick Google search on 'duplicate content tracking urls' with other keywords turns up plenty) with somewhat conflicting information. Many just say that we are cannibalizing our SERPs by having multiple URL variants indexed, but as far as we can tell, our SERPs remain consistent for those pages that do get indexed.

My questions:
- Has anyone run into a similar problem where Google ignores the rel=canonical, and how have you dealt with it?

- Can you provide any insight into whether my assumption holds weight? Mainly, that these variants of the same URL are just taking up space in our 'index appropriation per website', which may or may not exist in Google's index?

- Would Google finding these new URLs with the ?tid tracking make it de-index the existing 'clean/unique/canonical' URL in favor of the new ones? If so, Google is not indexing as many URLs as it is removing.

Really appreciate any insight or further discussion and clarification on this issue.
12:46 am on June 20, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2630
votes: 191


For example, via a site operator query we've identified that the following are all indexed in Google:
www.example.com/directory/001
www.example.com/directory/001?tid=trackingA
www.example.com/directory/001?tid=trackingB


For some time now I have noticed the same thing you describe: when using the site: operator, Google shows various URL variants and not just the canonical version, regardless of whether rel=canonical exists on the page.

In fact, even if a page redirects with a 301 permanent redirect, Google will show it in its index when you use the site: operator.

I have also noticed that without using the site: operator, Google will in most cases show your preferred version, unless you are specifically searching for a URL that redirects or has a canonical pointing to a different URL. In that case, what I see is the URL you searched for, with the <title> and meta description of the canonical (or redirect-to) URL, and when selecting "cached" from the SERPs, it says "This is Google's cache of www.example.com..." where example.com is the canonical (or redirect-to) URL.

This was not happening in the past, although I cannot pinpoint when it started. I would imagine that Google always kept these URLs in some kind of index but did not show them in SERPs before. We all know that Google never forgets a URL and retries it periodically despite a redirect / canonical / 404 / 410.

Would Google finding these new urls with the ?tid tracking make Google de-index the existing 'clean/unique/canonical' url for these new ones?

Not in my experience, although I am not sure whether this would happen if the canonical URL has no links (external or internal) and all internal and external links point to URLs with tracking parameters. In that case, perhaps it might.

Have you ever seen your URLs with tracking parameters in SERPs when searching for some unique content on the page or do they only show in SERPs when you either use site: operator or when you are searching for an exact URL with tracking parameters? You can also check this in Analytics, looking at Landing Pages from SERPs - if only canonical URLs are shown in SERPs for ordinary searches, then there should be no URLs with tracking displayed when you filter on Google Organic --> Landing pages.
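
The landing-page check above can be sketched in a few lines of Python. This is only an illustration under assumptions: it presumes you have exported the Google organic landing pages from Analytics as a plain list of URL paths, and the function name tracked_landing_pages and the sample data are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def tracked_landing_pages(landing_pages, param="tid"):
    """Count, per clean path, how many landing pages carry the tracking parameter."""
    counts = Counter()
    for url in landing_pages:
        parts = urlsplit(url)
        # parse_qs returns a dict of parameter name -> list of values.
        if param in parse_qs(parts.query):
            counts[parts.path] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical export of organic landing pages.
    sample = [
        "/directory/001",
        "/directory/001?tid=trackingA",
        "/directory/001?tid=trackingB",
        "/directory/002",
    ]
    for path, hits in tracked_landing_pages(sample).items():
        print(path, hits)
```

An empty result would suggest that only canonical URLs are being served for ordinary searches; any entries indicate that tracked variants are actually appearing in SERPs.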
5:19 pm on June 20, 2016 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 14, 2008
posts: 80
votes: 4


@aakk9999 Thanks for the reply! It's good to see that others have noticed Google ignoring the canonical and URLs with tracking IDs being included in Google's index.

Yes, we have seen our URLs with tracking parameters showing in SERPs (NOT using the site: operator). I've verified this by finding URLs with tracking parameters among our Google organic landing pages.

I'm thinking our next step is to remove the tid tracking for the time being to see if we notice any uplift in indexed pages via WMT and whether this is reflected in organic traffic. I'll make sure to update this thread with our findings, although for a site this size it's like moving the Titanic, and it takes a few months to see results in organic. I should know relatively soon if the index count is updated in WMT.

Anyone care to add their comments on this subject? I'm very happy to hear about anyone's experience or suggestions.
6:25 pm on June 20, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:2407
votes: 640


Here is a video with John Mueller and other Googlers on the specific topic of duplicate content.
[youtube.com...]

I'm thinking our next step is to remove the tid tracking for the time being to see if we notice any uplift in INDEXed pages via WMT

Why do you expect the number of indexed pages to go up? If the dupes are indexed and you remove them (the tracking code, that is), then the number of indexed pages should drop.

I have a large site, in the millions of pages, and only about 20% of it is indexed by Google. My site is static, with no tracking code, and every page is unique (the data on the pages is unique). I have been watching my indexed pages decline slowly over time as well. However, my traffic has been increasing steadily over the same period. I attribute the decline in indexed pages to the purging of old pages that no longer exist. I don't think that there is any relationship between the expansion or contraction of indexed pages and your ranking.
6:34 pm on June 20, 2016 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 14, 2008
posts: 80
votes: 4


My hypothesis is that, because these tracking URLs are not included in our sitemaps and WMT/Search Console > Sitemaps only reflects the number of indexed URLs that were submitted in sitemaps, the tid tracking URLs will eventually be replaced with the correct canonical URLs that are in the sitemaps. That's a long sentence. I hope the rationale makes sense :)

Thank you very much for your feedback with your own site. Any insight is hugely appreciated.
8:38 pm on June 20, 2016 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 14, 2008
posts: 80
votes: 4


FYI - we did try to use Parameter tools. Perhaps we did it incorrectly?


Parameter | URLs monitored | Configured   | Effect | Crawl
tid       | ####           | May 12, 2016 | None   | Representative URL
4:54 pm on June 21, 2016 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 14, 2008
posts:80
votes: 4


Anyone else care to chime in? I know this is a rather obscure issue :)