I have found a very disturbing problem with my site and I hope my experience will help others suffering from this problem.
It all started a year ago when we decided to add tracking to our AdWords ads. We used a simple tracking variable such as [mysite.com...]
Soon after we did that, Google started crawling these URLs and cached a version of each page with the tracking variable. We were afraid this would trip the duplicate content filter, so we posted a question on Google Groups. The answer we got was that it would not cause any problems and that Google is smart enough not to count these pages as different from the originals.
Guess what? All the original pages dropped out of the SERPs. We did not think it had anything to do with the tracking URLs, so we looked elsewhere for the cause.
One week ago I found out that the URLs with the tracking variable have PageRank and are crawled more often than the original pages. In fact, one page that was changed recently shows PageRank for both versions, and the cache is newer for the version with the tracking variable.
I immediately 301-redirected all the tracking URLs to the originals and am now waiting to see what the effect will be. I hope it solves the problem, and I will let you know what happens.
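For anyone wanting to do the same, here is a minimal sketch in Python of the redirect decision: strip the tracking variable and, if anything was stripped, 301 to what is left. The parameter name `track` and the example.com URLs are assumptions for illustration only; the post above does not say which variable name or server-side setup was actually used.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical tracking parameter name; replace with whatever your ads append.
TRACKING_PARAMS = {"track"}

def redirect_target(url, tracking_params=TRACKING_PARAMS):
    """Return the clean URL to 301 to, or None if no tracking parameter is present."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Keep every query parameter except the tracking ones.
    kept = [(k, v) for k, v in pairs if k not in tracking_params]
    if len(kept) == len(pairs):
        return None  # nothing to strip, so serve the page normally
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )

if __name__ == "__main__":
    print(redirect_target("http://example.com/page.htm?track=adwords"))
    print(redirect_target("http://example.com/page.htm"))
```

A request handler would send a 301 with the returned URL in the `Location` header whenever the function returns something other than `None`. Parameters that are not in the tracking set are left alone, so pages that really do depend on a query string keep working.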
You have two versions of each page: one is a clean .htm URL, the other carries the tracking code, and both versions return the same content.
I wonder how on earth someone on Google Groups could tell you that it's OK, since both versions are accessible to Googlebot. You are lucky to have only lost the original pages from the index and not triggered a duplicate content penalty.
And yes it's highly likely that 301ing the pages would solve the problem.
But basically you got it right.
Only there is no second version of the page; it is just a tracking variable I used to monitor my AdWords campaigns.
The link with the tracking variable was used only in my AdWords ads, so we assumed that Google would not follow these links, and once this theory was confirmed on Google Groups we treated it as fact.
On an Apache server you can put whatever you want after the "?" and a static page is served exactly the same, so I never thought of it as part of the page's address. I was amazed to discover that Google crawls AdWords links and treats the tracking URL as a page of its own.
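To illustrate the point about the "?": the query string is technically part of the URL, so a crawler that keys its index on the full URL sees two addresses even though the server hands back the same file for both. A small Python sketch of that check follows; the function name, the `track` parameter and the example.com URLs are made up for illustration.

```python
from urllib.parse import urlsplit

def same_file_different_url(a, b):
    """True if two URLs point at the same scheme, host and path but differ in query string."""
    pa, pb = urlsplit(a), urlsplit(b)
    same_file = (pa.scheme, pa.netloc, pa.path) == (pb.scheme, pb.netloc, pb.path)
    return same_file and pa.query != pb.query

if __name__ == "__main__":
    original = "http://example.com/page.htm"
    tracked = "http://example.com/page.htm?track=adwords"  # hypothetical tracking URL
    print(same_file_different_url(original, tracked))
```

Any pair of URLs for which this returns true is a candidate for the duplicate problem described in this thread, assuming the server ignores the query string when serving the page.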