One of my clients has the following problem with all their sites.
Firstly, a brief background:
- The sites are about 5 years old.
- They are built on standard Microsoft ASP.
- The sites make use of session tracking in order to track user behavior and in order to track marketing campaigns for their affiliates and publishers.
I know Google's guidelines say to stay away from session tracking. Obviously it's a bit late for that now, but I’m trying to find some way to rectify the problem.
Here is a full practical example of my problem:
Site: www.siteurl.com (not real URL for client confidentiality)
If I do a search for indexed pages of the URL in Google, returned results look similar to these:
www.siteurl.com/default.asp?btag=campaign_id_300
www.siteurl.com/products.asp?btag=campaign_id_412
(Essentially, the URLs above are the ones that appear on their affiliates' sites. Each carries a unique tracking code so that the affiliate is paid for their sales.)
As you can see, the above URLs include the query string for those pages. If I now search for just the URLs with no query string, e.g. www.siteurl.com/default.asp, Google says it can’t find those pages. I suspect Google indexed the pages with the query string first, then indexed the original page without the query string, compared the two, found a duplicate and threw one of them away. In my case, the majority of the main URLs were the ones removed.
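To make the suspected duplicate problem concrete, here is a minimal Python sketch (not the site's ASP code) that strips the tracking parameter and groups URLs by what is left. It assumes `btag` is the only tracking-only parameter; the names `TRACKING_PARAMS`, `canonical_url` and `group_duplicates` are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: "btag" is the only parameter used purely for affiliate tracking.
TRACKING_PARAMS = {"btag"}


def canonical_url(url: str) -> str:
    """Return the URL with tracking-only query parameters removed."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k.lower() not in TRACKING_PARAMS]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def group_duplicates(urls):
    """Map each canonical URL to the list of variants that collapse onto it."""
    groups = {}
    for u in urls:
        groups.setdefault(canonical_url(u), []).append(u)
    return groups


indexed = [
    "http://www.siteurl.com/default.asp?btag=campaign_id_300",
    "http://www.siteurl.com/default.asp",
    "http://www.siteurl.com/products.asp?btag=campaign_id_412",
]
for clean, variants in group_duplicates(indexed).items():
    print(clean, "<-", len(variants), "variant(s)")
```

Every variant in one group serves the same page content, which is the situation in which a search engine may keep only one of them.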
I have asked a few people, including Google, how to stop them from indexing pages with session IDs. Google tells me I need to exclude these pages in the robots.txt file, but because it’s an ASP site it uses a Global.asp file that sets the requirements for the session tracking. This Global.asp file is a server-side file, not a directory, so I can’t exclude it from being indexed.
I’m thinking a solution could be the following:
In the Global.asp file, create an IF, THEN, ELSE statement along these lines: IF the visitor is Googlebot THEN don’t apply the session tracking, ELSE track the session as normal.
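The branching idea can be sketched roughly as follows. This is Python rather than ASP, and only an illustration of the decision logic: the user-agent substring check, the action names and the 301 redirect to the tag-free URL are all assumptions, not something the site does today. User-agent strings can also be spoofed, and whether search engines are happy with this kind of user-agent branching is a separate question.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: a crawler is recognised by a substring of its user-agent header.
CRAWLER_TOKENS = ("googlebot",)


def is_crawler(user_agent):
    """True if the user-agent string contains a known crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def strip_btag(url):
    """Return (url_without_btag, had_btag)."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k.lower() != "btag"]
    clean = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
    return clean, len(kept) != len(pairs)


def handle_request(user_agent, url):
    """Decide what to do with a request; returns (action, url)."""
    clean, had_btag = strip_btag(url)
    if is_crawler(user_agent):
        if had_btag:
            # Send the crawler to the tag-free URL so only one version is seen.
            return ("redirect_301", clean)
        # Crawler on a clean URL: serve the page, skip the tracking.
        return ("serve_untracked", url)
    # Ordinary visitor: record the campaign tag and serve as usual.
    return ("track_and_serve", url)
```

For example, a Googlebot request for products.asp?btag=campaign_id_412 would come back as a redirect to products.asp, while a normal browser would be tracked and served the page as before.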
I’m not sure if the above sounds a little confusing, but I am wondering if anybody has any suggestions.
Thanks All…
An alternative would be to use www.siteurl.com/products.asp#btag=campaign_id_412, because everything after the # is thrown away by search engines.
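A small Python sketch of the # idea, assuming the tag is called `btag`; `to_fragment_link` and `request_target` are made-up helper names. One thing to keep in mind: the part after the # is not sent to the server in the HTTP request, so the campaign code would have to be picked up by script running in the browser rather than by the ASP page itself.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def to_fragment_link(url, param="btag"):
    """Move the tracking parameter from the query string into the fragment."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    moved = [(k, v) for k, v in pairs if k == param]
    kept = [(k, v) for k, v in pairs if k != param]
    # Keep any existing fragment if there is no tracking parameter to move.
    fragment = urlencode(moved) if moved else parts.fragment
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), fragment)
    )


def request_target(url):
    """Path plus query: the part of the URL that is actually requested."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


link = to_fragment_link("http://www.siteurl.com/products.asp?btag=campaign_id_412")
print(link)                  # the affiliate link with the tag after the #
print(request_target(link))  # what the server is asked for
```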
I also always add <meta name="robots" content="noindex"> to all "print friendly" versions of pages in order to avoid duplicate content, and to avoid a searcher arriving directly at a page that immediately tries to print itself.
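As a rough illustration of the print-friendly rule, here is a Python sketch that adds the robots tag only when a page is rendered in print mode. The `head_tags` helper and its `print_friendly` flag are hypothetical; on the real site this would be a conditional in the page template.

```python
from html import escape

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def head_tags(title, print_friendly=False):
    """Build the <head> tags; print-friendly versions get the noindex tag."""
    tags = ["<title>" + escape(title) + "</title>"]
    if print_friendly:
        tags.append(NOINDEX_TAG)
    return "\n".join(tags)


print(head_tags("Products", print_friendly=True))
```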