
Indexing Session IDs

Fortune

3:23 pm on Aug 4, 2005 (gmt 0)

10+ Year Member



Hi all,

One of my clients has the following problem with all of their sites.

Firstly, a brief background:

- The sites are +/- 5 years old.
- They are built on standard Microsoft ASP.
- The sites make use of session tracking to track user behavior and to track marketing campaigns for their affiliates and publishers.

I know Google's guidelines say to stay away from session tracking. Obviously it's a bit late now, but I'm trying to find some way to rectify this problem.

Here is a full practical example of my problem:

Site: www.siteurl.com (not real URL for client confidentiality)

If I do a search for indexed pages of the URL in Google, the returned results look similar to these:

www.siteurl.com/default.asp?btag=campaign_id_300
www.siteurl.com/products.asp?btag=campaign_id_412

(Essentially, the URLs above would appear on their affiliates' sites; each carries a unique tracking code so the affiliates are paid for their sales.)

As you can see, the URLs above include the query string for those pages. If I now search for just the URLs with no query strings, i.e. www.siteurl.com/default.asp, Google says it can't find those pages. I suspect Google indexed the pages with the query string first, then indexed the original pages without the query string, compared the two, found duplicates, and threw one of each pair away. In my case, the majority of the main URLs were removed.

Now,

I have asked a few people, including Google, how I can stop them from indexing pages with session IDs. Google tells me I need to exclude these pages in the robots.txt file, but because it's an ASP site, it makes use of a Global.asa file that sets up the session tracking. This Global.asa file is not a directory, so I can't exclude it from being indexed (it's a server-side file).
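I assume Google means a pattern-based exclusion something like the following (the btag parameter comes from my example URLs above, and as far as I know the * wildcard is a Googlebot-only extension, not part of the robots.txt standard), but that only hides the tagged URLs from Google; it doesn't stop the session tracking itself:

User-agent: Googlebot
Disallow: /*?btag=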

I'm thinking a solution could be the following: in the Global.asa file (or a shared include), create an If/Then/Else statement along the lines of: if the visitor is Googlebot, then don't attach the session ID; else, serve the pages as normal. Something like the sketch below:
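(Rough sketch only: the btag name and Session("btag") are stand-ins for however the real code carries the tracking value, and the user-agent check is deliberately simplistic.)

<%
Dim isBot
isBot = InStr(LCase(Request.ServerVariables("HTTP_USER_AGENT")), "googlebot") > 0

If isBot Then
    ' Crawler: emit the clean URL with no query string
    Response.Write "<a href=""/default.asp"">Home</a>"
Else
    ' Normal visitor: keep the affiliate tracking parameter
    Response.Write "<a href=""/default.asp?btag=" & Server.URLEncode(Session("btag")) & """>Home</a>"
End If
%>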

I'm not sure if the above sounds a little confusing to you all, but I'm wondering if anybody has any suggestions.

Thanks All…

ronburk

4:59 pm on Aug 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google tells me I need to exclude these pages in the robots.txt file

Heh -- I bet they didn't tell Amazon to exclude their entire website (all pages of which use session IDs)!

When session IDs are implemented via URL rewriting (as they must be to work across all browsers and configurations), I know of no solution other than to detect SE robots and dynamically turn off session IDs for those clients. This is the canonical case in which "cloaking" is both approved of by Google and quite necessary.

There is, unfortunately, no standard for clients to indicate that they are robots that need session IDs turned off in order to correctly index URLs. So, you have to hard-code your detection of SE robots, and when new robots come along they are out of luck unless/until you notice and add them to your list.
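Something along these lines in classic ASP/VBScript would do it (the user-agent fragments below are illustrative, not an exhaustive list; the whole point is that you maintain it yourself):

<%
' Hard-coded crawler detection. Extend the array as new robots appear.
Function IsSearchBot()
    Dim ua, bots, i
    ua = LCase(Request.ServerVariables("HTTP_USER_AGENT"))
    bots = Array("googlebot", "slurp", "msnbot", "teoma")
    IsSearchBot = False
    For i = 0 To UBound(bots)
        If InStr(ua, bots(i)) > 0 Then
            IsSearchBot = True
            Exit For
        End If
    Next
End Function
%>

Your pages (or Global.asa) then call IsSearchBot() and skip appending the session ID to URLs whenever it returns True.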

Fortune

2:41 pm on Aug 8, 2005 (gmt 0)

10+ Year Member



Ronburk

Are you saying that my suggestion about creating an If/Then/Else statement in my Global.asa file is the correct route to take?

Thanks

ronburk

5:32 pm on Aug 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes. If it's a known search engine crawler, disable your URL-based session IDs.