Forum Moderators: Robert Charlton & goodroi
One of my websites, an online shop, has 3,000 real pages, but Google has indexed 12,000, many of them with a jsessionid in the URL. I believe this means I have about 9,000 duplicate pages. A few months ago Google had indexed 45,000 (!) of my pages. Only a dozen of my pages are indexed without the ;jsessionid, and I believe those are the pages with external links pointing to them. It looks like Google is getting better at handling duplicates.
I have checked several of my positions in Google for the last 14 months, for keywords where
-I have some traffic and
-I rank within the first three pages of Google.
I noticed no big changes in ranking before (Dec. 2004) and after (Aug. 2006) being indexed with jsessionids.
The only “bad thing” I notice is that my PageRank cannot be seen easily anymore. I know I still have PageRank on my pages, but often not on the original page (the page without the session ID). I often see the PageRank on the one with the session ID. (I checked it on [snip] with Google.com and the site:mydomain.com operator.)
The programmer of the website says it is really difficult and time-consuming ($$) to get rid of the session IDs, and we were also planning on using the session IDs for our own statistics program.
My questions:
1. Should I worry about my rankings in Google because I have session IDs? (I know it is not good for my bandwidth.)
2. Is there a way to show search engines what my session ID is called, for example in my robots.txt? Maybe a tip for Google or Yahoo?
[edited by: pageoneresults at 2:21 pm (utc) on Aug. 28, 2006]
[edit reason] Removed URI Reference - Please Refer to TOS [/edit]
Worst case for you would be that the site suffers dup content issues. But IMHO, this situation does not necessarily lead to penalties. Often, in situations where G identifies identical pages as dups, it simply selects one of them for ranking purposes. Unfortunately, it may not be the canonical page.
More importantly, even if no dup penalties are involved, the problem remains that you are splitting PageRank across all these dup pages, and that is its own sort of penalty, in the sense that the page will not rank as well when its PR is being divvied up.
So, as vincevincevince says, use session id's. And if that is not possible, for whatever reason, another way of handling it is to employ IP delivery to feed the bots canonical pages with proper URLs.
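As a rough illustration of what "canonical pages with proper URLs" could mean here, the sketch below strips a ;jsessionid=... path parameter from a URL. The function name strip_jsessionid and the example.com URL are made up for illustration. How a site decides which visitors get the clean URL (IP list, user agent, cookies) is a separate question that this does not cover.

```python
import re

# Matches a ";jsessionid=VALUE" path parameter, stopping at the next
# "?", "#", "/" or ";" so the rest of the URL is left untouched.
_JSESSIONID = re.compile(r";jsessionid=[^?#/;]*", re.IGNORECASE)


def strip_jsessionid(url: str) -> str:
    """Return the URL with any ;jsessionid=... path parameter removed."""
    return _JSESSIONID.sub("", url)


# Hypothetical shop URL, not taken from the original site.
print(strip_jsessionid(
    "http://www.example.com/shop/item.jsp;jsessionid=ABC123?cat=7"
))
# prints http://www.example.com/shop/item.jsp?cat=7
```

Every session-specific variant of a page collapses to the same string this way, which is the URL you would want the bots to see.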
The programmer used the session ID the way he did because he believes it follows the standards best. In the manual for Java Servlet Techniques, the ";" sign is used as THE example for writing a session ID in the URL. This is what he wrote and asked me (I could not give an answer):
"...
This makes the session-id a parameter of the URL's path component as specified in RFC 2396 (http://rfc.net/rfc2396.html) and seems to be in good accordance with the general purpose of the path component, which is specified in Chapter 3.3 as "identifying the [desired] resource within the scope of (...) [a certain] scheme and authority".
On the other hand Chapter 3.4 specifies the optional query component after the "?" as "a string of information to be interpreted by the resource". This component seems to be of less relevance to the document-based context search engines operate on, since document resources are normally not expected to be able to handle user input. So why is one expected by search engine recommendations to place session information there?
..."
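To make the distinction in the quote concrete: a generic URL parser treats the part after ";" (path parameters) and the part after "?" (query) as two separate components. The sketch below uses Python's standard urllib.parse.urlparse on a made-up example.com URL. It only shows how one parser splits the string, not how any search engine handles it.

```python
from urllib.parse import urlparse

# Hypothetical shop URL with both a path parameter and a query string.
url = "http://www.example.com/shop/item.jsp;jsessionid=ABC123?cat=7"
parts = urlparse(url)

print(parts.path)    # /shop/item.jsp
print(parts.params)  # jsessionid=ABC123
print(parts.query)   # cat=7
```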
Using 'sessionid' or similar as the name of the session variable after the ? further indicates its nature to search engines. Google has stated that 'any arguments containing id' might be assumed to be a session ID and removed. That's what you want: engines to ignore and remove the session ID.
RFC 1630 not only names ? as the only way to divide the page reference from the queryable part (no other option is provided), it also specifies that ; should be encoded as "%3B" and should not be used in a URL unencoded.
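A minimal sketch of the change being suggested, assuming the session ID currently appears as ;jsessionid=VALUE: move it into the query string under the name sessionid. The function name session_to_query and the example.com URL are hypothetical. The last line just shows what the percent-encoded form of ";" looks like.

```python
import re
from urllib.parse import quote


def session_to_query(url: str) -> str:
    """Rewrite ;jsessionid=VALUE into a sessionid=VALUE query argument."""
    match = re.search(r";jsessionid=([^?#/;]*)", url, re.IGNORECASE)
    if match is None:
        return url  # nothing to move
    session_id = match.group(1)
    # Cut the path parameter out of the URL.
    cleaned = url[:match.start()] + url[match.end():]
    # Keep any fragment at the very end of the URL.
    base, hash_mark, fragment = cleaned.partition("#")
    separator = "&" if "?" in base else "?"
    return f"{base}{separator}sessionid={session_id}{hash_mark}{fragment}"


# Hypothetical shop URL, not taken from the original site.
print(session_to_query(
    "http://www.example.com/shop/item.jsp;jsessionid=ABC123?cat=7"
))
# prints http://www.example.com/shop/item.jsp?cat=7&sessionid=ABC123

print(quote(";"))  # prints %3B
```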
Related discussion: [webmasterworld.com...]
If zencart can program that, so can your programmers.