Forum Moderators: Robert Charlton & goodroi

Why is it so bad to have session IDs in my URLs / duplicate pages?

To me it looks like Google can handle it …

         

Mouwer

8:04 am on Aug 24, 2006 (gmt 0)

10+ Year Member



In December 2004 my (German) site was indexed without session IDs. My website is built with JSP on a Tomcat server. Around 2005 Google began to index my jsessionids, for example
www.domain.de/page.htm;jsessionid=4675473CABBCBD5A64C3DE16AE220DDF
(I am mostly concerned about Google because they have a 90% market share in Germany)

One of my websites, an online shop, has 3,000 real pages, and Google has indexed 12,000 pages, many of them with a jsessionid. I believe this means I have about 9,000 duplicate pages. A few months ago Google had indexed 45,000 (!) of my pages. Only a dozen of my pages are indexed without the ;jsessionid; I believe those are the pages with external links pointing to them. It looks like Google is getting better at handling duplicates.

I have checked several of my positions in Google over the last 14 months, for keywords where
- I have some traffic and
- I rank within the first three pages of Google.
I noticed no big changes in ranking between the time before (Dec. 2004) and after (Aug. 2006) being indexed with jsessionids.

The only “bad thing” I notice is that my PageRank can no longer be seen easily. I know I still have PageRank on my pages, but often not on the original page (the page without the jsessionid). I often see the PageRank on the one with the session ID. (I checked it on [snip] with Google.com and the site:mydomain.com operator.)

The programmer of the website says it is really difficult and time-consuming ($$) to get rid of the session IDs, and we were also planning to use the session IDs for our own statistics program.

My questions:

1. Should I worry about my rankings in Google because I have session IDs? (I know it is not good for my bandwidth.)

2. Is there a way to tell search engines what my session ID parameter is called, for example in my robots.txt? Maybe a hint for Google or Yahoo?

[edited by: pageoneresults at 2:21 pm (utc) on Aug. 28, 2006]
[edit reason] Removed URI Reference - Please Refer to TOS [/edit]

vincevincevince

8:13 am on Aug 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You should not be using ; in the URL - you should be using ?sessionid=..........

Mouwer

8:30 am on Aug 24, 2006 (gmt 0)

10+ Year Member



I asked the programmer a while ago if he could change the ";jsessionid=" to "&id=" because Google mentions in its guidelines that it treats this as a session ID. I believe the programmer said this was not possible because of the Tomcat server.

caveman

2:11 pm on Aug 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Mouwer, you have a problem and it needs to get fixed. If we built all of our sites according to what programmers said rather than according to what we know about SEO and the search engines, we'd have a lot of poorly ranking sites. :o

Worst case for you would be that the site suffers dup content issues. But IMHO, this situation does not necessarily lead to penalties. Often in situations where G identifies identical pages as dups, they simply select one of them for ranking purposes. Unfortunately it may not be the canonical page.

More importantly, even if no dup penalties are involved, the problem remains that you are splitting PageRank across all these dup pages, and that is its own sort of penalty, in the sense that the page will not rank as well when its PR is being divvied up.

So, as vincevincevince says, use ?sessionid= style session IDs. And if that is not possible, for whatever reason, another way of handling it is to employ IP delivery to feed the bots canonical pages with proper URLs.
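
Whichever route is taken, the canonical form of a page URL is the one with the session ID removed. As a minimal sketch of that normalization (the function name canonical_url is hypothetical, and it assumes the ID always appears as a ";jsessionid=..." path parameter like the one in the first post):

```python
import re

# Matches a ";jsessionid=<value>" path parameter, stopping at the next
# "?", "#", ";" or "/" (or at the end of the string).
_JSESSIONID = re.compile(r";jsessionid=[^?#;/]*", re.IGNORECASE)


def canonical_url(url: str) -> str:
    """Return the URL with any ;jsessionid=... path parameter removed."""
    return _JSESSIONID.sub("", url)


print(canonical_url(
    "www.domain.de/page.htm;jsessionid=4675473CABBCBD5A64C3DE16AE220DDF"
))  # prints: www.domain.de/page.htm
```

The same function could be used to build the links served to bots, or to check which indexed URLs collapse to the same page.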

vincevincevince

1:28 am on Aug 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Tomcat doesn't need ; in the URL. And even if your specific implementation does need it, you should be able to rewrite the URLs to change ? to ; on the fly to satisfy Tomcat.
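
To illustrate the kind of on-the-fly rewrite meant here, the following is only a sketch of the string transformation, not of any Tomcat or web-server configuration. The public parameter name "sessionid" and the function name to_tomcat_style are assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_tomcat_style(url: str, public_name: str = "sessionid") -> str:
    """Move ?sessionid=X from the query string into a ;jsessionid=X path parameter."""
    parts = urlsplit(url)
    session = None
    kept = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == public_name and session is None:
            session = value            # first session parameter wins
        else:
            kept.append((key, value))  # every other parameter stays in the query
    if session is None:
        return url                     # nothing to rewrite
    path = f"{parts.path};jsessionid={session}"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(kept), parts.fragment)
    )


print(to_tomcat_style("http://www.domain.de/page.htm?sessionid=ABC123&x=1"))
# prints: http://www.domain.de/page.htm;jsessionid=ABC123?x=1
```

Visitors and search engines would only ever see the ?sessionid= form; the ; form would exist only between the rewriting layer and the application.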

Mouwer

1:29 pm on Aug 28, 2006 (gmt 0)

10+ Year Member



Thanks for the answers. We decided to strip the session ID and only use it once people start using the shopping cart.

The programmer used the session ID the way he did because he believes it best reflects the standards. In the manual for Java Servlet Techniques, the ";" sign is used as THE example for writing a session ID in the URL. This is what he wrote and asked me as well (I could not give an answer):

"...
This makes the session-id a parameter of the URL's path component as specified in RFC 2396 (http://rfc.net/rfc2396.html) and seems to be in good accordance with the general purpose of the path component, which is specified in Chapter 3.3 as "identifying the [desired] resource within the scope of (...) [a certain] scheme and authority".

On the other hand Chapter 3.4 specifies the optional query component after the "?" as "a string of information to be interpreted by the resource". This component seems to be of less relevance to the document-based context search engines operate on, since document resources are normally not expected to be able to handle user input. So why is one expected by search engine recommendations to place session information there?
..."
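
The distinction the programmer draws between the path parameter and the query component can be seen with a generic URL parser. The sketch below uses Python's urllib.parse.urlparse, which reports parameters of the last path segment separately from the query; the "item=42" query and the function name split_components are made up for the illustration.

```python
from urllib.parse import urlparse


def split_components(url: str):
    """Return (path, params, query) as reported by urlparse."""
    parts = urlparse(url)
    return parts.path, parts.params, parts.query


path, params, query = split_components(
    "http://www.domain.de/page.htm"
    ";jsessionid=4675473CABBCBD5A64C3DE16AE220DDF"
    "?item=42"  # hypothetical query string, not from the original site
)
print(path)    # /page.htm
print(params)  # jsessionid=4675473CABBCBD5A64C3DE16AE220DDF
print(query)   # item=42
```

A parser that follows the RFC sees the session ID as attached to the path, which is consistent with a crawler treating each ;jsessionid= variant as a separate document.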

vincevincevince

2:15 pm on Aug 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It depends, to some extent, on what the session ID is being used for. If the session ID is used to determine what is shown on the page then it would be fine to use ; to separate it. If the session ID is used only, or mainly, to identify the user then it should use ? because the session ID itself is not a part of the page but just additional information to be passed to the server.

Using 'sessionid' or similar as the name of the session variable after ? further indicates its nature to search engines. Google has stated 'any arguments containing id' might be assumed to be a session ID and removed. That's what you want - engines to ignore and remove the session ID.

RFC 1630 not only gives ? as the sole way to divide the page reference from the queryable part (no other option is provided), but it also specifies that ; should be encoded as "%3B" and should not be used in a URL unencoded.

g1smd

7:12 pm on Aug 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Session IDs cause another form of duplicate content. Avoid using them for users who are not logged in, including bots, which can never log in.

Related discussion: [webmasterworld.com...]

trinorthlighting

7:16 pm on Aug 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Those long strings may also make the URL go supplemental. There are ways to stop spiders from starting sessions. We use a template from Zen Cart, and one of its features is preventing known spiders from starting sessions. It works very well for Google, MSN and Yahoo.

If zencart can program that, so can your programmers.
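
As a rough sketch of that idea (this is not Zen Cart's code, and the token list is a short illustrative assumption rather than a complete spider list), the check amounts to looking at the User-Agent header before creating a session:

```python
from typing import Optional

# Illustrative substrings of crawler User-Agent headers (assumed, not exhaustive).
KNOWN_SPIDER_TOKENS = ("googlebot", "slurp", "msnbot")


def should_start_session(user_agent: Optional[str]) -> bool:
    """Return False for known spiders so that no session ID is issued to them."""
    if not user_agent:
        # Design choice for this sketch: a missing header is treated as a normal visitor.
        return True
    ua = user_agent.lower()
    return not any(token in ua for token in KNOWN_SPIDER_TOKENS)


print(should_start_session(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
))  # prints: False
```

When the function returns False, the page would be rendered without creating a session, so no session ID ends up in the links the spider follows.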