|How does Google handle session IDs?|
How Google handles or recognizes session IDs
| 6:08 am on Aug 10, 2003 (gmt 0)|
I've been reading both here and also on Google's own webmaster pages, and I find myself uncertain about how Google handles session IDs.
I understand that if a page requires a session ID to be viewed, then Google will not index it. But, how does Google do this? Does it try using the session ID to view the page, or does it omit the session ID and try to view the page?
How would Google handle the following types of URLs:
http://www.domain.dom/article.html?id=12345
In the above example, the ID represents what content to serve and not a session ID.
In the above example, the ID represents a session ID and the content represents what content to serve. If the ID is omitted, the same content will be served.
And, if a session ID does not have "id" anywhere in its parameter, how does Google determine whether or not it is a session ID?
I am actually asking this on behalf of a number of webmasters. We all use the same web server ecommerce package. It utilizes path arguments and a session ID to display pages dynamically. The session IDs are used to keep track of shopping basket contents and, unless the webmaster has changed the default action, the same content will be displayed even without the session ID. Without the session ID path argument, the only thing lost would be shopping basket contents, which, of course, a robot or spider wouldn't need.
The session ID might also be used with static pages, so the only path argument in those cases would be the session ID. We're just wondering how the session ID affects the indexing of our dynamic and static sites--all of which would use a session ID path argument.
| 4:40 pm on Aug 11, 2003 (gmt 0)|
|I understand that if a page requires a session ID to be viewed, then Google will not index it. |
Outdated information: Google gobbles mine up.
|The session ID might also be used with static pages, so the only path argument in those cases would be the session ID. |
Not a problem, crawls it like a champ from a static page.
|And, if a session ID does not have "id" anywhere in its parameter, how does Google determine whether or not it is a session ID? |
Actually, I've heard not to use "id=" but to instead use "anything=". I've used both without issue. Google looks for "?", etc.
Keep your strings as short and as simple as possible. Googlebot's just learning the ropes. :)
| 5:52 pm on Aug 11, 2003 (gmt 0)|
|Outdated information: Google gobbles mine up. |
Oh really...when did this change? I know a number of people whose sites pass sessions via the URL and Googlebot will only index their homepage and those without sessions. Some of the other spiders will index pages with sessions without hesitation but very frequently they end up getting lost.
Re: ccDan: It is not wise to pass sessions in the URL to any bot. If you can, modify the script so it doesn't serve those to non-humans. Some bots will index your pages with the session ID, but oftentimes they get lost calling up the same pages over and over when the ID changes, because they mistake each changed URL for a new page. They can eat up a ton of your bandwidth when that happens.
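A minimal Python sketch of that idea, assuming the script can read the request's User-Agent header. The function names, the `sid` parameter name, and the list of crawler markers are all hypothetical and not taken from any particular ecommerce package; matching on the User-Agent is only a heuristic and will miss bots that don't identify themselves.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qsl

# Hypothetical, incomplete list of substrings seen in crawler User-Agent headers.
BOT_MARKERS = ("googlebot", "slurp", "msnbot", "crawler", "spider")


def is_probable_bot(user_agent: str) -> bool:
    """Heuristic: treat the client as a crawler if its User-Agent contains a marker."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)


def build_link(url: str, session_id: str, user_agent: str, param: str = "sid") -> str:
    """Append the session ID to a link for humans; leave the link clean for crawlers."""
    if is_probable_bot(user_agent):
        return url
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = parse_qsl(query, keep_blank_values=True)
    params.append((param, session_id))  # 'sid' is an assumed parameter name
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))
```

With this in place a crawler only ever sees one URL per page, while a shopper's basket still follows them around via the appended parameter.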
It's pretty easy to spot a session ID because it's typically 32 alphanumeric characters long and will look similar to this: 954d8226d96ea410c6e428395e66388d . Some bots will look for certain words like ID to give them a clue too.
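To make that heuristic concrete, here is a rough Python sketch of how a crawler might flag session-like parameters. It is only a guess at the kind of check involved, not a description of what Googlebot actually does. It assumes the 32-character value is hexadecimal (as in the example above), and the list of suspicious parameter names is made up for illustration.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Assumption: a session ID value is a 32-character hexadecimal token.
SESSION_VALUE = re.compile(r"[0-9a-fA-F]{32}")
# Assumption: parameter names that hint at a session; not an exhaustive list.
SESSION_NAMES = ("sid", "sessid", "sessionid", "phpsessid")


def session_params(url: str) -> list:
    """Return the names of query parameters that look like session IDs."""
    hits = []
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if SESSION_VALUE.fullmatch(value) or name.lower() in SESSION_NAMES:
            hits.append(name)
    return hits
```

Note that a plain `id=12345` is not flagged here: the name "id" alone is too common to treat as a session marker, so the check relies on the shape of the value or a more specific name.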
| 6:23 pm on Aug 11, 2003 (gmt 0)|
I work on a portal that had to abolish its session IDs because Google indexed the same page over and over again. We didn't want to risk being penalised for duplicate content so we changed to cookies just in case. I wrote to Google about it but never got any confirmation whether session IDs were bad or not.
When it comes to your first example (http://www.domain.dom/article.html?id=12345), we had exactly the same setup for our articles. Not a single one got indexed but when we changed "id" to "article" every page was picked up.
HTH a little :)
| 7:30 pm on Aug 11, 2003 (gmt 0)|
|Not a single one got indexed but when we changed "id" to "article" every page was picked up. |
That may just be a coincidence. I've had plenty of id= pages indexed.
| 10:24 pm on Aug 11, 2003 (gmt 0)|
|I work on a portal that had to abolish its session IDs because Google indexed the same page over and over again. |
This is the problem with SIDs: Google will index them just fine, but each URL is treated as a separate page, even when the content is the same, if Google comes in via multiple bots.
So you can end up with literally thousands of duplicates in the index (this happened to us: at one time we had over 70,000 pages of our site indexed, and the site consisted at the time of about 40 pages total ;-))
It doesn't harm anything. We didn't get banned; Google just ignored the duplicates. But it sure was a waste of Google's HD space (it cached every one of them lol) and also a major waste of our bandwidth (Google was responsible for about 3 gig over about 10 days).
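For anyone who wants to check their own logs for this, a small Python sketch that counts how many distinct pages remain once the session parameter is ignored. The `sid` parameter name and the sample URLs are assumptions for illustration; substitute whatever your package actually uses.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qsl


def strip_param(url: str, param: str = "sid") -> str:
    """Return the URL with every occurrence of the given query parameter removed."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True) if k != param]
    return urlunsplit((scheme, netloc, path, urlencode(kept), fragment))


def count_distinct_pages(urls, param: str = "sid") -> int:
    """Count distinct pages once the session parameter is ignored."""
    return len({strip_param(u, param) for u in urls})


# Made-up sample of crawled URLs: two sessions on one page, one on another.
crawled = [
    "http://www.domain.dom/page.html?sid=aaa",
    "http://www.domain.dom/page.html?sid=bbb",
    "http://www.domain.dom/other.html?sid=ccc",
]
print(len(set(crawled)), count_distinct_pages(crawled))  # prints "3 2"
```

The gap between the two numbers is the duplication a crawler sees when every visit is handed a fresh session ID.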
| 5:25 am on Aug 13, 2003 (gmt 0)|
Google guidelines say
|Allow search bots to crawl your sites without session ID's or arguments that track their path through the site |
|If you decide to use dynamic pages... It helps to keep the parameters short and the number of them small. |
Try a Google search for:
inurl:".php?SESSID=" and you will find loads of pages like this
There are so many of these that it looks to me like Google has extracted the session ID before following the link (though that is not true in every case). I agree with BlueSky: it is usually pretty easy to see a session ID in the URL, regardless of the name. I see no 32 alnum session IDs in the above search BTW.
brings up loads of these.....
there are 32 alnum session IDs in this search though...
I think it is best to disable session IDs for search engine spiders... why not? It saves bandwidth, and I am still scared about duplicate content.
|we didn't get banned, google just ignored the duplicates |
trillianjedi - How did those pages do in the rankings? Maybe you didn't get banned, but possibly the pages didn't rank well.