Now, probably the easiest way to solve this problem is to serve pages without a session ID to Googlebot. The only problem: there's no point in denying that this is de facto cloaking.
Can this get me into (severe) trouble? As far as I can see, the page content would remain 100% unaltered, but the URL would change. And (more worrying), if Google sent a control bot disguised as, let's say, Mozilla, it wouldn't be able to find the links that Googlebot sees. Google might conclude that someone's messing with their Googlebot, and get VERY, VERY angry...
www.phpbb.com/phpBB/viewtopic.php?t=32328&highlight=spider+google+session
Try those. I've had to "adjust" all my sites so that a session is not started until necessary, so Google can spider without a problem (if session IDs are a problem at all, which I'm not sure about).
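For anyone wondering what "not started until necessary" could look like, here is a minimal sketch in Python rather than phpBB's own PHP. The path list, the `sid` cookie name, and the function name are all made up for illustration; the only point is that plain read-only pages never trigger a new session, so a spider browsing topics never gets a session ID in its URLs.

```python
import secrets

# Hypothetical paths that really need per-user state (assumption, not phpBB's list).
SESSION_REQUIRED_PATHS = {"/login", "/post", "/profile"}


def get_session_id(path, cookies):
    """Return a session ID only when the request actually needs one."""
    sid = cookies.get("sid")
    if sid is not None:
        # The visitor already has a session; keep using it.
        return sid
    if path in SESSION_REQUIRED_PATHS:
        # First request that needs state: start the session now.
        return secrets.token_hex(16)
    # Read-only page and no existing session: stay stateless.
    return None
```

With this, a request for a topic page without a cookie gets no session, while the same visitor hitting the login page gets one from that point on.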
In the very recent Google update, the deep crawler hit my (new) main site, took the robots.txt and the index, and left. I'm a little worried at the moment.
However, two and a half months ago, the fresh bot spidered the whole site.
*sigh*
[edited by: engine at 1:19 am (utc) on Nov. 4, 2002]
[edit reason] delinked [/edit]
The same thread also recommends setting up a hallway page that lists the topics in a forum. There's an example page posted. It is now a PR0 graveyard...
So I'm a little bit worried about the second approach as well. To me, serving a different page to the Googlebot is a bit like dancing on a volcano. The mere fact that it was fun yesterday doesn't mean that it can't kill you tomorrow. Or am I exaggerating?
A better solution would be to omit session IDs based on IP rather than UA. All search engines run undeclared bots from time to time, and those bots won't announce themselves in the user agent, so a system based on IP will work better than one based on UA.
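A rough sketch of the IP-based idea, again in Python and only as an illustration. The networks below are placeholder documentation ranges (RFC 5737), not real crawler addresses; you would have to maintain your own list. The `sid` parameter name and both function names are assumptions as well.

```python
import ipaddress
from urllib.parse import urlencode

# Placeholder networks from the RFC 5737 documentation ranges.
# These are NOT real search engine addresses; substitute your own list.
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_crawler_ip(client_ip):
    """Return True if the client address falls inside a listed crawler network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in CRAWLER_NETWORKS)


def build_link(path, params, session_id, client_ip):
    """Build an internal link, leaving out the session ID for listed crawler IPs."""
    query = dict(params)
    if session_id and not is_crawler_ip(client_ip):
        query["sid"] = session_id
    return f"{path}?{urlencode(query)}" if query else path
```

The page content is built the same way for everyone; only the link-building step looks at the client address. The user agent string is never consulted, so a bot that hides its identity still gets clean URLs as long as its IP is on the list.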
Don't think of it as cloaking. I can't imagine any search engine (even the anti-cloaking zealot, Google) having any problem with that approach. You are delivering the same page to both spider and human. You would be doing them a big favor.
There are legit uses of cloaking, and making URLs understandable to bots surely is one of them.