
Google and Session Killing

Is this safe for Google?

10:38 am on Nov 28, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 22, 2002
posts:959
votes: 0


I've started to tinker with the amazing osCommerce open source shopping basket system and noticed a problem: sessions. It's already been established that session IDs can cause duplicate-content problems in Google, so some clever clogs have come up with a way to kill the session ID when a search engine bot is detected, using this bit of code:

// Add more spiders to this list as you find them.
$spiders = array("Googlebot", "WebCrawler", "Other Engines etc etc");
$spider_count = 0;
foreach ($spiders as $val) {
    // Case-insensitive match against the visitor's user-agent string.
    if (eregi($val, getenv("HTTP_USER_AGENT"))) {
        $spider_count++;
    }
}
if ($spider_count > 0) {
    // A spider was detected: blank the session ID so it never reaches the URLs.
    // Edit out one of these as necessary, depending on your version of html_output.php.
    $sess = NULL;
    // $sid = NULL;
}
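
For anyone wondering where $sess / $sid ends up, here is a hypothetical sketch of the kind of link-building helper the snippet feeds into (invented names; in osCommerce the real work happens in tep_href_link() in html_output.php). Once the variable is NULL, the ID is simply never appended:

// Illustration only - not the actual osCommerce code.
function build_catalog_link($page, $parameters, $sid) {
    $link = "http://www.example.com/" . $page;
    if ($parameters != "") {
        $link .= "?" . $parameters;
    }
    if ($sid != "") {
        // Spiders arrive here with a NULL $sid, so nothing is appended.
        $link .= ($parameters != "" ? "&" : "?") . $sid;
    }
    return $link;
}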

Now - is this safe for Google? Will Googlebot think the site is trying to cloak? I know Google crawls from different IPs to hunt for cloaked sites, and the last thing I want to do is get a site banned for cloaking... *shakes*

7:40 am on Dec 1, 2002 (gmt 0)

New User

10+ Year Member

joined:July 18, 2002
posts:27
votes: 0


Hi nutsandbolts,

I did nearly the same thing on a site some months ago and immediately got the dreaded PR0 for it. (I really changed nothing else.)
I agree it isn't cloaking, but Googlebot will have some difficulty seeing the difference.
Now I am simply using cookies for session tracking. If a user does not accept them (as Googlebot doesn't), the fallback of appending the session ID as a GET parameter is only used when somebody actually puts something in his cart - which will never happen in Googlebot's case. With this method there is no risk of getting punished for cloaking.
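
To make the idea concrete, here is a rough sketch (a hypothetical handler, assuming PHP's built-in sessions rather than my own session functions). Ordinary catalogue pages never call session_start(), so nobody - human or spider - gets an ID while just browsing:

// Sketch only: this runs in the add-to-cart handler, and nowhere else.
ini_set("session.use_cookies", "1");    // try to set a cookie first
ini_set("session.use_trans_sid", "1");  // URL-rewriting fallback for cookie refusers
session_start();
$_SESSION["cart"][] = $_GET["product_id"];  // hypothetical cart storage

Since Googlebot never adds anything to a cart, it never triggers session_start() and never sees a session ID in any form.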

10:29 am on Dec 1, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 26, 2002
posts:173
votes: 0


@brotherhood of LAN: This will simply turn off session support for non-cookie users completely; see the sketch below. Also, this only works when you are actually using PHP's built-in session code. A lot of software, especially software written for backwards compatibility with PHP3, brings its own session code. [php.net...]
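
For reference, these are the directives involved (a minimal sketch; it assumes a PHP version recent enough to support session.use_only_cookies):

// Turn off URL-based session IDs when using PHP's built-in sessions.
ini_set("session.use_trans_sid", "0");    // never rewrite the SID into URLs or forms
ini_set("session.use_only_cookies", "1"); // ignore SIDs passed as GET parameters
session_start();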

@hyperion: Could it be that your ban resulted from something else? As I wrote, I have multiple sites using this without problems in Google (and one site's PR went up to 6 this month).

1:26 pm on Dec 1, 2002 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member brotherhood_of_lan is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 30, 2002
posts:5046
votes: 60


This will simply turn off session support for non-cookie users completely.

This leaves me more at a loss each time. Here's a quote from the same php.net page you cite; I was reading it earlier:

URL based session management has additional security risks compared to cookie based session management. Users may send an URL that contains an active session ID to their friends by email or users may save an URL that contains a session ID to their bookmarks and access your site with the same session ID always, for example.

So not all browsers accept cookies, and putting the session ID in the URL poses its own problems. It makes me wonder how it can be done at all... Google aside :)

8:32 pm on Dec 1, 2002 (gmt 0)

New User

10+ Year Member

joined:July 18, 2002
posts:27
votes: 0


@ruserious,

No, it was the only change I made in six months, and there are no links to other sites, so a bad neighbourhood cannot have been the cause either.
And when I dumped the change, I got back into Google with the next update. Maybe it works if you use the PHP session-handling functions, so that Googlebot only sees the standard session ID go missing; but because I use my own set of session functions, I had a GET parameter with a different name.
I wouldn't try it again, though ;-)...

8:45 pm on Dec 1, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 22, 2002
posts:959
votes: 0


Thanks for the replies so far - well, I think I will leave things as they are, then... I'm just not confident enough that it's a safe way to do things, especially after my near-12-month absence from Google!

11:42 pm on Dec 3, 2002 (gmt 0)

New User

10+ Year Member

joined:Dec 3, 2002
posts:3
votes: 0


Hi. That script in no way "redirects" or "cloaks" the site. It simply does not produce a session ID if one of the "spiders" in the array is visiting.

I am successfully using that exact script (as I am the "clever clogs" ;) who wrote it) across a number of osCommerce sites that I admin.

Initial results:

There are 270 products in the database of the main site I am tracking. Before adding the script, *no* product pages were listed.

Since adding the script, in the update over the past few days, there are now 257 products listed, all without a SID...

If anyone can definitely tell me that this is a harmful script, then I'll be glad to listen and amend it, but as of now the proof is here: no products before the script was introduced, 257 of 270 listed after it was added...

Would definitely appreciate any comments! Thanks.

12:14 am on Dec 4, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 26, 2000
posts:2176
votes: 0


I did nearly the same thing on a site some month ago, and immediatly got the dreaded PR0 for it. (I really changed nothing else).
I agree it isn't cloaking, but googlebot will have some difficulty seeing the difference.

Googlebot doesn't "see" anything. It just retrieves links and stores data. If you were really only giving Googlebot URLs without a session ID, there isn't any kind of automated way for Googlebot to give you a penalty.

Losing PR for a crawl cycle or two is fairly common. The fact that it happens doesn't mean you've been penalized.

No search engine has any legitimate reason to demand that sites that wish to be indexed must give up the right to track humans who use browsers with cookies disabled.

Any site that is serious about session tracking should set up a system that only excludes spiders. Otherwise you are giving up a significant amount of data:

Humans with cookies enabled get a cookie.

Humans with cookies disabled get a session id added to the url.

Spiders get neither.

IP/UA detection is the proper way to make that system work. And using such a system is no different from the geo-targeting systems used by all the search engines.
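
Here's a bare-bones sketch of that exclude-only setup (the UA list is an assumption; a production system would also verify spiders by IP, as noted above):

// Exclude-only session handling: spiders get no ID of any kind.
$spiders = array("Googlebot", "Slurp", "Scooter");  // extend as needed
$agent = getenv("HTTP_USER_AGENT");
$is_spider = 0;
foreach ($spiders as $val) {
    if (eregi($val, $agent)) {  // case-insensitive UA match
        $is_spider = 1;
        break;
    }
}
if (!$is_spider) {
    // Humans: PHP sets a cookie if the browser accepts one, and
    // trans_sid rewrites the ID into URLs for those who refuse cookies.
    ini_set("session.use_trans_sid", "1");
    session_start();
}
// Spiders fall through with no session_start(): neither a cookie nor a URL SID.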

4:44 pm on Dec 4, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 18, 2002
posts:69
votes: 0


I am just setting up a new shop system that uses a session ID in the URL.

From this thread I have come to the conclusion that using a script similar to the one provided by burt_online will be sufficient to enable spiders to index the product catalogue.

Is this script "universal", or will it only work for osCommerce? The problem with osCommerce is that it doesn't offer any synchronisation options for ERP software.

5:02 pm on Dec 4, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


Everybody knows I'm pretty anti-cloaking, but WebGuerilla has already made strong points about why it's okay to drop a session ID for Google. burt_online, I'm really glad that your products got crawled a lot more--it sounds like a win for your site (more pages indexed) and for Google (better coverage of useful pages), and that adds up to a better experience for searchers.

This is just my personal take, but allowing Googlebot to crawl without requiring session IDs should not run afoul of Google's policy against cloaking. I encourage webmasters to drop session IDs when they can. I would consider it safe. Fair enough?

Hope that helps,
GoogleGuy

5:08 pm on Dec 4, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
posts:3805
votes: 2


Thanks for the clarification, GoogleGuy.