Home / Forums Index / Google / Google News Archive

Google News Archive Forum

    
Google and Session Killing
Is this safe for Google?
nutsandbolts

Msg#: 7239 posted 10:38 am on Nov 28, 2002 (gmt 0)

I've started to tinker with the amazing osCommerce open-source shopping cart system and noticed a problem: sessions. It's already been established that session IDs can cause duplicate-content problems in Google, so some clever clogs have come up with a way to kill the session ID when a search engine bot is detected, using this bit of code:

// Add more spiders as you find them
$spiders = array("Googlebot", "WebCrawler", "Other Engines etc etc");
$spider_count = 0;
foreach ($spiders as $Val) {
    if (eregi($Val, getenv("HTTP_USER_AGENT"))) {
        $spider_count++;
    }
}
if ($spider_count != 0) {
    // Edit out one of these as necessary depending upon your version of html_output.php
    $sess = NULL;
    // $sid = NULL;
}

Now - is this safe for Google? Will Googlebot think the site is trying to cloak? I know Google crawls from different IPs to hunt for cloaked sites, and the last thing I want is to get a site banned for cloaking... *shakes*

 

ruserious

Msg#: 7239 posted 1:22 pm on Nov 28, 2002 (gmt 0)

[google.com...]

"Allow search bots to crawl your sites without session ID's or arguments that track their path through the site."

Brett_Tabke

Msg#: 7239 posted 1:32 pm on Nov 28, 2002 (gmt 0)

[google.com...]

"Don't employ cloaking or sneaky redirects. "

brotherhood of LAN

Msg#: 7239 posted 1:34 pm on Nov 28, 2002 (gmt 0)

Surely it only matters if you pass the session ID along in the URL?

I'm playing around making a cart, where the PHP session ID is needed for authentication.

It's done so that only logged-in visitors see the string in the URL, which allows tracking/authentication of them; otherwise it's a plain page served to anyone.

I'm not exactly savvy with PHP, but unless the session ID is part of the URL it shouldn't matter, even if you kill the session and change to a new one?

nutsandbolts

Msg#: 7239 posted 1:38 pm on Nov 28, 2002 (gmt 0)

It's done so that only logged-in visitors see the string in the URL

Well, osCommerce puts a session ID in the URL for all visitors - including Googlebot...

ruserious

Msg#: 7239 posted 1:50 pm on Nov 28, 2002 (gmt 0)

@Brett: As a matter of fact, I have been doing that on my sites for several months and have not had any problems with Google. Maybe I should have mentioned that. ;)

I do not see any harm done to the search results, users, Google, or anyone else, so I do not think Google considers this "bad".

nutsandbolts

Msg#: 7239 posted 10:39 pm on Nov 29, 2002 (gmt 0)

So, is this safe for Google or not? Worth the risk?

WebGuerrilla

Msg#: 7239 posted 9:43 am on Nov 30, 2002 (gmt 0)


There isn't any risk.
Even the most anti-SEO member of the Google spam squad would have a tough time arguing that checking IPs and/or UAs, in order to make sure you don't give Googlebot URLs that guarantee an endless loop of duplicate content, is the act of a spammer.

You aren't giving the search engine different content than what a human sees. You are just serving the content from a different URL. That in no way harms their index. All it does is help them out by making sure they don't waste bandwidth on your site.

Bernie

Msg#: 7239 posted 11:03 am on Nov 30, 2002 (gmt 0)

...going by the definition of cloaking (serving different content to humans and bots), it is not cloaking - so no risk.

The problem I would check, though, is: does the bot receive exactly the same amount of data, in bytes, whether or not a session ID is handed over?

I am not techie enough to answer this question, but I would assume that at least the total amount of data transferred between server and client differs between the two cases.

The question is: how would an anti-cloaking spider find out that a server delivers the same source code no matter which user agent or IP asks for the page?

brotherhood of LAN

Msg#: 7239 posted 3:45 am on Dec 1, 2002 (gmt 0)

nutsandbolts,

I've managed to mess up my PHP config this week while playing with these PHP session IDs, so this might not work ;)

If you have access to php.ini, find this line and set it to 0:
session.use_trans_sid = 1

AFAIK that setting automatically puts the session ID into URLs. If you change it to 0, session IDs are not appended to URLs, so I'd think Google wouldn't care which session it was, because the URL would stay the same.

I've not much of an idea how the whole shebang works, but tinkering with the php.ini session settings might make the problem a bit easier.

Worth a tinker or a look, maybe, though you'd probably have to alter the commerce script as well.
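If editing php.ini isn't an option, the same setting can usually be flipped at runtime. A minimal sketch, assuming PHP's built-in session module (the ini keys are the standard ones):

```php
<?php
// Sketch: disable transparent SID rewriting at runtime so the session ID
// is never appended to URLs, whatever the php.ini default says.
ini_set('session.use_trans_sid', '0'); // no SID in URLs
ini_set('session.use_cookies', '1');   // track the session via a cookie instead
```

Placed before session_start(), this keeps every URL identical regardless of who is browsing, which is the property Google cares about here.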

hyperion

Msg#: 7239 posted 7:40 am on Dec 1, 2002 (gmt 0)

Hi nutsandbolts,

I did nearly the same thing on a site some months ago and immediately got the dreaded PR0 for it (I really changed nothing else).
I agree it isn't cloaking, but Googlebot will have some difficulty seeing the difference.
Now I simply use cookies for session tracking, and if a visitor does not accept them (like Googlebot), the fallback method of appending the ID as a GET parameter is only used once somebody actually puts something in his cart - which will never happen in Googlebot's case. With this method there is no risk of being punished for cloaking.
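A minimal sketch of that fallback logic (hypothetical helper name, not the actual code from the site): the session ID only reaches the URL when a cookieless visitor already has cart state to preserve, which a spider never does.

```php
<?php
// Hypothetical sketch of the cookie-first scheme described above.

// Build a shop URL; append the session ID only when the visitor has no
// cookie AND is carrying cart state worth preserving.
function shop_url(string $path, string $sid, bool $hasCookie, bool $hasCartItems): string {
    if ($hasCookie || !$hasCartItems) {
        return $path; // clean URL: spiders and cookie users never see a SID
    }
    return $path . '?sid=' . urlencode($sid); // fallback for cookieless humans with a cart
}
```

Since Googlebot neither accepts cookies nor adds items to a cart, it always falls into the clean-URL branch, without any user-agent sniffing at all.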

ruserious

Msg#: 7239 posted 10:29 am on Dec 1, 2002 (gmt 0)

@brotherhood of LAN: That will simply turn off session support for non-cookie users completely. It also only works when you are actually using PHP's built-in session code; a lot of software, especially software written for backwards compatibility with PHP3, brings its own session code. [php.net...]

@hyperion: Could your ban have resulted from something else? As I wrote, I have multiple sites using this without any problems with Google (and one site's PR went up to 6 this month).

brotherhood of LAN

Msg#: 7239 posted 1:26 pm on Dec 1, 2002 (gmt 0)

This will simply turn off session-support for non-cookie-users completely.

I'm more at a loss each time with this. Here's a quote from the same PHP page you cite - I was reading it earlier:

URL based session management has additional security risks compared to cookie based session management. Users may send an URL that contains an active session ID to their friends by email or users may save an URL that contains a session ID to their bookmarks and access your site with the same session ID always, for example.

So not all browsers accept cookies, and putting the session ID in the URL poses problems of its own. It makes me wonder how it can be done at all... Google aside :)

hyperion

Msg#: 7239 posted 8:32 pm on Dec 1, 2002 (gmt 0)

@ruserious,

no, it was the only change I made in six months, and there are no links to other sites, so a bad neighbourhood cannot have been the cause either.
And when I reverted the change, I got back into Google at the next update. Maybe it works if you use the PHP session-handling functions, so that Googlebot can tell that only the SID is missing; but because I use my own set of session functions, I had a GET parameter with a different name.
But I wouldn't try it again ;-)...

nutsandbolts

Msg#: 7239 posted 8:45 pm on Dec 1, 2002 (gmt 0)

Thanks for the replies so far - well, I think I will leave things as they are then... I'm just not confident enough that it's a safe way to do things, especially after my near-12-month absence from Google!

burt_online

Msg#: 7239 posted 11:42 pm on Dec 3, 2002 (gmt 0)

Hi. That script in no way "redirects" or "cloaks" the site. It simply does not produce a session ID if one of the "spiders" in the array is visiting.

I am successfully using that exact script (as I am the "clever clogs" ;) who wrote it) across a number of osCommerce sites that I admin.

Initial results:

There are 270 products in the database of the main site I am tracking. Before adding the script, *no* product pages were listed.

Since adding the script, in the update over the past few days, there are now 257 products listed, all without a SID...

If anyone can definitely tell me that this is a harmful script, I'll be glad to listen and amend it, but as of now the proof is here: no products before the script was introduced, 257 of 270 listed after it was added...

Would definitely appreciate any comments! Thanks.

WebGuerrilla

Msg#: 7239 posted 12:14 am on Dec 4, 2002 (gmt 0)

I did nearly the same thing on a site some months ago and immediately got the dreaded PR0 for it (I really changed nothing else).
I agree it isn't cloaking, but Googlebot will have some difficulty seeing the difference.

Googlebot doesn't "see" anything. It just retrieves links and stores data. If you really were only giving Googlebot URLs without a session ID, there isn't any automated way Googlebot could give you a penalty.

Losing PR for a crawl cycle or two is fairly common. The fact that it happens doesn't mean you've been penalized.

No search engine has any legitimate reason to demand that sites that wish to be indexed must give up the right to track humans who use browsers with cookies disabled.

Any site that is serious about session tracking should set up a system that only excludes spiders. Otherwise you are giving up a significant amount of data.

Humans with cookies enabled get a cookie.

Humans with cookies disabled get a session id added to the url.

Spiders get neither.

IP/UA detection is the proper way to make that system work. And using such a system is no different from the geo-targeting systems used by all the search engines.
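The three-tier scheme above can be sketched like this (a simplified illustration using UA matching only; the function and bot names are placeholders):

```php
<?php
// Sketch of the exclude-only session policy: spiders get no session,
// cookie-enabled humans get a cookie, cookieless humans get a URL SID.

function is_spider(string $ua): bool {
    // Illustrative list only; extend with real crawler UAs as needed.
    foreach (array('Googlebot', 'Slurp', 'WebCrawler') as $bot) {
        if (stripos($ua, $bot) !== false) {
            return true;
        }
    }
    return false;
}

// Decide how (or whether) to track this visitor: 'none', 'cookie', or 'url'.
function session_mode(string $ua, bool $acceptsCookies): string {
    if (is_spider($ua)) {
        return 'none'; // spiders get neither a cookie nor a URL session ID
    }
    return $acceptsCookies ? 'cookie' : 'url';
}
```

In production the spider check would typically combine UA and IP tests, as the post suggests; the point of the sketch is that the session machinery only ever engages for humans.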

Onza

Msg#: 7239 posted 4:44 pm on Dec 4, 2002 (gmt 0)

I am just setting up a new shop system that uses a session ID in the URL.

From this thread I have come to the conclusion that using a script similar to the one provided by burt_online will be sufficient to let spiders index the product catalog.

Is this script "universal", or will it only work for osCommerce? The problem with osCommerce is that it doesn't offer any synchronization options for ERP software.

GoogleGuy

Msg#: 7239 posted 5:02 pm on Dec 4, 2002 (gmt 0)

Everybody knows I'm pretty anti-cloaking, but WebGuerrilla has already made strong points about why it's okay to drop a session ID for Google. burt_online, I'm really glad that your products got crawled a lot more--that sounds like a win for your site (more pages indexed) and for Google (better coverage of useful pages), and that adds up to a better experience for searchers.

This is just my personal take, but allowing Googlebot to crawl without requiring session IDs should not run afoul of Google's policy against cloaking. I encourage webmasters to drop session IDs when they can. I would consider it safe. Fair enough?

Hope that helps,
GoogleGuy

ciml

Msg#: 7239 posted 5:08 pm on Dec 4, 2002 (gmt 0)

Thanks for the clarification, GoogleGuy.

burt_online

Msg#: 7239 posted 7:23 pm on Dec 4, 2002 (gmt 0)

GoogleGuy: Many thanks for your thoughts, I appreciate it...

I'm 99.9% sure that a script that only removes the SID for some visitors (the bots) could not be construed as "something terrible"... however, that remaining 0.1% made me think twice over the past day or two...

Onza: The main part of the script is suitable for any use where you need to determine whether the visitor is a spider, based upon the user agent... all you'll need to do is change the end bit to suit your circumstances...

This is a great forum! Thanks all.

rmjvol

Msg#: 7239 posted 12:47 am on Dec 5, 2002 (gmt 0)

Welcome to WebmasterWorld, burt_online. [webmasterworld.com]

I just did my first osC site. I forced the use of cookies and did away with the session ID completely. We've got about 2000 pages showing with the "site:" command after two months. Almost all have been cached.

Just need to add a little more PR to get the last few crawled.

I've had some encouraging early results as far as referrals and rankings go.

Good luck,
rmjvol

nutsandbolts

Msg#: 7239 posted 9:02 am on Dec 5, 2002 (gmt 0)

Thanks GG for clearing that up... And thanks Burt for the script ;)

hetzeld

Msg#: 7239 posted 8:22 pm on Dec 5, 2002 (gmt 0)

Hi all,

It's my first post in this forum and I'd like to start with my 2 cents to improve the script ;)
1. Use $_SERVER["HTTP_USER_AGENT"] // needed if register_globals is turned off
2. A break helps exit a long spider list early - put the most important spiders at the beginning

-------------------------

$spiders = array("Googlebot", "WebCrawler", "etc etc");
$from_spider = FALSE;
foreach ($spiders as $Val)
{
    if (eregi($Val, $_SERVER["HTTP_USER_AGENT"]))
    {
        $from_spider = TRUE;
        break;
    }
}

// Session
if (!$from_spider)
    session_start();
-----------------------

Thanks to all of you for all the valuable info!

andreasfriedrich

Msg#: 7239 posted 9:24 pm on Dec 5, 2002 (gmt 0)

I have been using a script like this for quite some time:

<?php
/*
$Id: IsSearchEngine.inc.php,v 1.1.1.1 2000/06/07 22:41:53 af Exp $
$Name: $
*/
$br_array = array(%%NO_SESSION_UAS%%);
while (list(, $browser) = each($br_array)) {
    if (preg_match("/$browser/i", $HTTP_USER_AGENT)) {
        return true;
    }
}
return false;
?>

The %%NO_SESSION_UAS%% field is filled in by my CMS.

It is called from header.php like this:

$sm = include('shared/IsSearchEngine.inc.php');

Andreas
© Webmaster World 1996-2014 all rights reserved