Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Why am I seeing duplicate content with session id?
nmjudy

10+ Year Member



 
Msg#: 3793076 posted 4:46 pm on Nov 24, 2008 (gmt 0)

Webmaster Tools is showing me pages with duplicate titles, with URLs that look like this:

www.example.com/directory/
www.example.com/directory/index.html?PHPSESSID=d1df7a5b58659817c692854ed9c14ed6

My site is entirely static, with the exception of one directory that uses Postnuke. That section of the website has been there for several years and never caused a problem. From what I can see, Postnuke does not generate URL strings like this.

In the past, I've had a problem with paid membership websites framing my site for their content. All my pages have the "break out of frames" JavaScript code in them.

Could this PHPSESSID string be generated from another website? If so, could Google be indexing both versions of the page and dropping my pages for duplicate content? Obviously, Webmaster Tools sees the pages as 2 different pages with duplicate titles.

I'm one of the people whining over lost rankings from Nov 2 and still trying to figure out what caused it. In October, I implemented a sitewide change, pointing links at /directory/ instead of /directory/index.html.

Is this just another buggy Webmaster Tools thing, or can I do a workaround in mod_rewrite to combine these kinds of pages too?

 

Receptional Andy



 
Msg#: 3793076 posted 6:56 pm on Nov 24, 2008 (gmt 0)

It sounds like your pages might be inadvertently including a session ID. An easy way to see if that's likely to be happening is to browse the site with cookies disabled, and see whether session IDs are appended. If that's the case, then you can turn off that behaviour.

nmjudy

10+ Year Member



 
Msg#: 3793076 posted 7:57 pm on Nov 24, 2008 (gmt 0)

I'm 99.99% sure that these PHP session ids are not being generated by my site - so I'm just trying to figure out how to turn off the behavior.

Will inserting the following code in my .htaccess file stop session IDs from being used, even if they appear in a link to me from another site?

php_value session.use_only_cookies 1
php_value session.use_trans_sid 0

If so, does it make a difference where exactly in my .htaccess file I put the code?

My current mod_rewrite code for 301-redirecting all index.html pages to the folder root is below. Is there a way to also redirect index.html followed by any string to the folder root (with the exception of anchor tags)?

# Redirect requests for index.html in any directory to "/" in the same directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.+/)?index\.html\ HTTP
RewriteRule ^(.+/)?index\.html$ http://www.example.com/$1 [R=301,L]
#

Receptional Andy



 
Msg#: 3793076 posted 8:28 pm on Nov 24, 2008 (gmt 0)

I'm 99.99% sure that these PHP session ids are not being generated by my site

IMO it's still worth checking, since it's an easy test: just view the site with cookies disabled. Another way would be to view the Google cache of an indexed URL with a session ID in it, and then see whether the other links on the page also have a session ID appended.

It seems unlikely that another site has linked to your pages with session IDs appended, although anything's possible ;)

The htaccess code looks fine, and will work if your host supports it. It will not, however, remove URLs from Google's database.
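A minimal sketch of one way to guard against the "if your host supports it" caveat, assuming PHP runs as an Apache module under the identifier mod_php5.c (an assumption: the identifier differs between PHP versions, and php_flag/php_value are not available at all when PHP runs as CGI). php_flag is the form the PHP manual documents for boolean settings. Where the block sits relative to the rewrite rules should not matter, since the two sets of directives belong to different modules.

```apache
# Only applied when PHP is loaded as an Apache module; otherwise skipped,
# which avoids a 500 error from unrecognised directives.
# mod_php5.c is an assumed module identifier; adjust for your PHP version.
<IfModule mod_php5.c>
    # Accept session IDs from cookies only, never from the URL
    php_flag session.use_only_cookies on
    # Do not rewrite links to append ?PHPSESSID=...
    php_flag session.use_trans_sid off
</IfModule>
```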

Removing pages with session IDs is best achieved by (permanently) redirecting those requests to the same page, but with the session parameter removed. You can see examples like this one [webmasterworld.com] over in the Apache forum. Note that even if you stop session IDs being generated, and redirect requests, such URLs can hang around for a long time, since Google is unlikely to request them very frequently, so it doesn't discover your redirects very quickly either.
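As a rough sketch of that kind of redirect (not the code from the linked Apache forum thread), the rule below assumes an .htaccess file in the document root, the www.example.com hostname used elsewhere in this thread, and that PHPSESSID is the only parameter in the query string. URLs carrying other parameters as well would need a more careful pattern.

```apache
# Assumes "RewriteEngine on" appears earlier in the same .htaccess file.
# Redirect /any/path?PHPSESSID=<id> to /any/path with a 301.
# Only fires when PHPSESSID is the sole query-string parameter.
RewriteCond %{QUERY_STRING} ^PHPSESSID=[^&]+$
# The trailing "?" on the target clears the query string.
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
```

Once the redirect has been followed, the new request has no query string, so the condition no longer matches and the rule does not loop.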

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3793076 posted 8:37 pm on Nov 24, 2008 (gmt 0)

*** Is there a way to redirect ... ***

Yes.

# Redirect anything with a query string, force www, use same path, and remove all the query string parts.
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

This redirect goes just before your non-www to www redirect.

On all of your other redirects add a question mark after the target URL to clear the query string on all of those redirects too. This is needed to avoid a redirection chain for certain requests.

Robert Charlton

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 3793076 posted 12:09 am on Nov 25, 2008 (gmt 0)

It's worth noting that redirecting anything with a query string may mess up your Adwords tracking, so be sure to set up exclusions for your landing pages, if you are using query strings to track your ppc campaigns.

Exclusions would be in the form...

RewriteCond $1 !^landingpage\.html$

They'd follow the query string rewrite condition above, and precede the rewrite rule. I'll leave it to someone else to give you the final code.
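Putting those pieces together might look like the sketch below. It is only an illustration: landingpage.html is the placeholder name from the exclusion above, and the pattern as written assumes the rules live in an .htaccess file in the document root, where $1 holds the requested path without a leading slash.

```apache
# Assumes "RewriteEngine on" appears earlier in the same .htaccess file.
# Strip the query string from every request that has one...
RewriteCond %{QUERY_STRING} .
# ...except requests for the PPC landing page, so tracking parameters survive.
# $1 is the path captured by the RewriteRule pattern below.
RewriteCond $1 !^landingpage\.html$
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
```

Additional landing pages could be handled with one more RewriteCond line each, or with a single alternation such as !^(page-one|page-two)\.html$ (hypothetical file names).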

steveb

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3793076 posted 12:16 am on Nov 25, 2008 (gmt 0)

If you don't have a redirect set up properly, all it takes to get duplicate content trouble like this is for someone to link to one of your URLs with a query string attached, and then for Google to screw up and list the query string URL instead of the correct one.

They usually get it right, but they sometimes make a mistake and list the wrong one.

nmjudy

10+ Year Member



 
Msg#: 3793076 posted 12:16 am on Nov 25, 2008 (gmt 0)

It looks like the session ids may be being generated from a 3rd party social bookmarking script. I was able to use link:www.example.com/directory/index.html?PHPSESSID=d1df7a5b58659817c692854ed9c14ed6 and found a couple of bookmarking scripts linking to my site through a user login screen.

I've copied and pasted how I interpreted what I'm supposed to change. I'm not sure if I was clear about the first set of redirects. Is this correct? Or do I need the query string code BEFORE the index redirect? Sorry...I'm "mod rewrite challenged".

Options +FollowSymlinks +Includes All -Indexes
RewriteEngine on
#
#
# Redirect requests for index.html in any directory to "/" in the same directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.+/)?index\.html\ HTTP
RewriteRule ^(.+/)?index\.html$ http://www.example.com/$1? [R=301,L]
#
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
#
# Redirect requests for resources in non-www domains to same resources in www domain
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

nmjudy

10+ Year Member



 
Msg#: 3793076 posted 12:20 am on Nov 25, 2008 (gmt 0)

Robert - thank you for the Adwords warning and workaround.

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3793076 posted 12:42 am on Nov 25, 2008 (gmt 0)

The final code above should work, and the order looks to be correct, but I do think the [NC] should be deleted. That will then allow it to redirect for all hosts that are not exactly www.example.com in lower case, i.e. it will then also redirect requests for an upper-case WWW.EXAMPLE.COM hostname.

There's a slightly more efficient way of redirecting for named index files (as in the first block of code). The new code also caters for appended port numbers, query strings, and some unwanted trailing punctuation. Check recent posts in the Apache forum for details.

It would be a good idea to upgrade that first block of code, because as it stands now the above example will issue a double redirect for any URL request that includes both an index filename and any sort of query string data. The index redirect won't be invoked until the next listed query-string redirect has stripped the query string off. Chains should be avoided.

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3793076 posted 5:34 pm on Nov 30, 2008 (gmt 0)

Coming late to this party, the following changes would seem to be warranted:

# Redirect requests for index.html in any directory to "/" in the same directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.+/)?index\.html(\?[^\ ]*)?\ HTTP/
RewriteRule ^(.+/)?index\.html$ http://www.example.com/$1? [R=301,L]
#
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
#
# Redirect requests for resources in non-www domains to same resources in www domain
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

Changes:
  • Modified RewriteCond pattern in first rule to match whether or not a query string is appended.
  • Modified second RewriteCond in last rule (non-www to www redirect) to remove the [NC] flag and end-anchor the pattern. The result is that the redirect will occur unless the hostname is *exactly* "www.example.com" or, in the case of HTTP/1.0 requests, blank.

    Note that that whole rule can be coded more efficiently in only two lines as:

    # Redirect requests for resources in non-www domains to same resources in www domain
    RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
    RewriteRule (.*) http://www.example.com/$1? [R=301,L]

    Jim


    All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
    WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
    © Webmaster World 1996-2014 all rights reserved