How Do I Rewrite PHPSESSID's?

Rewriting PHPSESSIDs using .htaccess

         

burcot

9:10 pm on Sep 13, 2006 (gmt 0)

10+ Year Member



Hi,

It's recently come to my attention that I have double the number of pages I should have because of PHPSESSIDs.

So I tried to rewrite them in .htaccess using this code:

rewriteBase /
rewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /page-name\.php?PHPSESSID=0480f2033699be254f5943900ec363d1231313
rewriteRule ^page-name\.php?PHPSESSID=0480f2033699be254f5943900ec363d1231313$ [website.com...] [R=301,L]

It didn't work for some reason. Could anyone here with better knowledge than me help me out?

Thanks.

matc

1:17 am on Sep 14, 2006 (gmt 0)

10+ Year Member



Hi,

Not sure I totally get what you wanted but could you try..

RewriteEngine on
RewriteBase /

RewriteCond %{REQUEST_FILENAME} ^page-name.php$
RewriteCond %{QUERY_STRING} PHPSESSID=([a-z0-9]+)
RewriteRule ^(.*)$ $1.php [R=301,L]

Very much untested but is that closer to what you were looking for?

cheers, matc

burcot

12:31 pm on Sep 14, 2006 (gmt 0)

10+ Year Member



Hi, thanks for your help but I still can't get it working. Let me try and explain a little better.

I would like to redirect the url example.com/widgets.php?PHPSESSID=1a2b3c4d5e6f7g8h9 to example.com/widgets.php

I am having trouble with the code required to remove the variable 'PHPSESSID' from the url when requested by the browser.

This rewrite is not designed to prevent my server from creating session IDs, as I have implemented code that handles this problem, but simply to redirect the pages cached by Google with session IDs to their original static URL. In doing so, Google will hopefully remove the URLs that include the session IDs from its cache.

Any help would be much appreciated!

Cheers

matc

2:06 pm on Sep 14, 2006 (gmt 0)

10+ Year Member



Hi'ya,

Cool, so you want to remove the query string and redirect these styles of request to the static page without holding state on the session ID?

Is this any help?
[webmasterworld.com...]

There's a note on R=301 from jdMorgan which should help with Google's cache..

hope it helps.

matc

2:53 pm on Sep 14, 2006 (gmt 0)

10+ Year Member



Hi...

Correction to my first post above..

RewriteCond %{REQUEST_FILENAME} ^page-name.php$
RewriteCond %{QUERY_STRING} PHPSESSID=([a-z0-9]+)
RewriteRule ^(.*)$ %1.php [R=301,L]

note the %1 instead of the $1 ... late night :-)

burcot

3:37 pm on Sep 14, 2006 (gmt 0)

10+ Year Member



Hey,

Tried your fix again with no luck. I also tried the link you posted, but I couldn't find a RewriteCond on that thread, only the rule.

Still no luck redirecting these session IDs. I can't find anything that works on Google either.

Thanks for the support

jdMorgan

8:55 pm on Sep 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd suggest the following corrections/simplifications for a simple test:

Options +FollowSymLinks
RewriteEngine on
RewriteBase /
#
RewriteCond %{QUERY_STRING} PHPSESSID=
RewriteRule ^page-name\.php$ /page-name.php? [R=301,L]

This will only redirect the single page "page-name.php", and is intended only as an example for testing. If you wish to redirect some pages with session IDs, but not others, then you'll need to be more specific about that.
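To go beyond the single test page, the same idea can be generalized. The following is only a sketch under assumptions not stated in the thread: the .htaccess file sits in the document root, every affected page ends in .php, and PHPSESSID is the only parameter on the URLs to be cleaned. The `[0-9a-z]+` character class is likewise a guess at what the session IDs look like.

```apache
# Sketch: redirect any .php page requested with only a PHPSESSID
# query string to the same path with the query string removed.
RewriteEngine on
RewriteBase /
#
# Match only when PHPSESSID is the sole parameter, so that
# other parameters are never silently dropped
RewriteCond %{QUERY_STRING} ^PHPSESSID=[0-9a-z]+$ [NC]
# The trailing "?" on the substitution clears the query string
RewriteRule ^(.+\.php)$ /$1? [R=301,L]
```

Once the redirect has happened, the new request has no query string, so the condition no longer matches and the rule cannot loop.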

You may also consider whether you need to do this redirect for all visitors, or just for search engine robots. If you wish to do it for search engines only, then you need to add a RewriteCond testing %{HTTP_USER_AGENT} with a list of search engine user-agent strings (or partial strings) that you wish to redirect. As a simple example:


RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot|Teoma) [NC]
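Put together with the test rule, the robots-only variant might look like the sketch below. The user-agent list is the illustrative one from this post, not a complete list, and the page name is still the single test page.

```apache
# Sketch: redirect page-name.php only for the listed search engine robots.
# Consecutive RewriteCond lines are ANDed, so both must match.
RewriteEngine on
RewriteBase /
#
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot|Teoma) [NC]
RewriteCond %{QUERY_STRING} PHPSESSID=
RewriteRule ^page-name\.php$ /page-name.php? [R=301,L]
```

Ordinary visitors keep their session ID in the URL with this version, while the listed robots are sent to the clean URL.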

Jim

burcot

6:08 pm on Sep 15, 2006 (gmt 0)

10+ Year Member



Thank you once again Jim. The code worked like a dream

XantosNew1

10:45 am on Sep 20, 2006 (gmt 0)

10+ Year Member



Hi to all,

I have the same problem. Could you help me find what's wrong
with the script?

I'm trying to redirect the pages with ?phpsess...
in the URL to the same URL without ?phpsess...

in order to resolve the problem with duplicate content.

here is the code:


RewriteCond %{QUERY_STRING} PHPSESSID=
RewriteRule ^page-name\.html$ /page-name.html? [R=301,L]

thanks in advance
XantosNew

XantosNew1

4:39 pm on Sep 21, 2006 (gmt 0)

10+ Year Member



Sorry for my bad explanation above.
Here it is in more detail.

I have one and the same page indexed by g**le several times,
and that's the case for almost all of my pages.

for example:
=============================================================================================
[mydomain.com...]
[mydomain.com...]
[mydomain.com...]
.
.
[mydomain.com...]
.
.
[mydomain.com...]
[mydomain.com...]
[mydomain.com...]
.
.
.
I fixed the problem that was adding "?PHPSESSID=d45949c7w45969c7d48c53e74de472e3"
to the URLs, so there should be no additional pages indexed with
"?PHPSESSID=d45949c7w45969c7d48c53e74de472e3" added to the URL.

================================================================================================

All of these pages are generated by PHP, and the static .html URLs are mapped to the script with this RewriteRule:

RewriteRule ([^-]*)-page-([0-9]*).html index.php?action=showpage&pageID=$2 [L]
=================================================================================================
as a result

I have for the same example page:
[mydomain.com...] additional indexed variants that look like:
[mydomain.com...]
[mydomain.com...]
.
.
.

Now I want to redirect all:
[mydomain.com...]
[mydomain.com...]
.
.
[mydomain.com...]

and

[mydomain.com...]
[mydomain.com...]

to
[mydomain.com...]

and so on with all duplicated pages.
===================================================================================================

So I'm trying first to redirect:

all [mydomain.com...]
to [mydomain.com...]

with this script in .htaccess

RewriteCond %{QUERY_STRING} phpsess
RewriteRule ^exampleX-page-1\.html$ /exampleX-page-1.html? [R=301,L]

So with that script in .htaccess, when I hit:

[mydomain.com...]
it did not redirect me to
[mydomain.com...]

I'm making a mistake somewhere and I could not find it.

I think that lots of people have the same problem with duplicated
content and don't even know that there is such a problem.
Best Regards!
XantosNew

jdMorgan

8:18 pm on Sep 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The string in the RewriteCond pattern ("phpsess")

RewriteCond %{QUERY_STRING} phpsess

must match the query-string part of the requested URL ("PHPSESSID=0e773e74de4723a362a354s42c22ft55")

http://www.example.com/exampleX-page-1.html?PHPSESSID=0e773e74de4723a362a354s42c22ft55


And the pattern in the RewriteRule ("^exampleX-page-1\.html$")

RewriteRule ^exampleX-page-1\.html$ http://example.com/exampleX-page-1.html? [R=301,L]

must match the path part of the URL ("exampleX-page-1.html")

http://www.example.com/exampleX-page-1.html?PHPSESSID=0e773e74de4723a362a354s42c22ft55


So clearly, only the pattern in the rule is correct, and the RewriteCond pattern needs to be corrected: it is lowercase, while the URL has uppercase "PHPSESSID", and the comparison is case-sensitive unless the [NC] flag is used.
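One way to apply that correction is sketched below; it is not necessarily the exact rule Jim had in mind. Placing the redirect above the existing index.php rewrite is an assumption about the rest of the file, based on the usual practice of putting external redirects before internal rewrites.

```apache
# Sketch: corrected test redirect for the single example page
RewriteEngine on
#
# Match the parameter name exactly as it appears in the URL
RewriteCond %{QUERY_STRING} PHPSESSID=
RewriteRule ^exampleX-page-1\.html$ http://example.com/exampleX-page-1.html? [R=301,L]
#
# Alternative condition: keep the lowercase pattern but make it
# case-insensitive with [NC]
# RewriteCond %{QUERY_STRING} phpsess [NC]
```

Only one of the two RewriteCond lines is needed; the commented-out one shows the [NC] route.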

Jim

lmo4103

8:28 pm on Sep 21, 2006 (gmt 0)

10+ Year Member



Silly question?

Are you all sure that the destination php program is going to actually get the phpsessid that it requires in order to properly do the session?

jdMorgan

9:23 pm on Sep 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good question. But once we get the basic redirect working, it's a simple matter to add a test to see if the "visitor" is a search engine robot, based on the IP address or on the User-agent in the HTTP header sent with the request.

Jim

lmo4103

9:44 pm on Sep 21, 2006 (gmt 0)

10+ Year Member



I saw the user-agent selective rewrite rule somewhere on this site, but lost it. Still awaiting answer regarding "will php maintain state?"

Maybe this alternative is more certain:
[webmasterworld.com...]

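// Partial user-agent strings of crawlers that should not get a session;
// extend this list as needed.
$spiders = array("Googlebot", "WebCrawler");
$user_agent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
$from_spider = FALSE;
foreach ($spiders as $val)
{
    // Case-insensitive substring test; stripos() replaces eregi(),
    // which was removed in PHP 7
    if (stripos($user_agent, $val) !== FALSE)
    {
        $from_spider = TRUE;
        break;
    }
}

// Session: only start one for visitors that are not spiders
if (!$from_spider)
    session_start();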


or something like that.

lmo4103

1:02 pm on Sep 22, 2006 (gmt 0)

10+ Year Member



I found it here. (Ahem, simple matter?)


# Skip the next two rewriterules if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [S=2]
#
# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&sessionID=[0-9a-z]+&(.+)$ [NC]

[webmasterworld.com...]

or is that baxward?

... hey, look up ... huh
[webmasterworld.com...]

Oh, there it was. Somebody put these together in the right order and that's probably it.

lmo4103

1:21 pm on Sep 23, 2006 (gmt 0)

10+ Year Member



RewriteEngine on
RewriteBase /
#
# Skip the next two rewriterules if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [S=2]
#
# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&PHPSESSID=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1&%2 [R=301,L]
#
# case: leading-only, trailing-only or no additional parameters
# (only one of %1 and %2 is set by the alternation, so both are appended;
# with %1 alone, trailing parameters after a leading PHPSESSID would be lost)
RewriteCond %{QUERY_STRING} ^(.+)&PHPSESSID=[0-9a-z]+$|^PHPSESSID=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) $1?%1%2 [R=301,L]
----------------------------------------------------------------

That did "a job" on it for me.
It did produce a 301 redirect w/o PHPSESSID.

I have something similar to the php method in msg #:3092199 above in service, and goog'
indexes as desired (with regard to PHPSESSID). I'm too newbie at mod_rewrite
to boldly break what's already fixed ... yet.

Google will always do what you expect that it will do ... do ya think?

lmo4103

2:01 am on Sep 24, 2006 (gmt 0)

10+ Year Member



Doh..!

Allow search bots to crawl your sites without session IDs [google.com]

I might take the R=301 out!

PHPSESSID is generated on the fly.
The mod_rewrite takes the PHPSESSID right back out.
Google never needs to see or know about the PHPSESSID.

Some say google has gone foobar lately and doesn't require additional confusion.

lmo4103

11:46 am on Sep 25, 2006 (gmt 0)

10+ Year Member



mod rewrite or php to strip session id for search engines?

Generally, yes, it would be much better to handle suppressing crawler sessions within your script,
so that robots are not given session-laden URLs, rather than using mod_rewrite to 'fix-up' these URLs
with a 301 redirect after the fact. Prevention rather than cure [webmasterworld.com], in other words.

You might still want to keep the mod_rewrite code to fix up any straggler URLs
already indexed by search spiders

-- jdMorgan
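
On the prevention side, the session ID can also be kept out of URLs from the same .htaccess file, as an alternative to user-agent checks in the script. This is a sketch that assumes PHP runs as an Apache module (mod_php); under CGI or FastCGI the php_flag directive is not recognized and the server is likely to answer with a 500 error, so the equivalent settings would go in php.ini instead.

```apache
# Sketch, assuming mod_php: keep session IDs out of URLs entirely
#
# Do not append PHPSESSID to links and forms automatically
php_flag session.use_trans_sid off
# Accept session IDs from cookies only, never from the URL
php_flag session.use_only_cookies on
```

With these settings, clients that do not accept cookies, which includes most crawlers, simply get no persistent session, and the redirect rules above are left to clean up URLs that were indexed earlier.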