301 redirects or robots.txt for PHPSESSID?

How to redirect PHPSESSID urls.

         

bouncybunny

12:31 am on Mar 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi

I have a forum application that generates PHPSESSID-type urls the first time the site is accessed, and no amount of tinkering with the php.ini file or .htaccess has managed to stop this. So I have given up. No really, I have tried them all; it appears to be an issue with the software.

Google (and only Google - the others are fine) has indexed a lot of the PHPSESSID urls and returns them in its results over the 'real' urls. Which is a pain.

I was wondering if there was any way of 301 redirecting all the indexed PHPSESSID urls to their proper URLs? Obviously individually this would be a nightmare, but is there a global way to do this?

My first issue, I suppose would be how to avoid going into a loop as the 'home' page of the forum is also displayed with the PHPSESSID url the first time it is accessed.

Below is a sample url of a PHPSESSID url and the correct one I would like to redirect to. Any suggestions welcomed.

Thanks.

Topics are like this;

http://www.domain.com/forum/index.php/topic,5124.0.html?PHPSESSID=fc455887c9d4b7d7c753aa6d45ac55da

The real url should look like this;

http://www.domain.com/forum/index.php/topic,5124.0.html

And messages look like this;

http://www.domain.com/forum/index.php/topic,5178.msg9216.html?PHPSESSID=7622deb561a2b42d860af0a082843084

The real url should look like this;

http://www.domain.com/forum/index.php/topic,5178.msg9216.html#msg9216
(although Google doesn't index the part after the #)

My only other thought was using robots.txt to exclude PHPSESSID urls. But I am concerned that this might have a detrimental effect on the site getting spidered in the first place.

jdMorgan

2:16 am on Mar 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



bouncybunny,

Welcome to WebmasterWorld!

The correct fix is to modify the forum script to prevent Googlebot from being assigned a session ID. You might want to dig around in the FAQ or users forum for your particular forum software to see if this issue has already been addressed.

After that is done, you can 'recover' your search listings by using a bit of mod_rewrite code (shown here for use in an .htaccess file).


Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} Googlebot|msnbot|Slurp|Ask\ Jeeves [NC]
RewriteCond %{QUERY_STRING} PHPSESSID=
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

The case with the "#" in the URL cannot be handled easily with Apache, since "#" marks a named anchor (fragment) that is used client-side only; the browser never sends that part of the URL to the server.

As you surmised, disallowing session-ID URLs in robots.txt would lead to those URLs being listed as URL-only, which would make your problem even worse.

If the pipe characters in the code above display as broken pipes ("¦"), change them to solid pipes ("|") before use; posting on this board modifies pipe characters.

Jim

bouncybunny

3:13 am on Mar 28, 2006 (gmt 0)




Hi

Thanks for the welcome.

Unfortunately there is no known way at present to prevent the forum software, SMF, from producing the PHPSESSID urls. Trust me, I have read about 50 threads on the subject on their forums, and the developers appear to be uninterested in fixing it.

But I will try your 301 solution. Thanks for that.

I am thinking about going over to InVisionBoard or vBulletin for that very reason. But then there is the problem of converting all those urls.....

bouncybunny

3:41 am on Mar 28, 2006 (gmt 0)



Hi

Can I just ask.

I removed the part referring to specific search engines so that it throws a 301 redirect to all users. I just thought it would be useful so that 'real' users could link to, and bookmark, the correct url if they decide to do so.

Will this achieve the same result as your code?

RewriteCond %{QUERY_STRING} PHPSESSID=
RewriteRule (.*) http://www.domain.com/$1? [R=301,L]

Thanks again. What a great forum. :)

bouncybunny

4:54 am on Mar 28, 2006 (gmt 0)




Actually, I've just found out the answer to my own question. If I don't specify which user agent, then it can affect other functions of my forum, such as registration and logging in, which use dynamic urls.

Is there any way to have the code like this...


Options +FollowSymLinks
RewriteEngine on
RewriteCond %{QUERY_STRING} PHPSESSID=
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

...yet exclude dynamic urls from these rules? For example, if they have "action=" in the url, then they are not 301 redirected. This would keep the registration and login from being affected.

Thanks for any advice.

[edited by: jdMorgan at 5:22 am (utc) on Mar. 28, 2006]
[edit reason] Formatting [/edit]

jdMorgan

5:21 am on Mar 28, 2006 (gmt 0)




The code was intended to work as posted. If you leave out the USER_AGENT test, then it will have many side effects. Since I'm not familiar with your forum, I can't say what they would be.

However, to answer your question, you could make it redirect URLs containing *only* PHPSESSIDs by changing the RewriteCond to:


RewriteCond %{QUERY_STRING} ^PHPSESSID=[^&]*$

...can't recommend that, though.
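For illustration only, this is roughly how that anchored condition would sit in the full rule, with some made-up sample query strings noted in the comments. It is an untested sketch and assumes the parameters are separated by "&":

```apache
# Redirect only when PHPSESSID is the sole query parameter.
#   "PHPSESSID=abc123"               -> matches, is redirected
#   "PHPSESSID=abc123&action=login"  -> no match, left alone
#   "action=login&PHPSESSID=abc123"  -> no match, left alone
RewriteCond %{QUERY_STRING} ^PHPSESSID=[^&]*$
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
```

The "^" and "$" anchors are what make the difference: the unanchored "PHPSESSID=" test matches the session ID anywhere in the query string, while this one requires the session ID to be the whole query string.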

Jim

bouncybunny

5:31 am on Mar 28, 2006 (gmt 0)



Hi thanks.

My only concern about the original code is whether it could be seen as 'cloaking', i.e. dealing with spiders in a different way to human visitors.

Or have I misunderstood this?

<--EDIT-->

The only side effect so far seems to be that the login and register urls are redirected to the home page if clicked on whilst a PHPSESSID thread is showing, i.e. the first time the forum is accessed in a browser.

If I could somehow exclude urls like the following from the rewrite rules (i.e. dynamic urls), then it all seems to work fine.

http://www.domain.com/forum/index.php?PHPSESSID=17a3c2e4fab5fb8f6f767d4918a1206d;action=login

and

http://www.domain.com/forum/index.php?PHPSESSID=17a3c2e4fab5fb8f6f767d4918a1206d;action=register

jdMorgan

5:43 am on Mar 28, 2006 (gmt 0)




Yes, it's technically cloaking, but without intent to mislead. Cloaking done for technical reasons is not offensive to search engines. Take a look at the URL in your address bar right now -- That's not the 'real' URL...

Am I to understand that your name/value pairs are delimited by semicolons and not by ampersands? If so, then you'll need to switch that in the RewriteCond code I posted.
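If the separator does turn out to be a semicolon, as in the login and register URLs quoted earlier in this thread, the character class in the anchored condition could be widened along these lines. This is an untested sketch; it assumes ";" and "&" are the only separators the forum uses:

```apache
# Treat both "&" and ";" as parameter separators, so the condition
# matches only a query string that is nothing but a PHPSESSID value.
#   matches:        PHPSESSID=17a3c2e4fab5fb8f6f767d4918a1206d
#   does not match: PHPSESSID=17a3c2e4fab5fb8f6f767d4918a1206d;action=login
RewriteCond %{QUERY_STRING} ^PHPSESSID=[^&;]*$
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
```

With the original "[^&]*", a semicolon counts as an ordinary character, so the ";action=login" form would still match and be redirected.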

Jim

bouncybunny

5:58 am on Mar 28, 2006 (gmt 0)



Thanks Jim

Sorry, I don't even know what a name/value pair is. I'm a bit of a copy and paste techie moron.

My forum content urls are already rewritten (using SMF's built-in SEF url option) so that they look something like this:

http://www.domain.com/forum/index.php/topic,1230.0.html

But the 'navigation' links ('login', 'register' and so on) are not rewritten and look something like this;

http://www.domain.com/forum/index.php?action=login

But when I pasted your code into .htaccess (minus the useragent line) it all seemed to work perfectly (aside from the dynamic urls, as described), so I'm assuming it was OK for my setup?

Or have I completely missed the point? Sorry, I have very little knowledge and am probably dangerous.

jdMorgan

1:44 pm on Mar 28, 2006 (gmt 0)





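Options +FollowSymLinks
RewriteEngine on
# Act only on requests whose query string carries a session ID...
RewriteCond %{QUERY_STRING} PHPSESSID=
# ...but skip any URL that also has an action= parameter (login, register, etc.)
RewriteCond %{QUERY_STRING} !action=.
# Redirect to the same path; the trailing "?" drops the query string
RewriteRule (.*) http://www.example.com/$1? [R=301,L]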

For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com].

Jim

bouncybunny

2:07 pm on Mar 28, 2006 (gmt 0)




Thanks Jim

That's over and above the help and results that I expected. It is very much appreciated.

I still have an issue with the links to the actual boards, so I will make an attempt to try and learn some of this mod_rewrite stuff.

Scares the $#%^ out of me mind you.