I have a forum application that generates PHPSESSID-type URLs the first time the site is accessed, and no amount of tinkering with the php.ini file or .htaccess has managed to stop this. So I have given up. No really, I have tried them all; it appears to be an issue with the software.
Google (and only Google - the others are fine) has indexed a lot of the PHPSESSID URLs and returns them in its results in place of the 'real' URLs. Which is a pain.
I was wondering if there was any way of 301 redirecting all the indexed PHPSESSID urls to their proper URLs? Obviously individually this would be a nightmare, but is there a global way to do this?
My first issue, I suppose, would be how to avoid going into a loop, as the 'home' page of the forum is also displayed with the PHPSESSID URL the first time it is accessed.
Below is a sample url of a PHPSESSID url and the correct one I would like to redirect to. Any suggestions welcomed.
Thanks.
Topics are like this:
http://www.domain.com/forum/index.php/topic,5124.0.html?PHPSESSID=fc455887c9d4b7d7c753aa6d45ac55da
The real URL should look like this:
http://www.domain.com/forum/index.php/topic,5124.0.html
And messages look like this:
http://www.domain.com/forum/index.php/topic,5178.msg9216.html?PHPSESSID=7622deb561a2b42d860af0a082843084
The real URL should look like this:
http://www.domain.com/forum/index.php/topic,5178.msg9216.html#msg9216 (although Google doesn't index the part after the #)
My only other thought was using robots.txt to exclude PHPSESSID URLs. But I am concerned that this might have a detrimental effect on the site getting spidered in the first place.
Welcome to WebmasterWorld!
The correct fix is to modify the forum script to prevent Googlebot from being assigned a session ID. You might want to dig around in the FAQ or users forum for your particular forum software to see if this issue has already been addressed.
After that is done, you can 'recover' your search listings by using a bit of mod_rewrite code (shown here for use in an .htaccess file).
Options +FollowSymLinks
RewriteEngine on
# Apply only to the named search engine robots (case-insensitive match)
RewriteCond %{HTTP_USER_AGENT} Googlebot|msnbot|Slurp|Ask\ Jeeves [NC]
# Apply only when the query string carries a session ID
RewriteCond %{QUERY_STRING} PHPSESSID=
# The trailing "?" drops the query string from the redirect target
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
As you surmised, disallowing session-ID URLs in robots.txt would lead to those URLs being listed as URL-only, which would make your problem even worse.
Jim
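For anyone who wants to check what these rules do before touching a live .htaccess, here is a minimal Python sketch of the same decision logic. It is only an illustration, not how Apache evaluates the rules internally. The function name redirect_target and the sample user-agent strings are assumptions, and the sketch keeps the original host rather than substituting www.example.com.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Same alternation as the HTTP_USER_AGENT condition, written with solid pipes.
# re.IGNORECASE plays the role of the [NC] flag.
BOT_PATTERN = re.compile(r"Googlebot|msnbot|Slurp|Ask Jeeves", re.IGNORECASE)


def redirect_target(url, user_agent):
    """Return the URL a 301 would point to, or None if the rules do not fire."""
    parts = urlsplit(url)
    if not BOT_PATTERN.search(user_agent):
        return None  # user-agent condition fails
    if "PHPSESSID=" not in parts.query:
        return None  # query-string condition fails
    # The trailing "?" in the RewriteRule target drops the whole query string.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


sample = ("http://www.domain.com/forum/index.php/topic,5124.0.html"
          "?PHPSESSID=fc455887c9d4b7d7c753aa6d45ac55da")
print(redirect_target(sample, "Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(redirect_target(sample, "Mozilla/5.0 (Windows) Firefox"))
```

Because the redirect target no longer contains PHPSESSID=, a second request for it fails the query-string condition, so the redirect cannot loop on itself.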
Thanks for the welcome.
Unfortunately there is no currently known way to prevent the forum software - SMF - from producing the PHPSESSID URLs. Trust me, I have read about 50 threads on the subject on their forums, and the developers appear to be uninterested in fixing it.
But I will try your 301 solution. Thanks for that.
I am thinking about going over to InVisionBoard or vBulletin for that very reason. But then there is the problem of converting all those urls.....
Can I just ask.
I removed the part referring to specific search engines so that it issues a 301 redirect for all users. I just thought it would be useful so that 'real' users could link to, and bookmark, the correct URL if they decide to do so.
Will this achieve the same result as your code?
RewriteCond %{QUERY_STRING} PHPSESSID=
RewriteRule (.*) http://www.domain.com/$1? [R=301,L]
Thanks again. What a great forum. :)
Is there any way to have the code like this...
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{QUERY_STRING} PHPSESSID=
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
Thanks for any advice.
[edited by: jdMorgan at 5:22 am (utc) on Mar. 28, 2006]
[edit reason] Formatting [/edit]
However, to answer your question, you could make it redirect URLs containing *only* PHPSESSIDs by changing the RewriteCond to:
RewriteCond %{QUERY_STRING} ^PHPSESSID=[^&]*$
Jim
My only concern about the original code is whether it could be seen as 'cloaking', i.e. dealing with spiders in a different way from human visitors.
Or have I misunderstood this?
<--EDIT-->
The only side effect so far seems to be that the login and register URLs are redirected to the home page if clicked on whilst a PHPSESSID page is showing, i.e. the first time the forum is accessed in a browser.
If I could somehow exclude urls like the following from the rewrite rules (i.e. dynamic urls), then it all seems to work fine.
http://www.domain.com/forum/index.php?PHPSESSID=17a3c2e4fab5fb8f6f767d4918a1206d;action=login
and
http://www.domain.com/forum/index.php?PHPSESSID=17a3c2e4fab5fb8f6f767d4918a1206d;action=register
Am I to understand that your name/value pairs are delimited by semicolons and not by ampersands? If so, then you'll need to switch that in the RewriteCond code I posted.
Jim
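To see why the delimiter matters, here is a small Python sketch that runs the loose and anchored patterns against three query strings. The first and third come from URLs posted in this thread; the ampersand-delimited one is hypothetical. The third pattern, with ";" added to the character class, is an assumed variant for semicolon-delimited URLs and was not posted by anyone here.

```python
import re

LOOSE = re.compile(r"PHPSESSID=")            # matches a session ID anywhere
ONLY_SID = re.compile(r"^PHPSESSID=[^&]*$")  # session ID as the sole "&" pair
# Assumed variant: also treat ";" as a pair delimiter.
ONLY_SID_SEMI = re.compile(r"^PHPSESSID=[^&;]*$")

queries = [
    "PHPSESSID=fc455887c9d4b7d7c753aa6d45ac55da",
    "PHPSESSID=17a3c2e4fab5fb8f6f767d4918a1206d&action=login",  # hypothetical
    "PHPSESSID=17a3c2e4fab5fb8f6f767d4918a1206d;action=login",
]


def matches(pattern):
    """Return one True/False per sample query string."""
    return [bool(pattern.search(q)) for q in queries]


for name, pattern in [("loose", LOOSE),
                      ("anchored", ONLY_SID),
                      ("anchored, ';' aware", ONLY_SID_SEMI)]:
    print(name, matches(pattern))
```

The anchored pattern still matches the semicolon-delimited login query, because ";" is not excluded by [^&]. That is why the character class has to change if the pairs are not separated by ampersands.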
Sorry, I don't even know what a name/value pair is. I'm a bit of a copy and paste techie moron.
My forum content URLs are already rewritten (using SMF's built-in SEF URL option) so that they look something like this:
http://www.domain.com/forum/index.php/topic,1230.0.html
But the 'navigation' links ('login', 'register' and so on) are not rewritten and look something like this;
http://www.domain.com/forum/index.php?action=login
But when I pasted your code into .htaccess (minus the useragent line) it all seemed to work perfectly (aside from the dynamic urls, as described), so I'm assuming it was OK for my setup?
Or have I completely missed the point? Sorry, I have very little knowledge and am probably dangerous.
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{QUERY_STRING} PHPSESSID=
RewriteCond %{QUERY_STRING} !action=.
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
Jim
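As a rough check of this last rule set, the Python sketch below mimics the two conditions and applies them to the URLs quoted earlier in the thread. The function name redirect_target is an assumption, and the sketch keeps the original host instead of www.example.com; it only illustrates which requests would be redirected.

```python
import re
from urllib.parse import urlsplit, urlunsplit

HAS_SID = re.compile(r"PHPSESSID=")
HAS_ACTION = re.compile(r"action=.")  # negated in the rules as "!action=."


def redirect_target(url):
    """Return the URL a 301 would point to, or None if no redirect applies."""
    parts = urlsplit(url)
    if not HAS_SID.search(parts.query):
        return None  # no session ID, nothing to clean up
    if HAS_ACTION.search(parts.query):
        return None  # login/register style links are left alone
    # Drop the query string, as the trailing "?" in the RewriteRule does.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


topic = ("http://www.domain.com/forum/index.php/topic,5124.0.html"
         "?PHPSESSID=fc455887c9d4b7d7c753aa6d45ac55da")
login = ("http://www.domain.com/forum/index.php"
         "?PHPSESSID=17a3c2e4fab5fb8f6f767d4918a1206d;action=login")
print(redirect_target(topic))
print(redirect_target(login))
```

With these conditions the topic URL is redirected to its clean form, while the login URL keeps its session ID and action parameter.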