Removing session id's with MOD REWRITE

Mod_rewrite returning directory path

         

kompreszor

11:55 pm on Oct 21, 2006 (gmt 0)

10+ Year Member



I'm having a little trouble with this and thought someone here might be able to help, but first a little background information.

I use phpws on our website with the Superhack to output friendly URLs, and it works great. The trouble is that, as we all know, search engine spiders will not accept cookies, so a lot of our pages are indexed with the session ID attached to the URL in a query string. This makes for a lot of duplicate pages.

I tried to fix this in the .htaccess file but kept getting 500 Internal Server Errors. Later I found out that I needed to use a php.ini file instead, so I fixed the problem by creating a php.ini file and inserting

session.use_trans_sid = 0
in it. Now that I have that fixed, I can work on removing the duplicate pages already indexed by the search engines. This is where I'm having trouble.

The links that I need to fix are all in the form of: /calendar-event34.html?224f02268dbe6c05c35f51cc823cb7fd=b50a2a7865642036b2ed32085988a976
I've been doing a lot of research trying to fix this and found some code that I thought would solve my problem. I modified it to remove just the a-z0-9=a-z0-9 part from the URL and figured this was the answer. But...

DirectoryIndex index.php
Options +FollowSymLinks
RewriteEngine On
#removing sid's from spyders
RewriteCond %{HTTP_USER_AGENT} "Google" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Slurp" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSNBOT" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "teoma" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ia_archiver" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Scooter" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Mercator" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "FAST" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MantraAgent" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Lycos" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ZyBorg" [NC]
RewriteCond %{QUERY_STRING} ([a-z0-9]+=[a-z0-9]+)
RewriteRule ^(.*)$ $1? [L,R=301]

The problem I'm having is that instead of redirecting to:
[mywebsite.com...]
I'm getting:
[mywebsite.com...]
WOW! Not exactly what I expected. I'm stumped. Why would this code do this?

I'm not a webmaster, and although I've been doing a lot of reading, I'm having a hard time getting my head wrapped around this stuff. I'm not sure if this will help, but here is the full .htaccess, including the Superhack rules:


DirectoryIndex index.php
Options +FollowSymLinks
RewriteEngine On
#removing sid's from spyders
RewriteCond %{HTTP_USER_AGENT} "Google" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Slurp" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSNBOT" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "teoma" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ia_archiver" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Scooter" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Mercator" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "FAST" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MantraAgent" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Lycos" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ZyBorg" [NC]
RewriteCond %{QUERY_STRING} ([a-z0-9]+=[a-z0-9]+)
RewriteRule ^(.*)$ $1? [L,R=301]
#rewrite rules for phpws friendly url's
#Standard URL (Must have a '~' in it)
RewriteRule ^([a-zA-Z0-9]*~.*)$ index.php?mod_rewrite=$1&%{QUERY_STRING} [NE]
#Module-specific URLs
RewriteRule ^article([1-9][0-9]*).html$ index.php?module=article&view=$1&%{QUERY_STRING}
RewriteRule ^news.html$ index.php?module=article&view=news&%{QUERY_STRING}
RewriteRule ^articlemenu.html$ index.php?module=article&disp=menu&%{QUERY_STRING}
RewriteRule ^announcement([1-9][0-9]*).html$ index.php?module=announce&ANN_user_op=view&ANN_id=$1&%{QUERY_STRING}
RewriteRule ^page([1-9][0-9]*).html$ index.php?module=pagemaster&PAGE_user_op=view_page&PAGE_id=$1&%{QUERY_STRING}
RewriteRule ^photoalbum.html$ index.php?module=photoalbum&PHPWS_AlbumManager_op=list&%{QUERY_STRING}
RewriteRule ^photoalbum([1-9][0-9]*).html$ index.php?module=photoalbum&PHPWS_AlbumManager_op=view&PHPWS_MAN_ITEMS[]=$1&%{QUERY_STRING}
RewriteRule ^calendar-event([1-9][0-9]*).html$ index.php?module=calendar&calendar[view]=event&id=$1&%{QUERY_STRING}
RewriteRule ^bbforum([1-9][0-9]*).html$ index.php?module=phpwsbb&PHPWSBB_MAN_OP=viewforum&PHPWS_MAN_ITEMS[]=$1&%{QUERY_STRING}
RewriteRule ^bbthread([1-9][0-9]*).html$ index.php?module=phpwsbb&PHPWSBB_MAN_OP=view&PHPWS_MAN_ITEMS[]=$1&%{QUERY_STRING}

jdMorgan

12:03 am on Oct 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The proper form for a mod_rewrite redirect to clear the query string is:

RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]

I'd also suggest adding the [L] flag to all of your rules unless you have a specific reason not to; the only reason to continue processing rules after a rewrite or redirect is if you need the output of the matched rule to be further processed by the following rules.
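
As a rough sketch of how both suggestions might be applied to the rules posted above (not a tested drop-in): www.example.com is a placeholder hostname, the shortened spider list is only illustrative, and narrowing the query-string pattern to 32 hex characters on each side is an assumption based on the single example URL in the question. The directory path most likely appeared because, in .htaccess context, a relative substitution combined with [R] gets the filesystem directory prefix added back before the redirect is built; spelling out the scheme and hostname avoids that.

```apache
# Sketch only: www.example.com is a placeholder for the real hostname.
RewriteEngine On

# Redirect known spiders to the same URL with the session query string removed.
# One alternation stands in for the chain of [OR] conditions (list abbreviated).
RewriteCond %{HTTP_USER_AGENT} (Google|Slurp|msnbot|Teoma|ia_archiver) [NC]
# Assumption: session queries always look like <32 hex chars>=<32 hex chars>.
RewriteCond %{QUERY_STRING} ^[a-f0-9]{32}=[a-f0-9]{32}$
# Absolute URL in the substitution; the trailing "?" clears the query string.
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]

# One of the friendly-URL rules with [L] added. [QSA] appends the original
# query string, which is what the trailing &%{QUERY_STRING} was doing.
RewriteRule ^calendar-event([1-9][0-9]*)\.html$ index.php?module=calendar&calendar[view]=event&id=$1 [QSA,L]
```

Anchoring the query-string pattern with ^ and $ keeps the redirect from firing on ordinary name=value queries, which the original unanchored pattern would also match.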

For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com].

Jim

kompreszor

2:17 am on Oct 22, 2006 (gmt 0)

10+ Year Member



Thank you for your help, Jim. I'll read the links you supplied and see if I can figure it out.