I have a problem with Google indexing links that contain the PHPSESSID parameter, and I need to solve it. Additionally, my site uses "regions", which are tracked by the session but initialized or changed via the URL at some point. The point is that I don't want Google indexing either of these parameters, or any others I might add in the future.
So, what I want to do:
- Detect SE user agent, then
- 301 to the same URL requested but with ...
1) &PHPSESSID=asdf
2) &region=hjkl
... stripped from it.
Now I found some code in another thread that goes like this:
RewriteCond %{HTTP_USER_AGENT} "Google" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Slurp" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSNBOT" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "teoma" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ia_archiver" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Scooter" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Mercator" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "FAST" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MantraAgent" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Lycos" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ZyBorg" [NC]
RewriteCond %{QUERY_STRING} PHPSESSID
RewriteRule ^(.*)$ $1? [L,R=301]
Before I start mucking about, is this right?
Also, how do I remove the "region" parameter as well - can I just add another line under the one that mentions PHPSESSID?
I don't know much about HTACCESS, so: help!
The exact answer to your question depends on whether the session ID and region parameters are the only ones attached to these URLs. If you have other query string parameters that must be passed through intact, then that changes the requirements dramatically.
Jim
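To illustrate the simple case, here is a minimal sketch, assuming PHPSESSID and region are the only query string parameters these URLs ever carry and that the .htaccess file sits in the document root. The trailing "?" on the substitution discards the entire query string, so this form is not suitable once other parameters must pass through intact. The hostname www.example.com and the shortened user-agent list are placeholders.

```apache
RewriteEngine On
# Apply only to requests from the listed crawlers (shortened placeholder list).
RewriteCond %{HTTP_USER_AGENT} Googlebot|Slurp|msnbot [NC]
# Match either parameter at the start of the query string or after an "&".
RewriteCond %{QUERY_STRING} (^|&)(PHPSESSID|region)=
# The trailing "?" drops the whole query string from the redirect target.
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
```

Folding both parameter names into one RewriteCond avoids having to chain separate conditions with [OR].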
My plan is to implement this HTACCESS code, then wait for Google to re-index the pages. I may have to add the specific URLs that are already indexed to my sitemaps.xml.
As to the URLs, yes there are other parameters. A typical URL could look like this:
/main.php?section=sec&chapter=chap&page=1&region=uk and I want Google to be 301'ed to
/main.php?section=sec&chapter=chap&page=1 etc.
I recommend that you view changing the script as the 'cure' and view the redirect as a 'fast search engine results cleanup' only. In fact, if the script is modified correctly, then the redirects are not even needed, except to speed up the process of purging the session/region URLs. Alone, the 301 approach is only a partial (and poor) fix.
Further, once the search engines no longer receive session/region parameters, the only reason you'd need to keep the redirects in place after the search engines remove the session/region URLs from their results would be if other sites continue to link to those session/region URLs.
Jim
Without turning off trans_sid, how do I do this then? The only solution I can think of is to use PHP to, ummn, 301 itself when it gets a SE. I mean the PHPSESSID stuff is handled by PHP, not my code, so I don't see how to do what you're suggesting.
Is there a way to use
php_value session.use_only_cookies 1
php_value session.use_trans_sid 0
in HTACCESS, but only for SEs?
Any help with the HTACCESS code above, which is still going to be necessary?
The solution to stripping the query string parameters using mod_rewrite varies from almost easy to extremely difficult, depending on the following:
1) Are both PHPSESSID= and region= parameters always present in the URLs to be redirected?
2) If present, are they always in the same order?
3) If present and in the same order, are they always contiguous?
4) If not contiguous, how many other parameters might be present between them?
Coding a one-size-fits-all mod_rewrite solution may or may not even be possible, depending on the answers to the above questions, since mod_rewrite is not nearly as flexible as a general scripting language.
Jim
1) Change PHP site-wide to detect SEs, then skip initiating sessions. This will minimize 301s.
2) Change PHP to detect SEs and modify internal links to not pass parameters. This will further minimize 301s.
3) Use HTACCESS to 301 from the requested URL to the same page less the parameter(s) for SEs. Still necessary to "fix" currently indexed results and off-site inbound links containing the parameters.
I can do 1 and 2 easily, it's 3 that sees me here.
To answer the above questions concerning the parameters:
They're not always present, no, and could be either one or the other.
- If either one is present it is going to be at the very end of the URL
- and if both are present the PHPSESSID will be after 'region'. However there happens to be no case of this in Google at the present time.
So for whatever reason let's worry just about the first instance, where either PHPSESSID or 'region' is at the end of the URL. Should make it simple, eh?
The alternative is to header redirect (301) SEs from within the script, which again I can do without help, but that can't be done for PDFs etc.
Having the extra stuff at the end simplifies the problem considerably -- The code to handle "either order, anywhere" is really big, slow, and ugly. With region and SESSID (if present) always at the end, and always in that order, something like this should work:
RewriteCond %{HTTP_USER_AGENT} Googlebot|Slurp|MSNBOT|teoma|ia_archiver|Mercator|FAST|MantraAgent|Lycos [NC]
RewriteCond %{QUERY_STRING} ^((([^&]+&)*([^&]+)+)&)?region=[^&]+&PHPSESSID= [OR]
RewriteCond %{QUERY_STRING} ^((([^&]+&)*([^&]+)+)&)?region= [OR]
RewriteCond %{QUERY_STRING} ^((([^&]+&)*([^&]+)+)&)?PHPSESSID=
RewriteRule (.*) http://www.example.com/$1?%2 [R=301,L]
The first query-string RewriteCond, the one with both parameters, is not strictly necessary. You can take it out and experiment with the order of the other two query-string RewriteConds to achieve correct operation if you like. Just be sure that the last RewriteCond in the rule does not have an [OR] flag on it, that the user-agent RewriteCond does not have one either, and that every query-string RewriteCond before the last one does.
Jim
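As a more compact variant of the rule above, here is a sketch that folds the three query-string conditions into one. It assumes Apache 2.x (PCRE, for the non-capturing groups and the lazy quantifier), a .htaccess file in the document root, and that region and PHPSESSID only ever appear at the very end of the query string, in that order. It has not been run against a live server, so treat it as a starting point only. The hostname www.example.com and the shortened user-agent list are placeholders.

```apache
RewriteEngine On
# Apply only to requests from the listed crawlers (shortened placeholder list).
RewriteCond %{HTTP_USER_AGENT} Googlebot|Slurp|msnbot|teoma|ia_archiver [NC]
# %1 captures everything before the trailing region= and/or PHPSESSID= parameters.
# The "$" anchor means the pattern only matches when those parameters come last.
RewriteCond %{QUERY_STRING} ^(?:(.*?)&)?(?:region=[^&]*(?:&PHPSESSID=[^&]*)?|PHPSESSID=[^&]*)$
# Redirect to the same path, keeping only the captured leading parameters.
RewriteRule (.*) http://www.example.com/$1?%1 [R=301,L]
```

Under those assumptions, a crawler request for /main.php?section=sec&chapter=chap&page=1&region=uk would be redirected to /main.php?section=sec&chapter=chap&page=1. When nothing precedes the stripped parameters, %1 is empty and the substitution ends in a bare "?", which mod_rewrite treats as "no query string". A URL with other parameters after region or PHPSESSID is left alone rather than truncated.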
It should work okay with other parameters in the URL (ones that are okay to be indexed) before either of those, right?
Is it okay to post the Google search which shows the actual indexed URLs rather than working on hypothetical examples?