mod rewrite - removing sessionid

white rabbit

8:55 am on Jun 19, 2009 (gmt 0)

10+ Year Member



Hello folks!

This is my first post on WebmasterWorld. I was wondering if anyone might be able to help me with some Apache mod_rewrite rules.

I have written some rewrites to strip out pesky session IDs (jsessionid) and timestamps (ts) from my URLs (for when a search engine is accessing one of my pages).

My rewrites look as follows:

RewriteEngine on

RewriteCond %{HTTPS_USER_AGENT} "Google" [NC,OR]
RewriteCond %{HTTPS_USER_AGENT} "Slurp" [NC,OR]
RewriteCond %{HTTPS_USER_AGENT} "MSNBOT" [NC]

RewriteRule ^(.*);jsessionid=[A-Za-z0-9]+(.*)$ $1$2 [R=301]
RewriteRule ^(.*)\?ts=[A-Za-z0-9]+(.*)$ $1$2 [R=301]

Unfortunately, they don't seem to be working as expected, e.g.

For the following URL...

www.mysite.com/page-xyz;jsessionid=Y23XFD22HVSMCCSTHZOCFFI?ts=19660

What gets rewritten is...

www.mysite.com//page-xyz?ts=19660

So the sessionid gets stripped out but an extra "/" is added after the root and the timestamp is not stripped out at all.

Any ideas welcome as I'm really stuck!

Cheers,

Jamie

jdMorgan

1:57 pm on Jun 19, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Problems at three levels:

First, the session IDs should not be served to robots in the first place. Adding a redirect after the fact is not a good solution, as the 'true' URI has already been established by your on-page links, and search engines will continue to ask for URIs with SIDs forever. Fix the problem at the source (in your script), and do not assign session IDs to search engine requests.

Second, RewriteConds apply only to the single RewriteRule which they precede. Therefore, your second rule executes unconditionally, regardless of the requesting user-agent.

Third, the query string is not part of the URL-path, but rather data attached to it. RewriteRule therefore cannot 'see' the query string; it looks only at the URL-path. So your RewriteRule patterns and back-references will not work as expected. Query string parameters must be tested, and back-references to them created, by using a RewriteCond examining %{QUERY_STRING}.

Note that a query string must be delimited by (start with) a question mark. Therefore, your first rule may or may not work, depending on whether the ";jsessionid" is preceded by a question mark. If no question mark is present, then that rule would behave as described by your test results.
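
As a rough illustration of the second and third points, here is a minimal sketch. It assumes server config context, and the "id" parameter and the /item and /old-item paths are made-up names, not anything from the original poster's site.

```apache
# Sketch only: "id", "/item" and "/old-item" are hypothetical names.
# Assumes server config context (hence the leading slash in the patterns).
RewriteEngine on

# This RewriteCond applies ONLY to the RewriteRule that immediately follows it.
# Groups captured by the last matched RewriteCond are available as %1, %2, ...
RewriteCond %{QUERY_STRING} (^|&)id=([0-9]+)(&|$)
# %2 is the numeric id; the trailing "?" drops the original query string.
RewriteRule ^/item$ http://www.example.com/item/%2? [R=301,L]

# This rule has no RewriteCond of its own, so it runs for every request
# whose URL-path matches, whatever the query string or user-agent.
RewriteRule ^/old-item$ http://www.example.com/item [R=301,L]
```

The RewriteRule patterns only ever test the URL-path; the query string is reached solely through the RewriteCond.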

Jim

white rabbit

4:52 pm on Jun 19, 2009 (gmt 0)

10+ Year Member



Thanks for your post.

Seems like I have a few issues here!

I'm not able to stop jsessions and timestamps being added due to technical resource restrictions (long story - I work for a big company and don't manage Apache work directly).

All of our sessionid's start with a ";" so I guess using Query String won't work for this.

I've had another stab after having gone through your points. I'm still a bit of a novice at the whole mod rewrite thing so I'd welcome any feedback.

RewriteEngine on

RewriteCond %{HTTP_USER_AGENT} "Google" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Slurp" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSNBOT" [NC]

RewriteRule ^(.*);jsessionid=.*$ $1 [L,R=301]

RewriteCond %{QUERY_STRING} ts
RewriteCond %{HTTP_USER_AGENT} "Google" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Slurp" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSNBOT" [NC]

RewriteRule ^/(.*)$ /$1 [R=301,L]

jkovar

5:13 pm on Jun 19, 2009 (gmt 0)

10+ Year Member



[google.com...]

jdMorgan

8:44 pm on Jun 19, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




RewriteEngine on
#
RewriteCond %{HTTP_USER_AGENT} Googlebot¦Slurp/¦msnbot [NC]
RewriteRule ^/(.*);jsessionid= http://www.example.com/$1 [R=301,L]
#
RewriteCond %{QUERY_STRING} &?ts=
RewriteCond %{HTTP_USER_AGENT} Googlebot¦Slurp/¦msnbot [NC]
RewriteRule ^/(.*)$ http://www.example.com/$1? [R=301,L]

I assume that this code is going into your server config file. If not, then remove the leading slashes from the patterns in both RewriteRules.

Replace the broken pipe "¦" characters with solid pipes "|" before use; posting on this forum modifies the pipe characters.

Jim

g1smd

12:25 am on Jun 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



jd mentioned three problems, but the code above fixes several others too:

You called your code 'rewrites'. There are no rewrites. They are all redirects.

Always use the [L] flag on each rule, unless you know exactly why it should be omitted.

Redirects should also contain both the protocol and the domain name in the target.
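
To make the rewrite/redirect distinction concrete, here is a small sketch. The paths are hypothetical and server config context is assumed.

```apache
# Sketch only: the paths below are hypothetical; server config context assumed.
RewriteEngine on

# Internal rewrite: no R flag and a local target. The client never sees the
# new path; the address shown in the browser stays /catalog.
RewriteRule ^/catalog$ /shop/index.html [L]

# External redirect: R=301 plus protocol and domain name in the target.
# The client receives a 301 response and requests the new URL itself.
RewriteRule ^/old-page$ http://www.example.com/new-page [R=301,L]
```

Only the second form tells a search engine that the URL has changed, which is why stripping session IDs for robots calls for redirects.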

white rabbit

8:57 am on Jun 23, 2009 (gmt 0)

10+ Year Member



Hi Jim,

Thanks for your last post - that's really helpful!

Just a few questions from my beginner's perspective...

Your code: RewriteRule ^/(.*);jsessionid= http://www.example.com/$1 [R=301,L]

Does the above line need a $ after "jsessionid=" to denote the end of the URL being matched? For example:

RewriteRule ^/(.*);jsessionid=$ http://www.example.com/$1 [R=301,L]
#
#
#
Your code: RewriteCond %{QUERY_STRING} &?ts=

What does the "&" do in the above line? In my URLs the timestamps are appended like this "www.mysite.com?ts=12345" - i.e. there is no "&" sign included.
#
#
#
Your code: RewriteRule ^/(.*)$ http://www.example.com/$1? [R=301,L]

Also, what is the significance of the "?" at the end of the above line?

Any feedback would be greatly appreciated!

Best Regards,

Jamie

jdMorgan

12:39 pm on Jun 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



1) If you add an end anchor to the "jsessionid=" subpattern, then the rule will only execute if the jsessionid value is blank.

2) The leading &? is meant to prevent a match if the variable name ends with "ts" but is not exactly "ts", for example if the query string was "posts=123". Without an anchor of that kind, you've got a time-bomb waiting to cause future problems. Note, however, that because the pattern is not anchored at the start and the ampersand is optional, "&?ts=" on its own still matches "posts=123". To say literally, "If there is a character before 'ts' in this query string, it must be an ampersand," write the condition as (^|&)ts= instead.

3) As documented in the Apache mod_rewrite documentation, the "?" at the end of the substitution URL clears the query string (or more accurately, replaces it with a blank one). This question mark will not appear in the redirected URL -- it is a mod_rewrite operator, not a literal character. Without this operator, the query string would be passed through the rule unchanged -- That is, it would be re-appended to your new URL, and you'd end up with a rule that wouldn't accomplish anything except to create an 'infinite' redirection loop.
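
Putting the three answers together, the configuration might look like the sketch below. It assumes server config context and the www.example.com host used earlier in the thread. It is written with solid pipes, and it swaps in a (^|&)ts= test for the query string, which is a stricter substitute for the posted &?ts= pattern rather than the code as originally posted.

```apache
# Sketch only, assuming server config context and the host www.example.com.
RewriteEngine on

# 1) No end anchor after "jsessionid=": the pattern has to match whatever
#    session value follows, so only the part before ";jsessionid=" is captured.
RewriteCond %{HTTP_USER_AGENT} Googlebot|Slurp/|msnbot [NC]
RewriteRule ^/(.*);jsessionid= http://www.example.com/$1 [R=301,L]

# 2) (^|&)ts= matches "ts=" only at the start of the query string or right
#    after an ampersand, so "posts=123" does not trigger the rule.
RewriteCond %{QUERY_STRING} (^|&)ts=
RewriteCond %{HTTP_USER_AGENT} Googlebot|Slurp/|msnbot [NC]
# 3) The trailing "?" replaces the query string with an empty one, so the
#    redirected URL carries no "ts" and the rule cannot loop.
RewriteRule ^/(.*)$ http://www.example.com/$1? [R=301,L]
```

Be aware that the trailing "?" discards every query parameter, not just "ts". On Apache 2.4 and later, the [QSD] flag is another way to discard the query string.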

Jim

white rabbit

10:53 am on Jun 24, 2009 (gmt 0)

10+ Year Member



Jim,

Thanks ever so much for your help - I have just implemented the updated Apache config and everything works perfectly!

Best Regards,

Jamie