This is my first post on WebmasterWorld. I was wondering if anyone might be able to help me with some Apache mod_rewrite problems.
I have written some rewrites to strip pesky session IDs (jsessionid) and timestamps (ts) out of my URLs when a search engine is accessing one of my pages.
My rewrites look as follows:
RewriteEngine on
RewriteCond %{HTTPS_USER_AGENT} "Google" [NC,OR]
RewriteCond %{HTTPS_USER_AGENT} "Slurp" [NC,OR]
RewriteCond %{HTTPS_USER_AGENT} "MSNBOT" [NC]
RewriteRule ^(.*);jsessionid=[A-Za-z0-9]+(.*)$ $1$2 [R=301]
RewriteRule ^(.*)\?ts=[A-Za-z0-9]+(.*)$ $1$2 [R=301]
Unfortunately they don't seem to be working as expected. For example,
for the following URL...
www.mysite.com/page-xyz;jsessionid=Y23XFD22HVSMCCSTHZOCFFI?ts=19660
What gets rewritten is...
www.mysite.com//page-xyz?ts=19660
So the session ID gets stripped out, but an extra "/" is added after the root and the timestamp is not stripped out at all.
Any ideas welcome as I'm really stuck!
Cheers,
Jamie
First, the session IDs should not be served to robots in the first place. Adding a redirect after the fact is not a good solution, as the 'true' URI has already been established by your on-page links, and search engines will continue to ask for URIs with SIDs forever. Fix the problem at the source (in your script), and do not assign session IDs to search engine requests.
Second, RewriteConds apply only to the single RewriteRule that immediately follows them. Therefore, your second rule executes unconditionally, regardless of the requesting user-agent.
Third, query strings are not part of the URL-path, but rather data attached to the URL. RewriteRule cannot 'see' the query string and looks only at the URL-path, so the pattern and back-references in your second RewriteRule will not work as expected. Query string parameters must be tested, and back-references to them created, by using a RewriteCond that examines %{QUERY_STRING}; groups captured there are available in the RewriteRule as %1, %2, and so on.
Note that a query string must be delimited by (start with) a question mark. Therefore, your first rule may or may not work, depending on whether the ";jsessionid" is preceded by a question mark. If no question mark is present, then that rule would behave as described by your test results.
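A minimal, untested sketch that applies the second and third points, assuming server-config context (in .htaccess the leading slash would be dropped from the patterns), the placeholder host www.example.com, and the robot user-agent tokens Googlebot, Slurp/ and msnbot. The user-agent condition is repeated in front of each rule, and the query string is tested in a RewriteCond whose groups are reused as %1 and %2 so that any other parameters survive while ts is removed:

```apache
RewriteEngine on
#
# Rule 1: ";jsessionid=..." comes before any "?", so it is part of the
# URL-path and can be matched directly in the RewriteRule pattern.
# The query string (e.g. ts=19660) is passed through unchanged.
RewriteCond %{HTTP_USER_AGENT} Googlebot|Slurp/|msnbot [NC]
RewriteRule ^/(.*);jsessionid= http://www.example.com/$1 [R=301,L]
#
# Rule 2: the user-agent condition is repeated, because conditions apply
# only to the one RewriteRule that follows them.
# The query-string condition comes LAST, because %1 and %2 refer to the
# groups of the last matched RewriteCond.
RewriteCond %{HTTP_USER_AGENT} Googlebot|Slurp/|msnbot [NC]
RewriteCond %{QUERY_STRING} ^(.*&)?ts=[^&]*&?(.*)$
# %1 = parameters before ts, %2 = parameters after it; if both are empty,
# the bare trailing "?" clears the query string entirely
RewriteRule ^/(.*)$ http://www.example.com/$1?%1%2 [R=301,L]
```

With this version, "a=1&ts=5&b=2" becomes "a=1&b=2" and "ts=19660" disappears completely. One side effect: when ts is the last of several parameters ("a=1&ts=19660"), the result keeps a trailing "&" ("a=1&"), which is harmless but untidy.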
Jim
Seems like I have a few issues here!
I'm not able to stop the session IDs and timestamps from being added because of technical resource restrictions (long story: I work for a big company and don't manage the Apache work directly).
All of our session IDs start with a ";" rather than a "?", so I guess testing %{QUERY_STRING} won't work for them.
I've had another stab after having gone through your points. I'm still a bit of a novice at the whole mod rewrite thing so I'd welcome any feedback.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} "Google" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Slurp" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSNBOT" [NC]
ReWriteRule ^(.*);jsessionid=.*$ $1 [L,R=301]
RewriteCond %{QUERY_STRING} ts
RewriteCond %{HTTP_USER_AGENT} "Google" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Slurp" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSNBOT" [NC]
RewriteRule ^/(.*)$ /$1 [R=301,L]
RewriteEngine on
#
RewriteCond %{HTTP_USER_AGENT} Googlebot|Slurp/|msnbot [NC]
RewriteRule ^/(.*);jsessionid= http://www.example.com/$1 [R=301,L]
#
RewriteCond %{QUERY_STRING} &?ts=
RewriteCond %{HTTP_USER_AGENT} Googlebot|Slurp/|msnbot [NC]
RewriteRule ^/(.*)$ http://www.example.com/$1? [R=301,L]
The alternation characters in the user-agent conditions must be solid pipes "|", not broken pipes "¦"; posting on this forum converts them, so check them before use.
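With the two rules above, a robot requesting a URL that carries both a session ID and a ts parameter goes through two chained 301s (the first redirect keeps ?ts=..., the second removes it). If a single hop is preferred, a rule along these lines could be placed ahead of them. It is an untested sketch using the same placeholder host and server-config context, and like the second rule it discards the whole query string:

```apache
# Session ID in the path AND a ts parameter in the query string:
# strip both with one redirect (the trailing "?" clears the query string)
RewriteCond %{HTTP_USER_AGENT} Googlebot|Slurp/|msnbot [NC]
RewriteCond %{QUERY_STRING} (^|&)ts=
RewriteRule ^/(.*);jsessionid= http://www.example.com/$1? [R=301,L]
```

URLs that carry only one of the two still fall through to the separate rules.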
Jim
You called your code 'rewrites'. There are no rewrites. They are all redirects.
Always use the [L] flag on each rule, unless you know exactly why it should be omitted.
Redirects should also contain both the protocol and the domain name in the target.
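The difference in a short sketch, assuming server-config context; /page-xyz, /page.php and /old-page are hypothetical paths:

```apache
# Internal rewrite: no R flag. Apache serves /page.php, but the client is
# not told about it and keeps /page-xyz as the URL it requested.
RewriteRule ^/page-xyz$ /page.php [L]
#
# External redirect: R=301 sends a "301 Moved Permanently" response, and the
# client must request the target URL, given here with protocol and host.
# [L] stops processing the remaining rules in this pass.
RewriteRule ^/old-page$ http://www.example.com/page-xyz [R=301,L]
```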
Thanks for your last post - that's really helpful!
Just a few questions from my beginner's perspective...
Your code: RewriteRule ^/(.*);jsessionid= http://www.example.com/$1 [R=301,L]
Does the above line need a $ after "jsessionid=" to denote the end of the URL being matched, e.g.
RewriteRule ^/(.*);jsessionid=$ http://www.example.com/$1 [R=301,L]
#
#
#
Your code: RewriteCond %{QUERY_STRING} &?ts=
What does the "&" do in the above line? In my URLs the timestamps are appended like this "www.mysite.com?ts=12345" - i.e. there is no "&" sign included.
#
#
#
Your code: RewriteRule ^/(.*)$ http://www.example.com/$1? [R=301,L]
Also, what is the significance of the "?" at the end of the above line?
Any feedback would be greatly appreciated!
Best Regards,
Jamie
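On the first question: a "$" directly after "jsessionid=" would require the URL-path to end right at the equals sign, so the rule would no longer match any URL that actually carries a session ID. Leaving the pattern unanchored at the end is what lets it match whatever ID follows. If an end anchor is wanted, the ID characters have to be matched as well. A sketch, assuming the IDs contain only letters and digits as in the example URL (user-agent conditions omitted for brevity; only one of the two rules would be used):

```apache
# Unanchored: for "/page-xyz;jsessionid=Y23XFD22HVSMCCSTHZOCFFI", $1 = "page-xyz"
RewriteRule ^/(.*);jsessionid= http://www.example.com/$1 [R=301,L]
#
# Anchored alternative: "$" only works once the ID itself is matched too
RewriteRule ^/(.*);jsessionid=[A-Za-z0-9]+$ http://www.example.com/$1 [R=301,L]
```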
2) The leading & is meant to act as a "soft anchor": it should prevent a match if the variable name ends with "ts" but is not exactly "ts". Without such an anchor you would get a match if the query string were "posts=123", and you'd have a time-bomb waiting to cause future problems. Note, though, that in &?ts= the ampersand is optional and nothing anchors the start, so as posted it still matches "posts=123". The pattern (^|&)ts= enforces the rule literally: "If there is a character before 'ts' in this query string, it must be an ampersand."
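A sketch of the second rule using (^|&)ts=, which requires "ts=" to sit at the very start of the query string or right after an "&" (the optional "&?" alone cannot rule out "posts=123", since the ampersand can simply be skipped). The comments list which query strings it would and would not match; host name and server-config context are placeholders as in the code above:

```apache
# (^|&)ts= : "ts=" must start the query string or follow an "&"
#   ts=19660        matches
#   a=1&ts=19660    matches
#   posts=123       no match ("ts=" is preceded by "s")
#   a=1&posts=123   no match
RewriteCond %{QUERY_STRING} (^|&)ts=
RewriteCond %{HTTP_USER_AGENT} Googlebot|Slurp/|msnbot [NC]
RewriteRule ^/(.*)$ http://www.example.com/$1? [R=301,L]
```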
3) As documented in the Apache mod_rewrite documentation, the "?" at the end of the substitution URL clears the query string (or, more accurately, replaces it with a blank one). This question mark will not appear in the redirected URL; it is a mod_rewrite operator, not a literal character. Without this operator, the query string would be passed through the rule unchanged. That is, it would be re-appended to your new URL, and you'd end up with a rule that accomplishes nothing except creating an 'infinite' redirection loop.
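On Apache 2.4.0 and later, the same effect can be written with the QSD (query string discard) flag instead of a trailing "?". A sketch assuming a 2.4 server (on 2.2 the "?" form is required), with the same placeholder host and the stricter (^|&)ts= condition:

```apache
RewriteCond %{QUERY_STRING} (^|&)ts=
RewriteCond %{HTTP_USER_AGENT} Googlebot|Slurp/|msnbot [NC]
# QSD drops the original query string, so the target needs no trailing "?"
RewriteRule ^/(.*)$ http://www.example.com/$1 [QSD,R=301,L]
```

Either form stops the loop: the redirected request no longer carries ts, so the first condition fails on the next pass.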
Jim