Forum Moderators: phranque
I read all through the forums here and found plenty of posts about full pseudo-directory URL rewrites (i.e., mapping url.com/script/1/2/3/4/5/6 to url.com/script.cgi?1=2&3=4&5=6), but nothing about how to just get rid of the session id when a robot comes knocking... After a bit of head scratching, here is what I came up with...
My URLs look like so:
domain.com/cgi-bin/shop/script.cgi?user_id=id&var_infinitum=val_infinitum
rewriteEngine on
rewriteBase /shop
rewriteCond %{HTTP_USER_AGENT} Googlebot.*
rewriteRule ^script\.cgi\?user_id=id&(.*)$ script\.cgi\?$1
If I am correct, that should just remove the user_id=id& out of the middle of the URL when Googlebot tries to follow the link, am I right? Then I can just add a new rewriteCond for each UA for whom I want the user_id variable removed.
Someone please let me know if there's a problem there (or tell me how to trick the server into thinking I'm Googlebot, so I can test it myself)... ;)
If I went the other way (removing the user_id variable from my links, and then using mod_rewrite to reinsert it for everyone but the SE spiders), mod_rewrite would have to alter links for the majority of visitors, instead of only modifying them for the spiders. It would also have to parse the HTTP_REFERER to retrieve the session id for regular visitors, which seems like it would be a much larger drain on the server (and those who had referers turned off in their browsers wouldn't be able to use the store).
I realize leaving all the variables in their ugly cgi form may not be ideal for spiders, but from what I've read, just getting rid of the session ids should at least allow those links to be crawled and indexed... Thoughts?
tell me how to trick the server into thinking I'm Googlebot
Here's [webmasterworld.com] a thread offering several solutions.
I suppose I could edit the rewriteCond to read Opera, and go visit the site myself... then if it worked, I could switch it to Googlebot, and if it didn't I could delete it and start over. Really, my biggest question was whether the syntax looked OK. Hoping a mod_rewrite expert could give it a gander before I uploaded it (since I just started reading the Apache docs yesterday, and haven't really built up much confidence in my grasp of the material...).
Beyond that, I was just looking to start a discussion... with recent developments, it seems it's becoming more important than ever to ensure crawlability for online catalogs. ;)
edit the rewriteCond to read Opera
Exactly!
For test you could do something like:
RewriteCond %{HTTP_USER_AGENT} Opera
RewriteCond %{REMOTE_HOST} ^yourispdomainname
RewriteRule ^script\.cgi\?user_id=id&(.*)$ script\.cgi\?$1
[edited by: DaveAtIFG at 11:39 pm (utc) on Dec. 13, 2002]
The syntax parses, but it won't work: a RewriteRule pattern is matched against the URL-path only, so the query string can never be matched there.
You should add an [L] flag to the end of the RewriteRule, unless you know you need to continue with more rewrites on that URL.
The query string is not part of the URL that RewriteRule matches; it is available for testing or backreference creation only to RewriteCond, or for direct substitution into the target URL as %{QUERY_STRING}.
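For illustration, a minimal sketch of that approach applied to the rule posted above (the user_id=id value is taken from this thread; untested here):

```apache
RewriteEngine on
# The pattern below matches only the URL-path; the query string is
# examined separately, and %1 captures everything after user_id=id&.
RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteCond %{QUERY_STRING} ^user_id=id&(.*)$
RewriteRule ^script\.cgi$ script.cgi?%1 [L]
```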
If you are using mod_rewrite in .htaccess, LMK if you need more details.
Jim
Let's walk through a cycle of GoogleBot requesting a document.
- GoogleBot requests the URI domain.com/shop.html which is the start page of your shop.
- Your server serves this document which will contain URIs like this: domain.com/cgi-bin/shop/script.cgi?user_id=id&var_infinitum=val_infinitum.
- Google parses the page and finds this link. After indexing the current document the URI domain.com/cgi-bin/shop/script.cgi?user_id=id&var_infinitum=val_infinitum is requested.
- Now your RewriteRules kick in and you do an internal rewrite to the same URI sans the session id. This is totally transparent to GoogleBot. The page that GoogleBot receives will still be referred to by the URI of domain.com/cgi-bin/shop/script.cgi?user_id=id&var_infinitum=val_infinitum. An internal rewrite is transparent to the requesting UA. You could do an external rewrite instead, sending a Moved Permanently status code; then GoogleBot would request the page again using the session-id-less URI. But I am not sure whether GoogleBot likes getting the same old URIs on each dance and being told that they are old and to use the new ones.
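The internal/external distinction can be sketched like this (hypothetical file names):

```apache
# Internal rewrite: the UA still sees /old.html; Apache silently
# serves the content of /real.html instead.
RewriteRule ^old\.html$ /real.html [L]

# External rewrite: the UA receives a 301 Moved Permanently and
# re-requests the page under its new name, /real.html.
RewriteRule ^old\.html$ /real.html [R=301,L]
```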
Again, mod_rewrite is generally a one-way-street. Using some method you fake URIs and use mod_rewrite to turn them into the real ones internally. mod_rewrite does not parse your documents before they leave the server. It does not rewrite the URIs contained in those documents.
Andreas
What may be confusing is the "direction" that the rewrite goes...
mod_rewrite takes the URL your browser requests and modifies it for use inside the server. It does not affect the URLs output back to your browser by your scripts and html pages.
So, this probably is not what you wanted to do.
There are several posts from the last few days addressing this.
Jim
I don't expect that the actual html code being served would be changed at all... that would be silly. mod_rewrite is supposed to rewrite URLs, not rewrite html. ;)
However, I do not see why it should be able to take a link reading domain.com/script/var/val/var2/val2 and turn it into domain.com/script?var=val&var2=val2, which appears to be a simple character substitution, but not take a link reading domain.com/script?id=id&var=val and turn it into domain.com/script?var=val by being told to substitute id=id for nothing.
The page that GoogleBot receives will still be referred to by the URI of domain.com/cgi-bin/shop/script.cgi?user_id=id&var_infinitum=val_infinitum
That is fine... The store links on my html templates read user_id=id. After you've followed one of those links, the script assigns you an ID number, and the links generated thereafter contain user_id=12345 (a random 5 digit number). If mod_rewrite could remove the user_id=id from the internal request, the script would not assign the session id, and would generate the succeeding links as the generic user_id=id.
So, if crawl10.googlebot comes along and requests a few store pages, they will appear as user_id=id&page=page.html. Then crawl2.googlebot comes along, requests the same pages, and gets user_id=id&page=page.html. The URLs are the same.
Currently, crawl10 will get something like user_id=54208&page=page.html, and crawl2 will get user_id=60234&page=page.html, giving the spider-scaring illusion that there are an infinite number of store pages to crawl, and supposedly causing them to throw those pages out (these pages get crawled every month, and never appear in the index).
It seems if they got the generic user_id=id every time, it would be apparent when they'd requested the same page, and that would remove the infinite-pages problem.
<added>In short, I am not trying to make user_id=id disappear from the robots' perspective. I am trying to stop user_id=id from being "delivered" to the store script when a robot follows the link, so the links are not changed to user_id=12345 on the next page generated/retrieved.</added>
[edited by: mivox at 1:54 am (utc) on Dec. 14, 2002]
However, I do not see why it should be able to take a link reading domain.com/script/var/val/var2/val2 and turn it into domain.com/script?var=val&var2=val2, which appears to be a simple character substitution, but not take a link reading domain.com/script?id=id&var=val and turn it into domain.com/script?var=val by being told to substitute id=id for nothing.
There is no question that mod_rewrite can do just that. But this might not be what you want. The fake URI is the one that Google will assign to the page. Since this URI does not exist on the server it is rewritten internally to the right URI. In your situation the URI containing the session id is the one Google will assign to your page. Internally the URI with session id is rewritten to a URI sans session id. If this is the only thing you really want, although I'm not sure how that would help, then using mod_rewrite will be ok.
It would be helpful if you could let us know why you think that the problems I mentioned above will not apply to your particular situation.
Andreas
[edited by: andreasfriedrich at 1:56 am (utc) on Dec. 14, 2002]
<ROFL... and then we edited our posts at the same time...>
So... how do I do it? I've been tinkering with the {QUERY_STRING} idea for a while now, and I can't get it to stop assigning me a user_id number... grr.
[edited by: mivox at 1:58 am (utc) on Dec. 14, 2002]
Sorry that I insisted on telling you what mod_rewrite does and what it doesn't do. I wasn't sure whether you expected things that it just wasn't designed for. A lot of people do. You didn't.
I'll have a look at the rules now. ;)
Andreas
But I got a good grade in that class, so I ought to be able to figure this out. I'm just finding a major lack of documentation on .htaccess URL manipulation, and I'm trying to learn regex at the same time as I'm figuring out mod_rewrite, so I'm not quite sure which end is up at the moment... ;)
major lack of documentation on .htaccess URL manipulation
There isn't that big a difference between using RewriteRules in httpd.conf and in .htaccess files as far as the syntax is concerned. All you need to remember is that the directory prefix is removed prior to matching and added again later on. Performance-wise, there is a BIG difference between the two.
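As a sketch of that prefix handling (directory and file names assumed for illustration):

```apache
# Per-server context (httpd.conf): the pattern sees the full URL-path.
RewriteRule ^/shop/old\.cgi$ /shop/new.cgi [L]

# Per-directory context (/shop/.htaccess): the /shop/ prefix is
# stripped before matching and re-added afterwards.
RewriteRule ^old\.cgi$ new.cgi [L]
```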
Since user_id=id is static as you write, you could use a rule like this:
RewriteCond %{QUERY_STRING} user_id=id&(.*)$
When toying with mod_rewrite, do so in a structured manner. Using a simple rewrite rule, test whether rewriting works at all in the particular directory. If it works, add more conditions piece by piece. But you probably do this anyway, since it applies to almost all work one does.
Andreas
Now that Andreas is here, you'll have all the help you need. <stage whisper>I'm glad he hasn't disappeared completely into law school...</stage whisper>
In which subdirectory is the .htaccess containing your RewriteRule? - I'd like to make sure RewriteBase is needed and correct.
If you liked constitutional law, you'll love regex and mod_rewrite - It's a lot of logic and precise language, and the details have to be absolutely correct. :)
Jim
<added after AF's last post>... And remember that you can test using a "dummy" URL on the left side of the RewriteRule, so as to avoid breaking your site for real visitors.</added>
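For example, a hedged sketch of that dummy-URL trick (the test path is invented here):

```apache
# Only requests for the made-up name test-script.cgi are rewritten,
# so real visitors keep their session ids while you experiment.
RewriteCond %{QUERY_STRING} ^user_id=id&(.*)$
RewriteRule ^test-script\.cgi$ script.cgi?%1 [L]
```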
I tested those rules on my test server and they work great.
Options +FollowSymlinks
RewriteEngine on
RewriteBase /shop <--- make sure this is ok.
RewriteCond %{HTTP_USER_AGENT} ^Opera.*
RewriteCond %{QUERY_STRING} user_id=[0-9a-z]{2,5}&(.*)$
RewriteRule ^script\.cgi script.cgi?%1 [L]
As Jim wrote, all you need to do is make sure that RewriteBase is correct and that the .htaccess file is in the right directory: the one the URI domain.com/cgi-bin/shop resolves to.
Andreas
RewriteBase /
RewriteRule ^cgi-bin/shop/script\.cgi cgi-bin/shop/script.cgi?%1 [L]
Works like a charm now. Now I just need to add one more script name to the rewrite rules, and I'll be all ready to change the target UA to block SE spiders from getting user ids.
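One possible way to cover a second script in the same ruleset (the name script2.cgi is invented for illustration, and this variant is untested):

```apache
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^Opera.*
RewriteCond %{QUERY_STRING} user_id=[0-9a-z]{2,5}&(.*)$
# $1 holds whichever script name matched; %1 holds the query string
# remainder captured by the RewriteCond above.
RewriteRule ^cgi-bin/shop/(script|script2)\.cgi cgi-bin/shop/$1.cgi?%1 [L]
```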