Forum Moderators: phranque
The e-commerce system we use on a site assigns a session ID, and for every website visitor this session ID is turned on. We do not want sessions turned on for any spiders, crawlers, bots, etc., because the session ID then turns up in search engines, and we can have the instance of two people visiting the site with the same session ID. It causes huge problems.
So we have modified the PHP code to look for the user-agent name (slurp, msnbot, googlebot, etc.), and if the agent name is recognised as a spider we don't allow sessions to be used. This modification works perfectly; however, some search engines are still revisiting the site and using 'old' referenced URLs with the session IDs in the URL.
We have absolutely no control over what people or spiders send in the 'GET', though, hence the problem. Hoping that mod_rewrite will help this situation, we now have mod_rewrite code that looks for the user agent, and if it is any of:
msnbot
slurp
googlebot
(there may be others?)
and they try to fetch a URL like this:
[example.com...]
the mod_rewrite will rewrite the url to be:
[example.com...]
Here is the mod_rewrite code:
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^(msnbot|slurp|googlebot) [NC]
RewriteCond %{QUERY_STRING} ^(.*)\&?sessionID=[a-zA-Z0-9]+\&?(.*)$
RewriteRule ^(.*) $1?%1%2 [R=301,L]
which I am told will re-write the URL as shown above, and apparently cause a '301' (redirect).
Given the above example, if it does cause a 301 (we are about to start testing), will this stop the page from being indexed, or what will happen? What will spiders/bots like slurp and msnbot do if this happens?
Our objective is to stop the spiders and bots from adding URLs which contain the session IDs to their search engines. We do not want to affect the PR of the site in our attempt to 'force' the spiders to re-index.
Session IDs cannot be in any links to the site, or in any search engine results. Will the 301 do the trick?
Thanks,
Peter
Actually, it looks as though you could use a bit of mod_rewrite help...
The code you're using won't do exactly what you expect it to do, in that it won't match those spiders, and it will drop the ampersands, even if other parameters are present.
The user-agent pattern should not be start-anchored (remove the "^") or your user-agent condition will fail.
In order to prevent problems with missing or orphaned ampersands in the search engine listings, the cases of leading-parameters only, trailing-parameters only, and both leading and trailing parameters must be handled.
RewriteEngine on
RewriteBase /
#
# Skip the next two rewriterules if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [S=2]
#
# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&sessionID=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1&%2 [R=301,L]
#
# case: leading or trailing parameters only
RewriteCond %{QUERY_STRING} ^(.+)&sessionID=[0-9a-z]$|^sessionID=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1 [R=301,L]
#
As to what the search engines will do, they'll see the 301-Moved Permanently response, re-fetch the page from the new (sessionID-less) URL given in that response, and -- after a while, update their database to use the new URL.
Jim
RewriteEngine on
RewriteBase /
#
# Skip the next two rewriterules if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [S=2]
#
# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&sessionID=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1&%2 [R=301,L]
#
# case: leading-only, trailing-only or no additional parameters
RewriteCond %{QUERY_STRING} ^(.+)&sessionID=[0-9a-z]$|^sessionID=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) $1?%1 [R=301,L]
[edited by: jdMorgan at 12:54 am (utc) on Jan. 4, 2005]
Actually, it looks as though you could use a bit of mod_rewrite help... The code you're using won't do exactly what you expect it to do, in that it won't match those spiders, and it will drop the ampersands, even if other parameters are present.
The user-agent pattern should not be start-anchored (remove the "^") or your user-agent condition will fail.
In order to prevent problems with missing or orphaned ampersands in the search engine listings, the cases of leading-parameters only, trailing-parameters only, and both leading and trailing parameters must be handled.
Well, there are quite a few errors there then, and here I was thinking it was ready for testing. I only understood the point about the user-agent pattern: the string can be anywhere in the user agent, but I think the code was expecting it to be at the first position.
Thanks for posting the corrected code. :)
As to what the search engines will do, they'll see the 301-Moved Permanently response, re-fetch the page from the new (sessionID-less) URL given in that response, and -- after a while, update their database to use the new URL.
That is perfect, exactly what we want to happen; as you say it won't happen immediately, but the desired objective will be 'sessionID-less' URL's.
As far as testing goes, there is a log command I think. We have a test site, and if I add my browser agent name... the details in the logs are:
"-" "Mozilla/5.0 (Windows; U; Win95; en-US; rv:1.6) Gecko/20040113"
So I could just add 'mozilla' as a temporary agent name for testing purposes, use some of those URLs with the sessionID in them, and see what happens.
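A sketch of that testing idea, for what it's worth (the extra "mozilla" token is purely a temporary testing hack, not part of the production rules; since nearly every browser, and many bots, send "Mozilla/..." in their User-Agent, it must be removed once testing is done):

```apache
# TEMPORARY, for testing only: also treat ordinary Mozilla-based
# browsers as "spiders" so the redirect can be exercised by hand
# from a desktop browser. Remove the "mozilla" token afterwards;
# it matches almost every visitor, bots and humans alike.
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot|mozilla) [NC]
RewriteRule .* - [S=2]
```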
Thanks,
Peter
I forgot the case of no additional parameters!
By looking at the code I couldn't see any difference, so 'Beyond Compare' to the rescue, and it picked up 2 differences.
1.
RewriteCond %{HTTP_USER_AGENT}!(msnbot|slurp|googlebot) [NC]
no space before the "!"
2.
RewriteCond %{QUERY_STRING} ^(.+)&sessionID=[0-9a-z]$|^sessionID=[0-9a-z]+&?(.*)$ [NC]
The only changes I can see above are:
+&(.+)$ [NC]
TO .....
+&?(.*)$ [NC]
Now, one final question please; just a minor issue, is it okay/conventional to have the code like this?
RewriteEngine on
RewriteBase /
#
# Check for the following spiders
RewriteCond %{HTTP_USER_AGENT} (msnbot|slurp|googlebot) [NC]
# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&sessionID=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1&%2 [R=301,L]
#
# Check for the following spiders
RewriteCond %{HTTP_USER_AGENT} (msnbot|slurp|googlebot) [NC]
# case: leading-only, trailing-only or no additional parameters
RewriteCond %{QUERY_STRING} ^(.+)&sessionID=[0-9a-z]$|^sessionID=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) $1?%1 [R=301,L]
It's the same number of lines of code; it's just that, coming from a few decades of application programming languages background, we had a joke:
We are not to use NOT's!
I guess old habits die hard, and I have found in 2GLs, 3GLs and especially 4GLs that the use of an 'OR NOT' is sometimes a bit unstable. I don't know Apache _that_ well at all, so I don't have confidence in using the NOT, and although there is one line of redundant code this other way (testing for agents twice), I guess it just makes me feel 'warm and fuzzy'. :D
Thanks very much for all your help,
Peter
A space is always required between "}" and "!" but posting on this forum eats those spaces. Be sure to put the space back in. I'll go fix it in the post.
Jim
Well, after 4 days of having the mod_rewrite code on the site, decided to have a good look through the web server logs. Looked for strings 'msnbot', 'googlebot', 'slurp' , AND containing 'sessionID'. All 3 spiders had considerable activity during these 4 days.
msnbot - nothing found - good
googlebot - 3 days nothing, 1 day this entry:
66.249.65.226 - - [08/Jan/2005:01:54:05 -0600] "GET /www.example.com/shop/default.php?cPath=8&sessionID=94e6b5d9ddbd53e19616cda29beee477 HTTP/1.1" 200 31798 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
I was expecting a 301?
yahoo (slurp) - entries found on all 4 days, too many to list; here is one entry from one day.
66.196.90.60 - - [05/Jan/2005:21:25:37 -0600] "GET /www.example.com/shop/product_info.php?products_id=11&sessionID=d16ba91e57f43cdbe19e3fca9d9a5e40 HTTP/1.0" 200 42222 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
Here is the code in .htaccess
# Set some options
Options -Indexes
Options FollowSymLinks
RewriteEngine on
RewriteBase /
#
# Skip the next two rewriterules if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [S=2]
#
# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&sessionID=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1&%2 [R=301,L]
#
# case: leading-only, trailing-only or no additional parameters
RewriteCond %{QUERY_STRING} ^(.+)&sessionID=[0-9a-z]$|^sessionID=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) $1?%1 [R=301,L]
This .htaccess file of course is in the 'web root' path. Surely it doesn't matter that part of the URL path is "/shop", as my (limited) understanding of mod_rewrite is that whatever is placed in the web root will apply to all paths.
It would appear the 301 isn't working, as I would have expected to see a 301 in the web server logs, not a '200'?
The strange thing is, when I tested it and added my browser as an agent name, and then used a URL with the 'sessionID' in it, the URL rewrite did work; the sessionID was taken out of the URL in the browser bar.
Any clues?
Peter
Yes, missing a plus sign...
# case: leading-only, trailing-only or no additional parameters
RewriteCond %{QUERY_STRING} ^(.+)&sessionID=[0-9a-z]+$|^sessionID=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) $1?%1 [R=301,L]
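For reference, a consolidated sketch of the whole ruleset with the fixes discussed in this thread folded in (the space between "}" and "!", and the "+" added to the first alternative of the last condition):

```apache
RewriteEngine on
RewriteBase /
#
# Skip the next two RewriteRules if NOT a spider
# (note the required space between "}" and "!")
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [S=2]
#
# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&sessionID=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1&%2 [R=301,L]
#
# case: leading-only, trailing-only or no additional parameters
RewriteCond %{QUERY_STRING} ^(.+)&sessionID=[0-9a-z]+$|^sessionID=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) $1?%1 [R=301,L]
```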
Jim
Are you saying that %{HTTP_USER_AGENT} is wrong, and that it should be %{ HTTP_USER_AGENT }? (Not with literal quotes; I mean with spaces.)
And further, that "!abc" is also wrong, and that it should be "! abc"?
@jehoshua:
You aren't by any chance using zen-cart or osCommerce, are you? Anyway, would you mind explaining how you got the static links?
>>the mod_rewrite will rewrite the url to be:
>>http://www.example.com/shop/product_info.php/products_id/128
I'm trying to get static URLs (in zen-cart), with the rewrite rules mentioned in the following thread:
[webmasterworld.com...]
Yes, missing a plus sign...
Okay, thanks for the correction to the code, I will add that in. One other minor thing: I saw a post in another forum about a similar problem, and they talked about the need to have this:
session.use_trans_sid
set to off. I checked the site and it is on, but this site is an e-commerce one running osCommerce; I would need to check whether it should be off or on for osCommerce. We rely on sessions when people log in, add to carts, etc.
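If it does turn out that it should be off, one hedged way to set it from .htaccess (this assumes PHP is running as an Apache module and that php_flag is permitted there; under CGI/FastCGI these directives would instead cause a server error and php.ini would be needed):

```apache
# Stop PHP from automatically appending the session ID to URLs
php_flag session.use_trans_sid off
# Optionally, accept session IDs from cookies only, never from the URL
php_flag session.use_only_cookies on
```

Whether osCommerce's cart and login still work with these settings would need testing, since the shop may rely on URL-based sessions for visitors who refuse cookies.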
Thanks,
Peter
Are you saying that %{HTTP_USER_AGENT} is wrong, and that it should be %{ HTTP_USER_AGENT }? (Not with literal quotes; I mean with spaces.)
My understanding of what Jim meant was that it needs to be:
RewriteCond %{HTTP_USER_AGENT} ! (msnbot|slurp|googlebot) [NC]
and further, that "!abc" is also wrong, and that it should be "! abc"?
Yes, I think it would be as you say. It would be nice if spaces didn't get eaten and code posted was code displayed. :D
@jehoshua:
You aren't by any chance using zen-cart or osCommerce, are you?
Yes, osCommerce.
Anyway, would you mind explaining how you got the static links?
>>the mod_rewrite will rewrite the url to be:
>>http://www.example.com/shop/product_info.php/products_id/128
Do you mean the search engine friendly links? If so, osCommerce just does it; you set a switch in admin, that's all.
Regards,
Peter
Exactly as in this line:
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
The purpose of this forum function is to stop wasting database space with post replies like:
Me too !!!!!!!!!!!!!
It shortens that to
Me too!
Jim
Just out of curiosity, do the robots do anything with those 301s? I never had good experience with using redirects and bots.
As Jim stated in msg#3 in this thread:
As to what the search engines will do, they'll see the 301-Moved Permanently response, re-fetch the page from the new (sessionID-less) URL given in that response, and -- after a while, update their database to use the new URL.
Another issue. Maintaining a large list of bots seems quite impossible. Are there some general rules that will cover 80%+ of bots?
Something generic like:
* spider
* bot
* crawler
would get a lot, but miss a lot also. I think the best approach is to keep an eye on your web server logs, identify the bots that do keep using old URLs with the session IDs, add them into .htaccess, and keep monitoring the search engine results; when you see no more session IDs for _that_ search engine, you no longer need _that_ spider in the mod_rewrite.
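As a rough sketch of that generic approach (the token list is illustrative only, not a vetted one, and substring matches like these can misfire on any UA string that happens to contain one of the words):

```apache
# Skip the sessionID-stripping rules if the UA matches NONE of these
# generic tokens; extend the list from what your own logs show
RewriteCond %{HTTP_USER_AGENT} !(bot|crawl|spider|slurp) [NC]
RewriteRule .* - [S=2]
```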
One person was concerned about performance issues: will mod_rewrite cause some website performance degradation? I don't know.
Peter
Just out of curiosity, do the robots do anything with those 301s?
I'd like to comment on redirecting status codes;
There are currently 7 different redirecting status codes in use (300-305 + 307), and afaik 301 and 302 are the most commonly used. 301 [w3.org] (permanent) is used when you want people to use the new location, and 302 [w3.org] (found) is used when you want people to continue using the url they already have.
To learn more about status codes, read W3C's status code definitions [w3.org] for the full list.
I'd like to comment on redirecting status codes;
There are currently 7 different redirecting status codes in use (300-305 + 307), and afaik 301 and 302 are the most commonly used. 301 (permanent) is used when you want people to use the new location, and 302 (found) is used when you want people to continue using the url they already have. To learn more about status codes, read W3C's status code definitions for the full list.
Thanks. I'm familiar with those. The reason I am asking is that I have had some 301s in place for well over two years now, and bots and search engines are still referring to those pages. From recollection, I think it took AskJeeves/Teoma well over a year; Yahoo is still using the old URL (though it does go to the redirected page), and many other smaller engines either do not recognize 301s, or follow them but do not update their database :-(
My suggestion is that you find those sites that link to your old url, and ask them to update their links.
Nevertheless, a 301 redirect is usually the correct thing to do. If the search engine spiders do their part, everything works nicely. If either your site or the search spider does not follow the HTTP protocol, then there is no chance it will work. So, all we can do is to implement redirects in compliance with HTTP/1.1, and hope the spiders handle them properly. Or hope that those which don't will eventually get fixed.
Jim