Forum Moderators: Robert Charlton & goodroi


pages blocked via robots.txt, but clogging up the index?

Many pages blocked via robots.txt are not spidered, but clog up results?


mbucks

7:49 pm on Nov 4, 2006 (gmt 0)

10+ Year Member



Hi,
my site for 'blue widgets' has many subpages where the user is shown a certain type of blue widget, e.g. depending on its location. Using mod_rewrite, the URLs for these pages are e.g.:

/blue_widgets/new_york/
/blue_widgets/washington/

These are rewritten internally to a single script which searches a DB for 'blue widgets' in the specific area and shows the results. Each page has links to modify the actual search terms - e.g. broaden or narrow the location, or view different colour widgets.
As the search process is fairly lengthy, the search results are stored in the session, and these nav links will point to e.g.:
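For illustration, the rewrite described above might look something like this in .htaccess (a sketch only - the script name and query parameter are hypothetical, not from the original post):

```apache
RewriteEngine On
# Map /blue_widgets/<location>/ to a single search script
# (widget_search.php and the "location" parameter are assumed names)
RewriteRule ^blue_widgets/([^/]+)/?$ /widget_search.php?location=$1 [L,QSA]
```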

/search/<search_id>/red_widgets

so that it does not need to perform a new search, for example if it is only narrowing the results down to a sub-location.
The /search/ directory is blocked via robots.txt and is not spidered. However, I've noticed that these URLs have started to show up in site: results. This in itself is not a problem; however, I am worried that if Google records each search URL it sees on every page, it could end up with millions of these results in its index for the site.
Short of changing the system to not store the search results in a session (which would not be a practical solution), is there a better way to handle this? Is it anything I need to worry about?

Thanks for any advice.