For the others, can you move the script into its own directory so you can disallow it? If not, I think you can stop them with rewrite rules that feed them, say, a 410 when they try to access the link with variables.
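To make the 410 idea concrete, here is a rough sketch of the same logic written as Python WSGI middleware instead of actual rewrite rules. It is a stand-in, not the server config itself. The name `gone_for_bots_with_query` and the user-agent markers are hypothetical, and matching bots by user-agent substring is only an assumption about how you would identify them.

```python
def gone_for_bots_with_query(app, bot_markers=("bot", "crawler", "spider")):
    """Wrap a WSGI app so that bot requests carrying URL variables get a 410.

    bot_markers is a hypothetical list of lowercase substrings to look for
    in the User-Agent header; adjust it to the bots you actually see.
    """
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        has_variables = bool(environ.get("QUERY_STRING"))
        is_bot = any(marker in user_agent for marker in bot_markers)
        if has_variables and is_bot:
            # Tell the bot the URL is gone instead of serving the page.
            start_response("410 Gone", [("Content-Type", "text/plain")])
            return [b"Gone\n"]
        # Everyone else, and bots on plain URLs, reach the real script.
        return app(environ, start_response)
    return middleware
```

Human visitors and requests without variables pass through untouched, so only the bot plus query-string combination is turned away.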
One thing you might want to consider is modifying your script to add the noindex, nofollow meta tag on pages with variables in the URL. That is what I did on certain features, and the bots don't touch those pages, except little Googlebot. So I ended up using pattern matching in the robots.txt to keep him away from those.
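For what it's worth, the meta tag part might look something like this in a script. This is only a sketch in Python, not the code I actually run; `robots_meta_tag` is a made-up helper name, and it assumes "variables in the URL" simply means a non-empty query string.

```python
from urllib.parse import urlsplit

# Standard robots meta tag asking crawlers not to index or follow the page.
NOINDEX_TAG = '<meta name="robots" content="noindex, nofollow">'


def robots_meta_tag(request_url: str) -> str:
    """Return the noindex tag for URLs with a query string, else an empty string."""
    has_variables = bool(urlsplit(request_url).query)
    return NOINDEX_TAG if has_variables else ""


# Emit the result inside <head> when rendering the page.
print(robots_meta_tag("/index.php?param=foo"))
print(repr(robots_meta_tag("/index.php")))
```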
[edit corrected typos]
[edited by: BlueSky at 7:35 am (utc) on Oct. 3, 2003]
It looks like a gray area where nobody seems to have a definite answer; not even the robots specs cover this. From a look at Google's own robots.txt, it seems that at least Google has an answer for this:
But www.google.com/mac is indexed.
So I *guess* that index.php will get indexed but index.php?param=foo will not if index.php? is disallowed. I suppose you wouldn't even have to use an asterisk. OTOH, Google doesn't treat robots.txt the same way other bots do, so I'm not sure how they would behave ...
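To spell out that guess: if a bot does plain prefix matching of each Disallow value against the path plus query string, which is how the original robots exclusion convention describes it, the outcome would be as sketched below. The function `is_disallowed` is a hypothetical illustration, not any real crawler's code, and whether a given bot includes the query string in the comparison is exactly the open assumption here.

```python
def is_disallowed(path_and_query: str, disallow_rules: list[str]) -> bool:
    """Return True if any non-empty Disallow value is a prefix of the URL.

    Assumes simple prefix matching with no wildcard support; an empty
    Disallow value blocks nothing.
    """
    return any(rule and path_and_query.startswith(rule) for rule in disallow_rules)


rules = ["/index.php?"]  # i.e. "Disallow: /index.php?"

print(is_disallowed("/index.php", rules))            # plain page
print(is_disallowed("/index.php?param=foo", rules))  # page with variables
```

Under that assumption the bare index.php stays crawlable and only the variants with a question mark are blocked, with no asterisk needed.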
I really need an answer to this because I want to avoid being crawled for dup content (rewritten URLs + dynamic URLs). Might be a good idea to run a test ...