User-agent: Googlebot
Disallow: /index.php?x=*$
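Googlebot extends the original prefix matching with two wildcards: * matches any run of characters, and a trailing $ anchors the pattern to the end of the URL. Below is a minimal sketch of that matching, assuming those are the only two special characters and that everything else is literal. It translates a Disallow pattern into a Python regular expression. The name google_style_match is a hypothetical helper, not part of any library or Google's own code.

```python
import re


def google_style_match(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches the URL path.

    Sketch of Googlebot-style matching (an assumption, not an official
    implementation): the pattern is anchored at the start of the path,
    '*' matches any sequence of characters, and a trailing '$' anchors
    the pattern to the end of the path. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and join the pieces with '.*' wherever a '*' was.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string: prefix semantics.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    rule = "/index.php?x=*$"
    for path in ["/index.php", "/index.php?x=5", "/index.php?y=1"]:
        print(path, "blocked" if google_style_match(rule, path) else "allowed")
```

Under this reading, a trailing *$ such as the one in Disallow: /index.php?x=*$ adds nothing. /index.php?x= on its own already matches every URL that begins with that prefix.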
For the other bots, could you move the script into its own directory so you can disallow that directory? If not, I think you can stop them with rewrite rules that serve them, say, a 410 (Gone) or something similar when they request a URL with variables in it.
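The post suggests server rewrite rules for this. The sketch below is a swapped-in illustration rather than a rewrite rule: it expresses the same decision as Python WSGI middleware. It assumes the unwanted bots can be recognized by a substring of their User-Agent header. BLOCKED_AGENTS, gone_for_bots_with_query and hello_app are hypothetical names, and the agent strings are only examples.

```python
from typing import Callable, Iterable

# Hypothetical example list: User-Agent substrings to turn away from
# dynamic URLs. Replace with the bots you actually see in your logs.
BLOCKED_AGENTS = ("Slurp", "msnbot")


def gone_for_bots_with_query(app: Callable) -> Callable:
    """WSGI middleware sketch: answer 410 Gone when a listed bot requests
    a URL that carries a query string; pass everything else through."""

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        agent = environ.get("HTTP_USER_AGENT", "")
        has_query = bool(environ.get("QUERY_STRING"))
        if has_query and any(bot in agent for bot in BLOCKED_AGENTS):
            start_response("410 Gone", [("Content-Type", "text/plain")])
            return [b"Gone\n"]
        return app(environ, start_response)

    return middleware


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


if __name__ == "__main__":
    app = gone_for_bots_with_query(hello_app)

    def print_status(status, headers):
        print(status)

    app({"HTTP_USER_AGENT": "Yahoo! Slurp", "QUERY_STRING": "x=1"}, print_status)  # 410 Gone
    app({"HTTP_USER_AGENT": "Yahoo! Slurp", "QUERY_STRING": ""}, print_status)     # 200 OK
```

Visitors and Googlebot pass through untouched here. In this setup, Googlebot is meant to be handled by the wildcard rules in robots.txt.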
Disallow: /index.php?*$
One thing you might want to consider is modifying your script to add a noindex,nofollow robots meta tag to pages with variables in the URL. That is what I did for certain features, and the bots don't touch those pages, except little Googlebot. So I ended up using pattern matching in robots.txt (Googlebot's * and $ wildcards) to keep him away from those.
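A minimal sketch of the meta-tag approach, assuming "variables in the URL" means a non-empty query string. robots_meta_for is a hypothetical helper you would call while rendering each page's head section.

```python
from urllib.parse import urlsplit

NOINDEX_TAG = '<meta name="robots" content="noindex,nofollow">'


def robots_meta_for(url: str) -> str:
    """Return the robots meta tag to emit for a page URL.

    Sketch (assumption: a page "has variables" when its query string is
    non-empty): dynamic URLs get noindex,nofollow; others get nothing.
    """
    if urlsplit(url).query:
        return NOINDEX_TAG
    return ""


if __name__ == "__main__":
    print(repr(robots_meta_for("/index.php?x=1")))
    print(repr(robots_meta_for("/index.php")))
```

A crawler has to fetch a page before it can see this tag. The tag therefore keeps dynamic pages out of the index, but it does not stop the requests themselves. Blocking those requests is the job of robots.txt.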
[edited by: BlueSky at 7:35 am (utc) on Oct. 3, 2003]
It looks like a gray area where nobody seems to have a definite answer; not even the robots spec covers this. From a look at Google's own robots.txt, it seems that at least Google has an answer for this:
Disallow: /mac?
But www.google.com/mac is indexed.
So I *guess* that index.php will get indexed but index.php?param=foo will not if index.php? is disallowed. I suppose you wouldn't even need an asterisk. OTOH, Google doesn't treat robots.txt the same way other bots do, so I'm not sure how they would behave...
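The guess above follows from the original robots.txt rule that a URL is blocked when it starts with a Disallow value, with no wildcards and "?" treated as an ordinary character. A minimal sketch of that plain prefix matching is below. disallowed_prefixes and is_allowed are hypothetical helpers. They are simplified: they ignore Allow lines and treat each User-agent line as starting a new group.

```python
from typing import List


def disallowed_prefixes(robots_txt: str, agent: str = "*") -> List[str]:
    """Collect Disallow values from the group whose User-agent equals `agent`.

    Simplified sketch: strips '#' comments, ignores Allow lines, and treats
    every User-agent line as the start of a new group.
    """
    prefixes, in_group = [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (s.strip() for s in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_group = value == agent
        elif field == "disallow" and in_group and value:
            # An empty Disallow value means "allow everything", so skip it.
            prefixes.append(value)
    return prefixes


def is_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Original-spec matching: blocked if the path starts with any Disallow value."""
    return not any(path.startswith(p) for p in disallowed_prefixes(robots_txt, agent))


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /mac?\nDisallow: /index.php?\n"
    for path in ["/mac", "/mac?hl=en", "/index.php", "/index.php?param=foo"]:
        print(path, "allowed" if is_allowed(robots, path) else "blocked")
```

Under plain prefix matching no asterisk is needed: Disallow: /index.php? already covers every URL beginning with /index.php?, while /index.php itself stays crawlable. Be careful if you run such a test with Python's urllib.robotparser. It normalizes rule paths through urlparse/urlunparse, which drops a trailing bare "?", so /mac? ends up treated as /mac there.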
I really need an answer to this because I want to avoid having duplicate content crawled (rewritten URLs plus dynamic URLs). It might be a good idea to run a test...