I am going to convert a PHP shopping cart to static URLs using mod_rewrite, and was cautioned that I need to block all the old PHP URLs to avoid duplicate content issues in Google. So does this work? If I add this to my robots.txt:
User-agent: *
Disallow: /*?
will it block the search spiders from indexing all the old URLs that currently look like this: [mydomain....] com/index.php?l=product_list&c=19
Does anyone have experience with this? Basically I am trying to avoid listing all these old URLs in my robots.txt file. With 3000+ of them, it would be awfully large.
You are talking about using wildcards, a.k.a. pattern matching. Google, Yahoo and MSN support this extension. I have used it and it works just fine with the big three search engines.
Please remember that it is not officially part of the robots.txt protocol so the smaller bots will probably not follow the wildcard rules.
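To get a feel for how a wildcard rule like this is read, here is a rough Python sketch. It assumes the common interpretation where "*" stands for any run of characters and the rule is matched from the start of the URL path. The function names are made up for illustration, and this is not how any particular search engine implements it.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a wildcard Disallow path into a regex (illustrative only)."""
    # Escape each literal piece, then join the pieces with '.*' so that
    # every '*' in the rule matches any sequence of characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body)

def is_blocked(rule: str, url_path: str) -> bool:
    """True if the rule matches the path, starting at the beginning of the path."""
    # re.match anchors at the start only, which gives prefix-style matching.
    return rule_to_regex(rule).match(url_path) is not None

# The rule from the question against an old dynamic URL and a static one.
print(is_blocked("/*?", "/index.php?l=product_list&c=19"))
print(is_blocked("/*?", "/product_list/19.html"))
```

Under that reading, "Disallow: /*?" catches any path that contains a "?" and leaves query-free static URLs alone. The static path above is only a placeholder for whatever the rewritten URLs end up looking like.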
Side note: if you are using mod_rewrite, then you might not even need robots.txt wildcards. By 301 redirecting all requests for URLs containing "?" to their static versions, you wouldn't need this rule. Having the rule wouldn't hurt either.
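The redirect idea boils down to mapping each old query-string URL to one static URL and answering with a 301. Here is a small Python sketch of that mapping logic only. The static scheme "/<l>/<c>.html" is purely an assumption for illustration, since the real static URLs aren't given in this thread, and the function name is hypothetical. On the server the same decision would be made by your mod_rewrite rules.

```python
from urllib.parse import urlsplit, parse_qs

def static_redirect(old_url: str):
    """Return (301, new_path) for an old dynamic URL, or None if no redirect applies."""
    parts = urlsplit(old_url)
    if not parts.query:
        return None  # no "?" part, so nothing to redirect
    params = parse_qs(parts.query)
    page = params.get("l", [None])[0]
    category = params.get("c", [None])[0]
    if page is None or category is None:
        return None  # unknown query layout, leave it alone
    # Hypothetical static scheme: /<page>/<category>.html
    return 301, f"/{page}/{category}.html"

print(static_redirect("/index.php?l=product_list&c=19"))
print(static_redirect("/product_list/19.html"))
```

Because every old URL points at exactly one new URL with a permanent redirect, the spiders get a clear signal about which version to keep.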