
Forum Moderators: goodroi


Robots.txt Clarification

   
5:41 am on Nov 6, 2007 (gmt 0)

10+ Year Member



I know that I can Disallow a file, but can I also Disallow URLs for that file that come with extra query-string values, such as

www.mydomain.com/index.php?cat_id=234

Can I add the following line to my robots.txt file to keep URLs of the above type from being indexed by search engines?

Disallow: /index.php?cat_id=
6:30 am on Nov 6, 2007 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Disallow is supposed to work for files and directories, and you may specify a partial name.
However, there is no mention of support for query strings in the protocol, so I wouldn't count on anything there...
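
To illustrate the "partial name" point: under the original protocol, a Disallow value is compared against the start of the requested URL. The sketch below assumes a crawler that includes the query string in that comparison, which is exactly the part the protocol leaves unstated. The function name is_disallowed is hypothetical and is not taken from any real crawler.

```python
def is_disallowed(url_path, disallow_rules):
    """Return True if url_path starts with any non-empty Disallow value.

    Assumption: the crawler compares the path plus the query string,
    which the original protocol does not promise.
    """
    return any(bool(rule) and url_path.startswith(rule) for rule in disallow_rules)


rules = ["/index.php?cat_id="]

print(is_disallowed("/index.php?cat_id=234", rules))  # True
print(is_disallowed("/index.php", rules))             # False
print(is_disallowed("/index.php?page=2", rules))      # False
```

Under that assumption, only URLs that begin with the exact rule text are blocked, so the same parameter in a different position (for example /index.php?page=2&cat_id=234) would not match.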
1:25 pm on Nov 6, 2007 (gmt 0)

5+ Year Member



As far as I know, and from what I've seen of bots' behavior, as long as you don't place a wildcard in the middle of the argument string, even msnbot will understand your syntax.
1:59 pm on Nov 6, 2007 (gmt 0)

10+ Year Member



I have done the following:

Disallow: /index.php?

Any suggestions and recommendations welcome.

10:44 am on Nov 13, 2007 (gmt 0)

5+ Year Member



Using ? after index.php tells search engines to ignore all URLs for that file that carry arguments.

If you want to disallow only the URLs with cat_id, you can try using wildcards, for example:

Disallow: /*cat_id=*
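
A rough way to check what a wildcard rule like this would catch, assuming the crawler treats * as "any run of characters" and a trailing $ as "end of URL" (an extension some search engines document, not part of the original protocol). The helper names rule_to_regex and is_blocked are hypothetical.

```python
import re


def rule_to_regex(rule):
    """Translate a wildcard Disallow value into a compiled regex.

    Assumption: '*' matches any sequence of characters and a trailing
    '$' anchors the end of the URL; everything else is literal.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with ".*" where '*' stood.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(url_path, rule):
    # re.match anchors at the start, like a Disallow prefix comparison.
    return rule_to_regex(rule).match(url_path) is not None


rule = "/*cat_id=*"
print(is_blocked("/index.php?cat_id=234", rule))    # True
print(is_blocked("/index.php?page=2", rule))        # False
print(is_blocked("/shop/list.php?cat_id=9", rule))  # True
print(is_blocked("/index.php?subcat_id=5", rule))   # True
```

The last two lines show that, read this way, the rule is not limited to index.php and also catches any parameter whose name merely ends in cat_id.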

[edited by: LordLink at 10:45 am (utc) on Nov. 13, 2007]