Forum Moderators: goodroi

Robots.txt Clarification

     
5:41 am on Nov 6, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 23, 2004
posts: 86
votes: 0


I know that I can Disallow a file, but can I also Disallow URLs for that file that come with extra query values, such as

www.mydomain.com/index.php?cat_id=234

Can I add the following line to my robots.txt file to keep URLs of the above type from being indexed by search engines?

Disallow: /index.php?cat_id=
6:30 am on Nov 6, 2007 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10544
votes: 8


Disallow is supposed to work for files and directories, and you may specify a partial name.
However, the protocol makes no mention of support for query strings, so I wouldn't count on anything there...
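
To make the "partial name" point concrete, here is a minimal Python sketch of plain prefix matching. It assumes a crawler compares each Disallow value against the URL's path plus its query string, which the original protocol does not promise. The function name is_disallowed is made up for this illustration.

```python
from urllib.parse import urlsplit

def is_disallowed(url: str, disallow_prefixes: list[str]) -> bool:
    """Return True if the URL's path (plus query, if any) starts with a Disallow value."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        # Assumption: the crawler includes the query string in the comparison.
        target += "?" + parts.query
    # An empty Disallow value blocks nothing, so it is skipped.
    return any(p and target.startswith(p) for p in disallow_prefixes)

rules = ["/index.php?cat_id="]
print(is_disallowed("http://www.mydomain.com/index.php?cat_id=234", rules))
print(is_disallowed("http://www.mydomain.com/index.php", rules))
print(is_disallowed("http://www.mydomain.com/index.php?page=2&cat_id=234", rules))
```

Under that assumption the rule catches only URLs where cat_id is the first argument. The third URL slips through because its query string starts with a different parameter.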
1:25 pm on Nov 6, 2007 (gmt 0)

Full Member

5+ Year Member

joined:Dec 3, 2006
posts:257
votes: 0


As far as I know, and from what I've seen of bots' behavior, as long as you don't place a wildcard in the middle of the argument string, even msnbot will understand your syntax.
1:59 pm on Nov 6, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 23, 2004
posts: 86
votes: 0


I have done the following:

Disallow: /index.php?

Any suggestions and recommendations welcome.

10:44 am on Nov 13, 2007 (gmt 0)

New User

5+ Year Member

joined:Nov 13, 2007
posts:1
votes: 0


Using ? on index.php tells search engines to ignore every index.php URL that has arguments.

You can try using wildcards if you want to disallow only the URLs that contain cat_id, for example:

Disallow: /*cat_id=*
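
As a rough illustration of what a wildcard-aware crawler might do with a rule like this, here is a small Python sketch. It assumes * means "any run of characters" and that the rule is still matched from the start of the path plus query string. Wildcards are an extension, not part of the original protocol, so not every bot will behave this way. The function name wildcard_match is hypothetical.

```python
import re

def wildcard_match(rule: str, path_and_query: str) -> bool:
    """Return True if a Disallow value containing * matches from the start of the target."""
    # Escape each literal piece, then join the pieces with ".*" where a * stood.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    # re.match anchors at the start only, which mirrors prefix-style matching.
    return re.match(pattern, path_and_query) is not None

print(wildcard_match("/*cat_id=*", "/index.php?cat_id=234"))
print(wildcard_match("/*cat_id=*", "/index.php?page=2&cat_id=234"))
print(wildcard_match("/*cat_id=*", "/index.php?page=2"))
```

In this sketch the trailing * changes nothing, since the match is only anchored at the start, so /*cat_id= gives the same results.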

[edited by: LordLink at 10:45 am (utc) on Nov. 13, 2007]