In a nutshell...
Make a new file in a text editor, place the following two lines in it, and save it as "robots.txt". Just these two lines, just like this:
User-agent: *
Disallow: /
That tells robots heeding robots.txt: "Keep Out"
Upload your "robots.txt" file as plain text and put it in your top-level (root) directory (/public_html or whatever), so robots can fetch it at http://www.example.com/robots.txt.
I'm not quite sure what you mean by "stop search engine to index one of my subdomain files", but here are examples of completely disallowed directories:
User-agent: *
Disallow: /cgi-bin
Disallow: /includes
Disallow: /private
And here's an example of a disallowed file in a directory:
User-agent: *
Disallow: /messageboard/welcome.html
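You can also aim rules at one robot by name instead of using the * wildcard. The bot name and path below are just for illustration; substitute whatever robot you actually mean:
User-agent: msnbot
Disallow: /messageboard/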
One reminder: robots.txt is purely advisory, and a LOT of robots (including some 'reputable' search engines) do NOT heed it. So if the file you don't want SEs to read is genuinely private, robots.txt is not the best way to protect it from prying eyes.
That said, deflecting unwanted automatons depends on your server software (and what your ISP allows). For example, if your server runs Apache, there are things you can do (with .htaccess, with mod_rewrite, etc.) based on User-agent, Host name and/or IP address; a rough sketch follows below. Check Jim Morgan's superb help/how-tos in his Apache Web Server [webmasterworld.com] forum. The details can be tricky as heck, but extremely effective.
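For instance, here's a minimal .htaccess sketch that returns a 403 Forbidden to any visitor whose User-agent begins with "BadBot". "BadBot" is just a placeholder; swap in the agent string you actually want to turn away, and test it before you trust it:
# requires mod_rewrite
RewriteEngine On
# match the offending User-agent, case-insensitively
RewriteCond %{HTTP_USER_AGENT} ^BadBot [NC]
# refuse the request outright (403 Forbidden)
RewriteRule .* - [F,L]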
(I run Apache so if you're on a Windows box, I reckon there's a forum around here for that, too.)
As far as duplicate content goes, Googlebot heeds robots.txt, as does msnbot. That alone won't completely protect you against duplicate-content problems, but it might help you decide where to place your robots.txt disallows. For specific Google info, see the numerous forums in The Google World [webmasterworld.com] category.
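For example, if printer-friendly duplicates of your pages lived under a /print directory (a made-up path, just for illustration), a disallow like this would keep Googlebot and msnbot from indexing the copies:
User-agent: *
Disallow: /print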