I don't want these pages to be indexed because I want people to reach them only after they have completed the form. I realize that anyone who knows the name can find them, but keeping them from being indexed as pages would be immensely helpful.
Does anyone know how you tell the search engines to not index a particular page or file?
Fortune Hunter
Also, if you want, for any reason, to prevent your clients from accessing the page again, you could store one of the variables from your PHP-based form in a cookie that is set to expire once the browser is closed. Then have your thank-you page check the cookie for that variable: if it is not present, redirect to another page; otherwise display the desired content.
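To make the cookie idea concrete, here is a minimal sketch of the logic. The original form is PHP-based; this uses Python's standard library only to show the flow. The cookie name form_done, the redirect target /form.html, and both function names are made up for illustration.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "form_done"  # hypothetical cookie name
FORM_URL = "/form.html"    # hypothetical page to send visitors back to


def set_cookie_header(value="1"):
    """Build a Set-Cookie header value for a session cookie.

    No Expires or Max-Age attribute is set, so browsers normally discard
    the cookie when the browsing session ends.
    """
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = value
    cookie[COOKIE_NAME]["path"] = "/"
    return cookie[COOKIE_NAME].OutputString()


def thank_you_response(cookie_header):
    """Return (status, headers, body) for a request to the thank-you page."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if COOKIE_NAME not in cookie:
        # Cookie missing: the visitor did not come through the form.
        return 302, {"Location": FORM_URL}, ""
    return 200, {"Content-Type": "text/html"}, "<p>Thank you!</p>"
```

The form handler would send set_cookie_header() with its response, and the thank-you page would pass the incoming Cookie header to thank_you_response(). Keep in mind that a cookie like this is easy to forge, so it only discourages casual revisits.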
However you can place this thank you page in a separate directory and block it out with robots.txt as well.
You'll want to be careful here when using robots.txt. If there is an indexable link to that thank-you page anywhere on the site, in the HTML, Googlebot and other spiders will find it and possibly index it. In Googlebot's case, it will be indexed as a URI only, with no title or description.
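For anyone who wants to see what the robots.txt approach looks like, here is a rough sketch that checks a rule set with Python's standard urllib.robotparser. The /thanks/ directory, the example.com URLs, and the may_crawl helper are assumptions for illustration, not anything from the poster's site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: the thank-you page lives under /thanks/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /thanks/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, agent="*"):
    """Report whether a compliant crawler would be allowed to fetch the URL."""
    return parser.can_fetch(agent, url)


print(may_crawl("http://www.example.com/thanks/done.html", "Googlebot"))
print(may_crawl("http://www.example.com/index.html", "Googlebot"))
```

This only tells you what a well-behaved crawler is allowed to fetch. As noted above, a blocked URL can still end up listed if something links to it.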
Is there any difference in effectiveness between using the first tag you gave and listing each search engine separately?
There shouldn't be. The first example I provided is based on the Robots META Tag Protocol and covers all bots that obey it. Both Google and MSN came out with their own versions of the tag so that you could disallow certain pages for specific search engines.
Using the Robots META Tag to disallow content is probably a more effective way than robots.txt to keep those pages out of the index altogether. It also gives you more control over what you want to do with that page. You can use noindex, nofollow, or both (none).
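As a rough illustration of how those values combine, the sketch below reads the robots directives out of a page with Python's standard html.parser and treats none as shorthand for noindex plus nofollow. The sample page, the class name, and the helper function are made up; the googlebot tag name stands in for the engine-specific variants mentioned earlier.

```python
from html.parser import HTMLParser

# Hypothetical thank-you page carrying a generic robots META tag.
SAMPLE_PAGE = """
<html><head>
<title>Thank you</title>
<meta name="robots" content="noindex, nofollow">
</head><body>Thanks for filling in the form.</body></html>
"""


class RobotsMetaParser(HTMLParser):
    """Collect directives from robots META tags that apply to one bot."""

    def __init__(self, bot="robots"):
        super().__init__()
        self.bot = bot.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        # Accept the generic tag and the tag addressed to this bot.
        if attr.get("name", "").lower() not in ("robots", self.bot):
            return
        for token in attr.get("content", "").lower().split(","):
            token = token.strip()
            if token == "none":
                # "none" is shorthand for noindex plus nofollow.
                self.directives.update({"noindex", "nofollow"})
            elif token:
                self.directives.add(token)


def robots_directives(html, bot="robots"):
    """Return the set of robots directives found in the given HTML."""
    parser = RobotsMetaParser(bot)
    parser.feed(html)
    return parser.directives


print(sorted(robots_directives(SAMPLE_PAGE)))
```

A tag such as meta name="googlebot" content="noindex" would be picked up only when the function is called with bot="googlebot", which mirrors the per-engine behaviour described above.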
Google provides these instructions to get sites or pages removed from their index:
[google.com...]
Similar information for Yahoo is here:
[help.yahoo.com...]
And for MSN:
[search.msn.com...]
The robots.txt file implements the Robots Exclusion Standard. It is not required or enforced, but well-behaved web crawlers/robots read it to learn which files and directories you want them to stay OUT of on your site.
Many Web Robots offer facilities for Web site administrators and content providers to limit what the robot does. This is achieved through two mechanisms:
The Robots Exclusion Protocol
A Web site administrator can indicate which parts of the site should not be visited by a robot, by providing a specially formatted file on their site, in [......]
The Robots META tag
A Web author can indicate if a page may or may not be indexed, or analysed for links, through the use of a special HTML META tag.