Hi, I was wondering whether the best way to keep robots from spidering an entire site except the index.html page would be a META NAME="robots" CONTENT="noindex, nofollow" tag on the index.html page, or if there is a better way.
That would tell robots not to index the index.html page and not to follow any links from it, so in theory they wouldn't crawl anything via that page. However, if there are external links pointing to other pages within the site, this would only exclude index.html itself.
The correct way would be to exclude everything else in the robots.txt file and add "noindex, nofollow" meta tags to all the other pages on the site (as a safety measure in case robots.txt is ignored), with just a "nofollow" tag on the index.html page.
In robots.txt you would need to list each page in the root directory and each subdirectory you want excluded.
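To make that concrete, here is a sketch of what the two pieces might look like. The page and directory names are hypothetical placeholders for whatever actually exists in your root directory:

```
# robots.txt — list every page and directory EXCEPT index.html
# (names below are examples, not real paths on your site)
User-agent: *
Disallow: /about.html
Disallow: /contact.html
Disallow: /products/
Disallow: /images/
```

```html
<!-- On every page except index.html -->
<meta name="robots" content="noindex, nofollow">

<!-- On index.html itself -->
<meta name="robots" content="nofollow">
```

Note that the original robots.txt protocol only defines Disallow, so you exclude everything else by listing it, rather than relying on an "allow only index.html" rule.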
Robot Manager is an excellent tool if you have trouble writing the robots.txt file by hand (hope it's OK to list that resource). There is also an excellent validator here at SEW > [searchengineworld.com...]
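Besides a validator, you can also sanity-check your rules locally with Python's standard-library robots.txt parser. This is just a quick sketch; the rules and URLs are made-up examples standing in for your own file:

```python
# Sanity-check robots.txt rules with urllib.robotparser (Python stdlib).
# The rules and example.com URLs below are hypothetical.
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /about.html
Disallow: /products/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)  # parse the rules as a list of lines

# index.html is not disallowed, so a compliant robot may fetch it.
print(rp.can_fetch("*", "https://example.com/index.html"))       # True
# Anything under /products/ is blocked for all user agents.
print(rp.can_fetch("*", "https://example.com/products/x.html"))  # False
```

This only tells you what a *compliant* crawler would do with your file; as noted below, not every robot obeys it.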
Remember, robots/crawlers/spiders have been known to ignore all these safeguards. If you have something sensitive you don't want showing up in a search engine, password-protect the directory and/or page.