

Sitemaps, Meta Data, and robots.txt Forum

ban all except index.htm

10+ Year Member

Msg#: 547 posted 7:46 pm on Feb 9, 2005 (gmt 0)

Hi, I was wondering whether the best way to ban robots from
spidering an entire site except the index.html page would be a
<META NAME="robots" CONTENT="noindex, nofollow">
tag on the index.html page, or whether there is a better way.



10+ Year Member

Msg#: 547 posted 3:08 pm on Feb 12, 2005 (gmt 0)

That would tell the robots not to index the index.html page and not to follow any links from it, which is the opposite of what you want. In theory, they wouldn't crawl anything via that page. However, if there are external links pointing to other pages within the site, robots could still reach those pages directly, so the tag would end up excluding only the index.html page.

The correct way would be to exclude everything else in the robots.txt file, add the "noindex" and "nofollow" meta tags to all the other pages in the site (as a safety measure in case robots.txt is ignored), and add just a "nofollow" tag to the index.html page.

Sample of robots.txt:

User-agent: *
Disallow: /images
Disallow: /cgi-bin/
Disallow: /each-and-every-directory
Disallow: /page-name.html
Disallow: /another-page-name.html
Disallow: /still-another-page-name.html

You would need to list each page in the root directory and each directory.
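
Listing every page and directory by hand gets tedious on a larger site, so here is a minimal Python sketch that builds the same kind of file from a directory listing. The function name build_robots_txt and its parameters are made up for illustration, and it assumes the site is a plain folder of files on disk whose names need no URL-encoding. It also skips robots.txt itself, which is my own choice rather than a requirement.

```python
from pathlib import Path


def build_robots_txt(site_root: str, keep: str = "index.html") -> str:
    """Return robots.txt text disallowing every top-level entry except `keep`.

    Hypothetical helper: only the top level of `site_root` is listed,
    because a Disallow rule for a directory covers everything below it.
    """
    lines = ["User-agent: *"]
    for entry in sorted(Path(site_root).iterdir(), key=lambda p: p.name):
        # Leave the one page we want crawled, and robots.txt itself, unlisted.
        if entry.name in (keep, "robots.txt"):
            continue
        # Directories get a trailing slash so the rule matches the whole subtree.
        suffix = "/" if entry.is_dir() else ""
        lines.append(f"Disallow: /{entry.name}{suffix}")
    return "\n".join(lines) + "\n"
```

You would call it as build_robots_txt("/path/to/site") and save the result as robots.txt in the root directory. It needs to be re-run whenever a page or directory is added, otherwise the new item is crawlable by default.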

Robot Manager is an excellent tool to use if you have trouble writing the robots.txt file by hand (hope that's OK to list that resource). There is also an excellent validator here at SEW > [searchengineworld.com...]
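
If you would rather check the rules yourself, a rough local test is possible with Python's standard urllib.robotparser module. This is only a sketch: the helper name is_allowed, the sample paths and the "ExampleBot" agent name are invented for illustration, and the parser does simple prefix matching, so it shows how a well-behaved robot should read the file, not how any particular search engine actually behaves.

```python
from urllib.robotparser import RobotFileParser

# A shortened version of the sample robots.txt from this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images
Disallow: /cgi-bin/
Disallow: /page-name.html
"""


def is_allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file contents as a list of lines; no network access needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/page-name.html", "/images/logo.png"):
        print(path, "allowed" if is_allowed(ROBOTS_TXT, path) else "blocked")
```

Only /index.html should come back as allowed here; anything listed under a Disallow line, or below a disallowed directory, should come back as blocked.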

Remember that robots/crawlers/spiders have been known to ignore all these safeguards. If you have something sensitive that you don't want to see in a search engine, password-protect the directory and/or page.

Hope that helps.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved