Blocking domain specific links


Pfui - 3:46 am on Dec 21, 2005 (gmt 0)


Sorry, but your example won't work because robots won't understand it. Here are some things you can do instead:

1.) You can restrict or disallow robots from directories where you have pages you don't want crawled/spidered:

User-agent: *
Disallow: /example/

2.) Many robots will also let you exclude single pages:

User-agent: *
Disallow: /example/private.html

I say 'many' because not all robots, even the big ones, will follow all of your instructions (or follow them all of the time), and the bad ones will ignore your robots.txt file altogether.
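
If you want to see how a well-behaved robot would read the two rule sets above before you upload anything, here is a rough sketch using Python's standard urllib.robotparser module. The www.example.com host, the "AnyBot" user-agent name, and the test paths are placeholders I made up for illustration. It only tells you what a compliant parser would decide, not what any particular robot will actually do.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Rule set 1: keep all robots out of a whole directory.
DIRECTORY_RULES = """\
User-agent: *
Disallow: /example/
"""

# Rule set 2: keep all robots away from a single page.
SINGLE_PAGE_RULES = """\
User-agent: *
Disallow: /example/private.html
"""

SITE = "http://www.example.com"  # placeholder host, not a real site of yours

directory_parser = build_parser(DIRECTORY_RULES)
page_parser = build_parser(SINGLE_PAGE_RULES)

# Under rule set 1, everything below /example/ is off limits.
print(directory_parser.can_fetch("AnyBot", SITE + "/example/page.html"))  # False
print(directory_parser.can_fetch("AnyBot", SITE + "/other/page.html"))    # True

# Under rule set 2, only the one named page is off limits.
print(page_parser.can_fetch("AnyBot", SITE + "/example/private.html"))    # False
print(page_parser.can_fetch("AnyBot", SITE + "/example/public.html"))     # True
```

Note that the matching is a plain prefix match on the path, which is why Disallow: /example/ covers every page in that directory.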

3.) You can also put HTML (robots META) tags in the pages you don't want crawled. But again, some robots will heed them, and others won't. (See "HTML Author's Guide" reference, below.)
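
For option 3, I'm assuming the tag in question is the robots META tag that the HTML Author's Guide describes. Here is a small sketch, using only Python's standard html.parser module, that shows what such a tag looks like in a page and pulls out its directives so you can confirm the tag is really there. The sample page and the RobotsMetaParser and robots_directives names are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots META directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that asks robots not to index it or follow its links.
PAGE = """<html><head>
<title>Private page</title>
<meta name="robots" content="noindex, nofollow">
</head><body>Not for search engines.</body></html>"""

print(sorted(robots_directives(PAGE)))  # ['nofollow', 'noindex']
```

A page with no robots META tag gives back an empty set, which compliant robots treat the same as "index and follow".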

You'll find loads of info about how to write your robots.txt file(s) here:

The Web Robots Pages
[robotstxt.org...]

And be sure to check out these two sections for specific info:

* Web Server Administrator's Guide to the Robots Exclusion Protocol
* HTML Author's Guide to the Robots Exclusion Protocol

When you're all set, upload your file and run it through SEW's:

Robots.txt Validator
[searchengineworld.com...]

Good luck!


Thread source: http://www.webmasterworld.com/robots_txt/812.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com