Robot.txt file help

Forum Moderators: goodroi

How to create this file

beasscr

5:25 pm on May 16, 2002 (gmt 0)

Okay, I'm entering an arena I really know nothing about, so some hand-holding would really be appreciated! ;)

I've been asked to put a Robot.txt file together, since we had a document listed on Google/Yahoo that we didn't want listed. To have it removed, Google states that I first have to implement the code <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">. I understand what this tag tells the spiders, but I don't know whether I can simply add it to the page's source code or whether I actually have to create a Robot.txt file.

lazerzubb

5:26 pm on May 16, 2002 (gmt 0)

First I would rename the file to robots.txt,
and then visit www.robotstxt.org

pageoneresults

5:35 pm on May 16, 2002 (gmt 0)

The robots.txt file is a set of instructions for visiting robots (spiders) that index the content of your web site pages. The file must reside in the root directory of your web site. For those spiders that obey the file, it provides a map of what they can and cannot index.

To exclude all robots from the server (do not use this one unless you want no indexing for the entire site!):

User-agent: *
Disallow: /

To exclude all robots from parts of a server:

User-agent: *
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/

To exclude a single robot from the server:

User-agent: Named Bot
Disallow: /

To exclude a single robot from parts of a server:

User-agent: Named Bot
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/

Note: The asterisk (*) or wildcard in the User-agent field is a special value meaning "any robot" and therefore is the only one needed until you fully understand how to set up different User-agents.

If you want to Disallow: a particular file within the directory, your Disallow: line might look like this one:

Disallow: /private/top-secret-stuff.htm
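
If you want to sanity-check rules like these before uploading them, a minimal sketch using Python's standard urllib.robotparser module is one option. The host www.example.com and the bot names ExampleBot, BadBot and OtherBot are placeholders made up for illustration, and the result only shows how Python's parser reads the file; a real spider may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

BASE = "http://www.example.com"  # placeholder host, not from the thread

# "Exclude all robots from parts of a server" example from the post above.
ALL_BOTS_PARTIAL = """\
User-agent: *
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/
"""

# "Exclude a single robot from the server", with a made-up bot name.
ONE_BOT_EVERYTHING = """\
User-agent: BadBot
Disallow: /
"""


def is_allowed(rules, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, BASE + path)


if __name__ == "__main__":
    print(is_allowed(ALL_BOTS_PARTIAL, "ExampleBot", "/index.html"))
    print(is_allowed(ALL_BOTS_PARTIAL, "ExampleBot", "/private/top-secret-stuff.htm"))
    print(is_allowed(ONE_BOT_EVERYTHING, "BadBot", "/index.html"))
    print(is_allowed(ONE_BOT_EVERYTHING, "OtherBot", "/index.html"))
```

With the wildcard rules, any bot name is kept out of the listed directories but can still fetch everything else. With the single-robot rules, only the named bot is kept out.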

Bran

5:38 pm on May 16, 2002 (gmt 0)

Just create a txt file called robots.txt as follows:

User-agent: *
Disallow: /cgi-bin
Disallow: /example.html

The asterisk in User-agent means the rules that follow apply to all spiders.

Each Disallow line can name a directory or a specific page, as shown.
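
One thing worth knowing about the example above: a Disallow value is matched as a prefix of the URL path, so /cgi-bin without a trailing slash also covers any other path that starts with those characters. A rough sketch, again with Python's standard urllib.robotparser and with a made-up host, bot name and test paths:

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the example above.
RULES = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /example.html
"""


def blocked_paths(rules, paths, user_agent="ExampleBot"):
    """Return the subset of paths that the rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    base = "http://www.example.com"  # placeholder host
    return [p for p in paths if not parser.can_fetch(user_agent, base + p)]


if __name__ == "__main__":
    # Hypothetical paths chosen to show the prefix behaviour.
    candidates = [
        "/cgi-bin/form.pl",
        "/cgi-binaries.html",
        "/example.html",
        "/index.html",
    ]
    print(blocked_paths(RULES, candidates))
```

Here /cgi-binaries.html is blocked along with the cgi-bin directory, because it starts with /cgi-bin. Writing the rule as /cgi-bin/ limits it to the directory.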

beasscr

6:01 pm on May 16, 2002 (gmt 0)

Thanks lazerzubb, I've checked the site out already and thats why I have more questions.

I'm really confused about how to get a certain page not to be indexed. If I create a robots.txt file with:

User-agent: *
Disallow: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

Won't this tell the spiders not to index and follow the whole site?

Should I do something like this instead:
User-agent: *
Disallow: /thepagenotbeindexed

User-Agent: Googlebot
Disallow: /*.doc$

I know it's elementary Webmaster stuff, but when it comes to this I'm in 1st grade in Webmaster school!

lazerzubb

6:03 pm on May 16, 2002 (gmt 0)

User-agent: *
Disallow: /nottobeincluded.html

And then they won't include it.

keyplyr

6:30 pm on May 16, 2002 (gmt 0)

To remove a page that you do not want Google to list, try submitting a request here:

[google.com...]

The META tag and the robots.txt file are two separate ways of doing about the same thing. This tag...

...is put anywhere between <HEAD> and </HEAD> on each page you don't want indexed.

The robots.txt file, however, is a way of excluding robots from crawling your entire site, directories, or individual pages. You can name specific robots to exclude or allow. It is more efficient because you can list all the info in one file.
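
For the META tag route, it can help to confirm that the tag really ends up in the page you publish. The following is only a sketch using Python's standard html.parser module; the sample page is invented, and the tag itself is the one quoted in the first post.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of any <META NAME="ROBOTS"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.append(part.strip().lower())


def robots_directives(html):
    """Return the robots directives found in html, lowercased, in order."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


if __name__ == "__main__":
    # Invented sample page for illustration.
    page = (
        "<html><head><title>Internal document</title>"
        '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'
        "</head><body>...</body></html>"
    )
    print(robots_directives(page))
```

An empty list means no robots META tag was found, so the page carries no page-level instruction for spiders.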