
Forum Moderators: goodroi


Robot.txt file help

How to create this file

     

beasscr

5:25 pm on May 16, 2002 (gmt 0)

Inactive Member
Account Expired

 
 


Okay, I'm entering an arena I really know nothing about, so some hand holding would really be appreciated! ;)

I've been asked to get a Robot.txt file together since we had a document listed on Google/Yahoo that we didn't want listed. To have it removed, Google states that I first have to implement the code <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">. I understand what this tag tells the spiders, but I don't know whether I can simply add the tag to the page's source code or whether I have to actually create a Robot.txt file.

5:26 pm on May 16, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2001
posts:2059
votes: 0


First I would rename the file to robots.txt,
and then visit www.robotstxt.org
5:35 pm on May 16, 2002 (gmt 0)

Senior Member from US 

pageoneresults

joined:Apr 27, 2001
posts:12166
votes: 51


The robots.txt file is a set of instructions for visiting robots (spiders) that index the content of your web site pages. The file must reside in the root directory of your web site. For those spiders that obey the file, it provides a map of what they can and cannot index.

To exclude all robots from the server (do not use this one unless you want no indexing for the entire site!):

User-agent: *
Disallow: /

To exclude all robots from parts of a server:

User-agent: *
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/

To exclude a single robot from the server:

User-agent: Named Bot
Disallow: /

To exclude a single robot from parts of a server:

User-agent: Named Bot
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/

Note: The asterisk (*) or wildcard in the User-agent field is a special value meaning "any robot" and therefore is the only one needed until you fully understand how to set up different User-agents.

If you want to Disallow: a particular file within the directory, your Disallow: line might look like this one:

Disallow: /private/top-secret-stuff.htm
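
If you want to sanity-check rules like the ones above before uploading them, here is a rough sketch using Python's standard urllib.robotparser module. The host www.example.com and the bot names ExampleBot, AnyBot and OtherBot are made up for illustration, and real spiders may interpret a file a little differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

def load_rules(text):
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

# Rules for every robot: keep them out of three directories.
all_robots = load_rules("""\
User-agent: *
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/
""")

# Rules for one robot only; "ExampleBot" is a made-up name.
one_robot = load_rules("""\
User-agent: ExampleBot
Disallow: /
""")

SITE = "http://www.example.com"  # placeholder host, not a real site to test

# Under the wildcard rules, only the listed directories are off limits.
print(all_robots.can_fetch("AnyBot", SITE + "/private/top-secret-stuff.htm"))
print(all_robots.can_fetch("AnyBot", SITE + "/index.html"))

# Under the named rules, only ExampleBot is shut out of the site.
print(one_robot.can_fetch("ExampleBot", SITE + "/index.html"))
print(one_robot.can_fetch("OtherBot", SITE + "/index.html"))
```

can_fetch() returns True when the named robot may fetch the URL under the parsed rules, so the first and third checks should come back False and the other two True.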

5:38 pm on May 16, 2002 (gmt 0)

New User

10+ Year Member

joined:May 1, 2002
posts:27
votes: 0


Just create a txt file called robots.txt as follows:

User-agent: *
Disallow: /cgi-bin
Disallow: /example.html

The asterisk in User-agent means the rules that follow apply to all spiders.

Each Disallow line can name a directory or a specific page, as shown.
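
One thing worth knowing about a line like Disallow: /cgi-bin with no trailing slash is that the value is matched as a prefix of the URL path. Here is a small sketch, again with Python's standard urllib.robotparser; the host, the bot name ExampleBot and the test paths are invented for the example, and individual spiders may behave differently.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /cgi-bin",
    "Disallow: /example.html",
])

def blocked(path):
    """True if the rules above keep the (made-up) ExampleBot away from path."""
    return not parser.can_fetch("ExampleBot", "http://www.example.com" + path)

# Everything whose path starts with /cgi-bin is caught, which includes
# a page that merely shares the prefix.
for path in ["/cgi-bin/search.pl", "/cgi-binary.html", "/example.html", "/index.html"]:
    print(path, "blocked" if blocked(path) else "allowed")
```

If you only want the directory itself, writing the rule as /cgi-bin/ with the trailing slash avoids catching look-alike file names.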

beasscr

6:01 pm on May 16, 2002 (gmt 0)

Inactive Member
Account Expired

 
 


Thanks lazerzubb, I've checked the site out already and that's why I have more questions.

I'm really confused about how to get a certain page not to be indexed. If I create a robots.txt file with:

User-agent: *
Disallow: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

Won't this tell the spiders not to index and follow the whole site?

Should I do something like this instead:
User-agent: *
Disallow: /thepagenotbeindexed

User-Agent: Googlebot
Disallow: /*.doc$

I know it's elementary Webmaster stuff, but when it comes to this stuff I'm in 1st grade in Webmaster school!

6:03 pm on May 16, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2001
posts:2059
votes: 0


User-agent: *
Disallow: /nottobeincluded.html

And then they won't include it.

6:30 pm on May 16, 2002 (gmt 0)

Senior Member from US 

keyplyr

joined:Sept 26, 2001
posts:5810
votes: 64


To remove a page that you do not want Google to list, try submitting a request here:

[google.com...]

The META tag and the robots.txt file are two separate ways of doing about the same thing. This tag...

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

...is put anywhere between <HEAD> and </HEAD> on each page you don't want indexed.

The robots.txt file, however, is a way of excluding robots from crawling your entire site, directories, or individual pages. You can name specific robots to exclude or allow. It is more efficient because you can put all the instructions in one file.
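
To confirm that the META tag really ended up in a page, something along these lines can help. It is only a sketch built on Python's standard html.parser module; the class name RobotsMetaParser, the helper robots_directives and the sample page are all invented for the example.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <META NAME="ROBOTS"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())

def robots_directives(html):
    """Return the set of robots directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

# Made-up sample page carrying the tag from this thread.
page = """<HTML><HEAD>
<TITLE>Internal document</TITLE>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</HEAD><BODY>...</BODY></HTML>"""

directives = robots_directives(page)
print(sorted(directives))
```

An empty result means the page carries no robots META tag, so a spider would see no page-level instruction there.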