
Sitemaps, Meta Data, and robots.txt Forum

    
Robot.txt file help
How to create this file
beasscr




msg:1527121
 5:25 pm on May 16, 2002 (gmt 0)

Okay, I'm entering an arena I really know nothing about, so some hand-holding would really be appreciated! ;)

I've been asked to get a Robot.txt file together, since we had a document listed on Google/Yahoo that we didn't want listed. To have it removed, Google states that I first have to implement the tag <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">. I understand what this tag tells the spiders, but I don't know whether I can simply add this tag to the page's source code or whether I have to actually create a Robot.txt file.

 

lazerzubb




msg:1527122
 5:26 pm on May 16, 2002 (gmt 0)

First, I would rename the file to robots.txt, and then visit www.robotstxt.org.

pageoneresults




msg:1527123
 5:35 pm on May 16, 2002 (gmt 0)

The robots.txt file is a set of instructions for visiting robots (spiders) that crawl and index the pages of your web site. The file must reside in the root directory of your web site. For those spiders that obey it, the file provides a map of what they may and may not crawl.
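To illustrate the "root directory" rule: a robot works out where robots.txt lives from the host of the page it wants, not from the page's folder. A minimal Python sketch using the standard library's `urljoin`; the www.example.com URLs are placeholders.

```python
from urllib.parse import urljoin


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL a robot would check for page_url.

    The leading "/" in "/robots.txt" makes urljoin replace the whole path,
    so the result always points at the root of the same host.
    """
    return urljoin(page_url, "/robots.txt")


# Pages in different folders on one host share a single robots.txt.
print(robots_url_for("http://www.example.com/private/top-secret-stuff.htm"))
print(robots_url_for("http://www.example.com/index.html"))
```

A robots.txt placed in a subdirectory (for example /private/robots.txt) is never requested, so its rules have no effect.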

To exclude all robots from the server (do not use this one unless you want no indexing for the entire site!):

User-agent: *
Disallow: /
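One way to check what a rule set actually blocks is Python's standard `urllib.robotparser` module, which follows the original exclusion rules (individual search engines may differ in the details). A minimal sketch contrasting "Disallow: /" with an empty "Disallow:", which blocks nothing; the robot name "AnyBot" and the paths are placeholders.

```python
from urllib.robotparser import RobotFileParser


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # mark the rules as loaded; can_fetch() refuses all URLs until then
    return parser


block_everything = make_parser("User-agent: *\nDisallow: /\n")
block_nothing = make_parser("User-agent: *\nDisallow:\n")

for path in ("/", "/index.html", "/private/report.htm"):
    # "/" is a prefix of every path, so the first file refuses every fetch;
    # an empty Disallow value means "nothing is disallowed".
    print(path, block_everything.can_fetch("AnyBot", path),
          block_nothing.can_fetch("AnyBot", path))
```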

To exclude all robots from parts of a server:

User-agent: *
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/
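The same standard-library parser can confirm that only the listed directories are affected. A minimal sketch; the file names inside the directories are hypothetical.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())
parser.modified()  # mark the rules as loaded; can_fetch() refuses all URLs until then

# Each Disallow line is checked in turn; a path matched by none of them is allowed.
for path in ("/private/notes.htm", "/images-saved/logo.gif",
             "/images/logo.gif", "/index.html"):
    print(path, "allowed" if parser.can_fetch("AnyBot", path) else "blocked")
```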

To exclude a single robot from the server:

User-agent: Named Bot
Disallow: /
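With a named group, only robots whose name matches are refused; every other robot finds no group for itself and is allowed. A sketch with Python's `urllib.robotparser`, keeping the placeholder name Named Bot from the example; "OtherBot" is hypothetical. This particular parser checks, case-insensitively, whether the group name appears in the part of the robot's user-agent string before the first "/".

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: Named Bot", "Disallow: /"])
parser.modified()  # mark the rules as loaded

print(parser.can_fetch("Named Bot/2.1", "/index.html"))  # False: the "/2.1" suffix is ignored
print(parser.can_fetch("OtherBot", "/index.html"))       # True: no group applies to it
```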

To exclude a single robot from parts of a server:

User-agent: Named Bot
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/

Note: The asterisk (*) or wildcard in the User-agent field is a special value meaning "any robot" and therefore is the only one needed until you fully understand how to set up different User-agents.

If you want to Disallow: a particular file within the directory, your Disallow: line might look like this one:

Disallow: /private/top-secret-stuff.htm
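Disallow values are matched as prefixes of the URL path, so a rule naming a file also catches any longer path that starts with the same characters. A sketch with Python's `urllib.robotparser`; the extra file names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /private/top-secret-stuff.htm"])
parser.modified()  # mark the rules as loaded

for path in ("/private/top-secret-stuff.htm",   # the file itself: blocked
             "/private/top-secret-stuff.html",  # same prefix: also blocked
             "/private/other-page.htm"):        # different name: allowed
    print(path, parser.can_fetch("AnyBot", path))
```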

Bran




msg:1527124
 5:38 pm on May 16, 2002 (gmt 0)

Just create a text file called robots.txt containing the following:

User-agent: *
Disallow: /cgi-bin
Disallow: /example.html

The asterisk in User-agent means the rules apply to all spiders.

Each Disallow line can name a directory (/cgi-bin) or a specific page (/example.html), as shown.
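Note that /cgi-bin here has no trailing slash. Because the value is a prefix, it blocks the directory and also any path that merely starts with "/cgi-bin"; writing /cgi-bin/ limits the rule to the directory. A sketch comparing the two with Python's `urllib.robotparser`; /cgi-bin-docs.html and /cgi-bin/search.pl are hypothetical paths.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: list, path: str) -> bool:
    """Return True if a generic robot may fetch path under the given rule lines."""
    parser = RobotFileParser()
    parser.parse(rules)
    parser.modified()  # mark the rules as loaded
    return parser.can_fetch("AnyBot", path)


no_slash = ["User-agent: *", "Disallow: /cgi-bin"]
with_slash = ["User-agent: *", "Disallow: /cgi-bin/"]

for path in ("/cgi-bin/search.pl", "/cgi-bin-docs.html"):
    print(path, allowed(no_slash, path), allowed(with_slash, path))
```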

beasscr




msg:1527125
 6:01 pm on May 16, 2002 (gmt 0)

Thanks, lazerzubb. I've checked the site out already, and that's why I have more questions.

I'm really confused about how to keep a certain page from being indexed. If I create a robots.txt file with:

User-agent: *
Disallow: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

Won't this tell the spiders not to index and follow the whole site?
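robots.txt does not understand HTML. Python's standard `urllib.robotparser`, for example, treats the whole META tag text as a path prefix, and because real URL paths begin with "/", that line matches nothing and the whole site stays crawlable. A minimal check (this shows how one parser handles the line; other robots may treat a malformed line differently):

```python
from urllib.robotparser import RobotFileParser

MISTAKE = [
    "User-agent: *",
    'Disallow: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">',
]

parser = RobotFileParser()
parser.parse(MISTAKE)
parser.modified()  # mark the rules as loaded

# The META text becomes a percent-encoded path prefix that no real path starts with.
print(parser.can_fetch("AnyBot", "/index.html"))
print(parser.can_fetch("AnyBot", "/thepagenotbeindexed"))
```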

Should I do something like this instead:
User-agent: *
Disallow: /thepagenotbeindexed

User-Agent: Googlebot
Disallow: /*.doc$
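Two things to watch in this file. First, a robot obeys only the most specific group that names it, so Googlebot would follow the "User-Agent: Googlebot" group and ignore the "*" group, leaving /thepagenotbeindexed unblocked for it. Second, the * and $ in /*.doc$ are pattern extensions that Google supports; robots that follow only the original standard, including Python's `urllib.robotparser`, read them as literal characters. A sketch demonstrating both points with that parser; "OtherBot" and /reports/q1.doc are hypothetical.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /thepagenotbeindexed

User-Agent: Googlebot
Disallow: /*.doc$
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())
parser.modified()  # mark the rules as loaded

# Robots without their own group fall back to "*", so the page is blocked for them.
print(parser.can_fetch("OtherBot", "/thepagenotbeindexed"))   # False
# Googlebot has its own group, which says nothing about this page.
print(parser.can_fetch("Googlebot", "/thepagenotbeindexed"))  # True
# This parser treats "*" and "$" literally, so .doc files are not matched here.
print(parser.can_fetch("Googlebot", "/reports/q1.doc"))       # True
```

To block the page for Googlebot as well, the Disallow: /thepagenotbeindexed line would need to be repeated inside the Googlebot group.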

I know it's elementary Webmaster stuff, but when it comes to this stuff I'm in 1st grade in Webmaster school!

lazerzubb




msg:1527126
 6:03 pm on May 16, 2002 (gmt 0)

User-agent: *
Disallow: /nottobeincluded.html

And then they won't include it.

keyplyr




msg:1527127
 6:30 pm on May 16, 2002 (gmt 0)

To remove a page that you do not want Google to list, try submitting a request here:

[google.com...]

The META tag and robots.txt are two separate ways of doing roughly the same thing. This tag...

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

...is put anywhere between <HEAD> and </HEAD> on each page you don't want indexed.
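Unlike robots.txt, the META tag is read from the page itself, after the robot has fetched it. A minimal sketch of how a robot might detect it using Python's standard `html.parser`; the sample page and the class and function names are hypothetical, and real crawlers use their own, more thorough parsing.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip())


def robots_directives(html: str) -> set:
    """Return the set of lowercase robots directives found in html."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


PAGE = """<HTML><HEAD>
<TITLE>Not for search engines</TITLE>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</HEAD><BODY>...</BODY></HTML>"""

print(sorted(robots_directives(PAGE)))  # ['nofollow', 'noindex']
```

Because the tag lives in the page, a robot that robots.txt blocks from fetching that page never sees it.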

The robots.txt file, however, is a way of excluding robots from crawling your entire site, specific directories, or individual pages. You can name specific robots to exclude or allow. It is more efficient because you can put all the rules in one file.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved