I've been asked to get a robots.txt file together since we had a document listed on Google/Yahoo that we didn't want listed. In order to have it removed, Google states that I first have to implement the tag <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">. I understand what this tag tells the spiders, but I don't know if I can simply add the tag to the source code or if I have to actually create a robots.txt file.
To exclude all robots from the server (do not use this one unless you want no indexing for the entire site!):
User-agent: *
Disallow: /
To exclude all robots from parts of a server:
User-agent: *
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/
To exclude a single robot from the server:
User-agent: Named Bot
Disallow: /
To exclude a single robot from parts of a server:
User-agent: Named Bot
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/
Note: The asterisk (*) in the User-agent field is a wildcard, a special value meaning "any robot," and it is the only one you need until you fully understand how to set up records for different User-agents.
If you want to Disallow: a particular file within the directory, your Disallow: line might look like this one:
Disallow: /private/top-secret-stuff.htm
I'm really confused about how to get a certain page not to be indexed. If I create a robots.txt file with:
User-agent: *
Disallow: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
Won't this tell the spiders not to index or follow the whole site?
Should I do something like this instead:
User-agent: *
Disallow: /thepagenotbeindexed
User-agent: Googlebot
Disallow: /*.doc$
I know it's elementary webmaster stuff, but when it comes to this I'm in 1st grade in webmaster school!
[google.com...]
The META tag and robots.txt are two separate ways of doing roughly the same thing. This tag...
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
...is put anywhere between <HEAD> and </HEAD> on each page you don't want indexed.
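For reference, here's a minimal sketch of where that tag sits in a page (the title and content are just placeholders):
<html>
<head>
<title>Top Secret Stuff</title>
<!-- tells compliant robots not to index this page or follow its links -->
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>
<body>
Page content here.
</body>
</html>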
The robots.txt file, however, is a way of excluding robots from crawling your entire site, specific directories, or individual pages. You can name specific robots to exclude or allow. It is more efficient because you can put all the information in one file.
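To tie it back to the original question, a single robots.txt at the site root might look something like this; the /private/ path and the .doc pattern are only examples, and the $ pattern-matching is a Googlebot extension that other robots may ignore:
# Applies to all robots
User-agent: *
Disallow: /private/
# Googlebot only: block any URL ending in .doc
User-agent: Googlebot
Disallow: /*.doc$
Keep in mind that robots.txt only asks robots not to crawl those URLs; a page that is already listed usually needs the NOINDEX meta tag (or Google's removal request) to actually drop out of the results.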