[webmasterworld.com...]
I am trying to follow the example from the thread above to stop Google (and, as it has become apparent, Ask Jeeves) from indexing certain pages on my site, and I have adapted that example, which was written for someone else's specific situation:
User-agent: Googlebot
User-agent: Ask Jeeves/Teoma
Disallow: /cgi-bin/
Disallow: /robot.html

User-agent: *
Disallow: /cgi-bin/
Disallow: /wiget1.html
Disallow: /wiget2.html
Disallow: /robot.html
Using the above example I have created my disallow list and got this far:
User-agent: Googlebot
User-agent: Ask Jeeves/Teoma

User-agent: *
Disallow: /my_page_1.htm
Disallow: /my_page_2.htm
Disallow: /my_page_3.htm
Disallow: /my_page_4.htm
I removed the Disallow from the Google/Ask Jeeves section because I thought it was specific to the other person's site. When I test my robots.txt file it brings up an error saying that I need to give an instruction under the Google and Ask Jeeves section.
What do I need to put in the Google and Ask Jeeves section to get their robots to proceed as normal to my pages to find the <meta name="robots" content="noindex"> tags that I will put there?
I'm confused because the example given above only had Disallow lines for cgi-bin and robot.html and no Allow instruction of any kind. Since there was no specific allow instruction, I thought I could remove those apparently-irrelevant-to-me Disallow lines and it would work properly by default!
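For reference, what I'm planning to put in the <head> of each of those pages is just the standard tag, something like this (the title is only a placeholder):

<head>
  <title>My Page 1</title>
  <!-- tell compliant robots not to index this page -->
  <meta name="robots" content="noindex">
</head>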
Any advice would be much appreciated!
Many thanks,
Mike
[edited by: Durnovaria at 3:05 pm (utc) on Dec. 31, 2008]
In your example, the blank line after the User-agent: list stops that set of exclusions, and the wildcard User-agent: specification applies to all robots.
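To let Googlebot and Teoma carry on fetching the pages (so they can see your noindex meta tags) while still keeping the pages disallowed for every other robot, the whole file would look something like this (just a sketch using the page names from your post):

User-agent: Googlebot
User-agent: Ask Jeeves/Teoma
Disallow:

User-agent: *
Disallow: /my_page_1.htm
Disallow: /my_page_2.htm
Disallow: /my_page_3.htm
Disallow: /my_page_4.htm

The Disallow: with nothing after it means nothing is blocked for those two robots, and the blank line closes their record before the wildcard section that everyone else reads.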
I was trying to follow the other example given and modify it for my needs.
The bit at the top with Google and Ask Jeeves in it was a separate section just for their robots. What it was supposed to do was direct Google and Ask Jeeves robots to go to the pages where they would find the "noindex" meta tag, due to the way they apparently log or index pages.
The remaining part of the file was for all other robots who apparently wouldn't have a problem excluding the pages in the list.
So what I think I need is an instruction under the Googlebot and Ask Jeeves section (but before the User-agent: * section) to send Google and Ask Jeeves to my pages. Apparently if I don't do that they will just use the normal Disallow list and still log the page URLs.
Mike
When I tested it with a robots.txt checker it reported:
Allowed by line 3: Disallow:
Detected as a directory; specific files may have different restrictions
Also, out of curiosity I tested it on this site as well: [searchenginepromotionhelp.com...]
On that site it said:
Line / Contents
1 / User-agent: Googlebot
    (The line below must be an allow, disallow or comment statement)
2 / User-agent: Ask Jeeves/Teoma
3 / Disallow:
    (Missing / at start of file or folder name)
So it appears not to like line 1, saying that the line below it must be an allow, disallow or comment statement, and it doesn't like the empty Disallow: on line 3, saying that there's a forward slash missing.
I've no idea what it's going on about, so I thought I would mention it!
Mike :-)
There must be one or more User-agent: lines preceding the Disallow: statement(s). There must be one or more Disallow: statements after the User-agent: line(s). There must be a blank line after the last Disallow: statement of each block (i.e. before the next User-agent: line). If there is a specific section for Google then it reads only that section of the file. That is, it does NOT read the User-agent: * section at all. This is the correct syntax if everything is allowed:

User-agent: Googlebot
User-agent: Ask Jeeves/Teoma
Disallow:

Again, out of curiosity I ran that latest one through the checking program and it didn't like that either!
For lines 2 and 5 it said 'Missing / at start of file or folder name' and for line 11 (the final line) it said 'The line below must be an allow, disallow, comment or a blank line statement.'
I don't think I'll be using that checking program again!
Mike :-)