Forum Moderators: goodroi

robots.txt syntax

How to block a directory.

   
12:48 am on Oct 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Below is a piece of the robots.txt file in the root directory of my site. Something must be wrong, because pages from the /myfinancialhistory/ and /memberinfo/ directories (made-up names for comic relief :-O ) are in the Google index.

User-agent: Black Hole
Disallow: /

User-agent: Titan
Disallow: /

User-agent: WebStripper
Disallow: /

And about 100 more blocked robots ... then:

User-agent: *
Disallow: /myfinancialhistory/
Disallow: /memberinfo/

This is the site structure:

www.foobar.com/myfinancialhistory/bankaccounts.htm
www.foobar.com/memberinfo/criminalrecords.htm

I used the Google remove option a few days ago and the pages are still there. Is there something wrong with the robots.txt syntax? Should I move the User-agent: * stuff to the top?

Thanks

2:08 am on Oct 30, 2003 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



jdancing,

You probably want to keep that "User-agent: *" at the end -- Remember that good robots will obey the first record containing either a match on their user-agent name or "*" whichever comes first.

Check your file for extraneous characters - such as spaces at the end of lines, etc.
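A minimal sketch of that check in Python, assuming the file contents are available as a string. The function name `find_suspect_lines` and the sample text are made up for illustration. It flags trailing whitespace and any character outside printable ASCII, which covers the usual culprits without pretending to be a full validator.

```python
def find_suspect_lines(text):
    """Return (line_number, reason) pairs for lines worth a second look."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Spaces or tabs left at the end of a line
        if line != line.rstrip():
            findings.append((number, "trailing whitespace"))
        # Anything outside printable ASCII (tabs, control chars, non-breaking spaces, ...)
        if any(not (32 <= ord(ch) <= 126) for ch in line):
            findings.append((number, "non-printable or non-ASCII character"))
    return findings


# Hypothetical sample: line 2 ends with a space, line 3 hides a non-breaking space
sample = "User-agent: *\nDisallow: /memberinfo/ \nDisallow: /my\u00a0history/\n"
for number, reason in find_suspect_lines(sample):
    print(f"line {number}: {reason}")
```

To run it against a real file, read the file into a string first and pass that string in.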

More info:
Learn: [robotstxt.org...]
Validate: [searchengineworld.com...]

Jim

4:11 am on Oct 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



so could a

User-agent: Googlebot-Image
Disallow: /

early on be an issue?

4:27 am on Oct 30, 2003 (gmt 0)

10+ Year Member



That entry shouldn't affect Googlebot because he and Googlebot-Image are two different bots.
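A quick way to sanity-check this locally is Python's standard-library `urllib.robotparser`. The sketch below shows only how that one parser reads the records, not Googlebot's actual matching logic, and the rules are a trimmed stand-in for the file posted above.

```python
from urllib.robotparser import RobotFileParser

# Trimmed stand-in for the posted file: one named record, then the catch-all
RULES = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /myfinancialhistory/
Disallow: /memberinfo/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

BASE = "http://www.foobar.com"


def allowed(agent, path):
    """True if this parser would let `agent` fetch BASE + path."""
    return parser.can_fetch(agent, BASE + path)


print(allowed("Googlebot", "/index.htm"))                       # True
print(allowed("Googlebot", "/memberinfo/criminalrecords.htm"))  # False
print(allowed("Googlebot-Image", "/index.htm"))                 # False
```

Here the Googlebot-Image record only shuts out Googlebot-Image; plain Googlebot falls through to the `*` record and is kept out of the two directories only.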

If you cannot find a problem in your robots.txt, then I recommend you write to the company at googlebot@google.com. Send them a copy of the entries in your logs showing where he did not follow your robots.txt directives, plus the URL of your site. They'll check out what happened with their bot.

4:28 am on Oct 30, 2003 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



jdancing,

An issue? What, with respect to 'regular' GoogleBot? No, that's not an issue.

But you do want your specific, per-robot stuff first, and then either allow or disallow the rest with the "User-agent: *" record at the end.

Also, since you say you have about 100 bad-bot disallows, you might want to peruse this old thread: [webmasterworld.com...]

Jim

7:52 pm on Oct 31, 2003 (gmt 0)

10+ Year Member



Remember that good robots will obey the first record containing either a match on their user-agent name or "*" whichever comes first.

Do you have a source you can quote for that, Jim? The only time I remember where the order matters in a manner like that (general versus specific) in robots.txt is in the Disallow/Allow statements.
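One way to probe the ordering question empirically, at least for a single parser, is to feed the same records in both orders to Python's `urllib.robotparser` and compare the answers. This is a sketch of how that one library behaves; it is not evidence of how Googlebot or any other crawler resolves records. The robot name WebStripper is simply borrowed from the file above, and the helper `build` is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

SPECIFIC = "User-agent: WebStripper\nDisallow: /\n"
CATCH_ALL = "User-agent: *\nDisallow: /memberinfo/\n"
URL = "http://www.foobar.com/index.htm"


def build(*records):
    """Parse the given records in order, separated by blank lines."""
    parser = RobotFileParser()
    parser.parse("\n".join(records).splitlines())
    return parser


specific_first = build(SPECIFIC, CATCH_ALL)
catch_all_first = build(CATCH_ALL, SPECIFIC)

for label, parser in (("specific first", specific_first),
                      ("catch-all first", catch_all_first)):
    print(label,
          parser.can_fetch("WebStripper", URL),
          parser.can_fetch("Googlebot", URL))
```

With this parser both orders give the same answers (WebStripper refused, Googlebot allowed), because it treats `*` as a fallback consulted only after the named records. A parser that stopped at the first matching record would let WebStripper through in the catch-all-first case, so keeping `*` last is the safer layout either way.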

 
