

Sitemaps, Meta Data, and robots.txt Forum

    
robots.txt syntax
How to block a directory.
jdancing




msg:1527965
 12:48 am on Oct 30, 2003 (gmt 0)

Below is a piece of the robots.txt file in the root directory of my site. Something must be wrong, because pages from the /myfinancialhistory/ and /memberinfo/ directories (made-up names for comic relief :-O ) are in the Google index.

User-agent: Black Hole
Disallow: /

User-agent: Titan
Disallow: /

User-agent: WebStripper
Disallow: /

And about 100 blocked robots........... then:

User-agent: *
Disallow: /myfinancialhistory/
Disallow: /memberinfo/

This is the site structure:

www.foobar.com/myfinancialhistory/bankaccounts.htm
www.foobar.com/memberinfo/criminalrecords.htm

I used the Google remove option a few days ago and the pages are still there. Is there something wrong with the robots.txt syntax? Should I move the User-agent: * stuff to the top?

Thanks
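
One way to sanity-check the syntax question is to feed the posted excerpt to a parser and ask whether the two example pages are blocked. The sketch below uses Python's standard urllib.robotparser. It only shows how that one parser reads the rules, not how Googlebot behaved. The http:// scheme and the index.htm page are assumptions added for illustration, and the roughly 100 other bad-bot records are left out.

```python
from urllib.robotparser import RobotFileParser

# Excerpt from the post; the other bad-bot records are omitted.
ROBOTS_TXT = """\
User-agent: Black Hole
Disallow: /

User-agent: Titan
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: *
Disallow: /myfinancialhistory/
Disallow: /memberinfo/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if this parser thinks user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    urls = (
        "http://www.foobar.com/myfinancialhistory/bankaccounts.htm",
        "http://www.foobar.com/memberinfo/criminalrecords.htm",
        "http://www.foobar.com/index.htm",  # hypothetical unblocked page
    )
    for url in urls:
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

If the two directory URLs come back as not allowed, the excerpt itself parses as intended, and the cause is more likely elsewhere in the file or in when the pages were indexed.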

 

jdMorgan




msg:1527966
 2:08 am on Oct 30, 2003 (gmt 0)

jdancing,

You probably want to keep that "User-agent: *" at the end -- Remember that good robots will obey the first record containing either a match on their user-agent name or "*" whichever comes first.

Check your file for extraneous characters - such as spaces at the end of lines, etc.

More info:
Learn: [robotstxt.org...]
Validate: [searchengineworld.com...]

Jim
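
The advice to check for extraneous characters can be automated with a short script. This is a minimal sketch: the function name find_suspect_lines is hypothetical, and the set of things it flags (trailing whitespace, a byte-order mark, non-ASCII characters) is an assumption about what "extraneous" might cover, since the post only names trailing spaces.

```python
def find_suspect_lines(robots_txt):
    """Return (line_number, reason) pairs for lines with stray characters."""
    problems = []
    for number, line in enumerate(robots_txt.split("\n"), start=1):
        if number == 1 and line.startswith("\ufeff"):
            problems.append((number, "byte-order mark at start of file"))
            line = line[1:]
        if line.endswith("\r"):
            line = line[:-1]  # tolerate Windows line endings
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
        if not line.isascii():
            problems.append((number, "non-ASCII character"))
    return problems


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /memberinfo/ \nDisallow: /myfinancialhistory/\n"
    for number, reason in find_suspect_lines(sample):
        print(f"line {number}: {reason}")
```

Running it over the real file (read as text) gives a list of line numbers to inspect by hand.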

jdancing




msg:1527967
 4:11 am on Oct 30, 2003 (gmt 0)

So could a

User-agent: Googlebot-Image
Disallow: /

early on be an issue?

BlueSky




msg:1527968
 4:27 am on Oct 30, 2003 (gmt 0)

That entry shouldn't affect Googlebot, because Googlebot and Googlebot-Image are two different bots.

If you cannot find a problem in your robots.txt, then I recommend you write to the company at googlebot@google.com. Send them the URL of your site plus a copy of the entries in your logs showing where Googlebot did not follow your robots.txt directives. They'll check out what happened with their bot.
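
The point that a Googlebot-Image record does not apply to Googlebot can be illustrated with Python's standard urllib.robotparser. This is a sketch of how that particular parser matches user-agent names, assuming the http:// URLs and the index.htm page shown. It is not a statement about Google's own crawler code.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /myfinancialhistory/
Disallow: /memberinfo/
"""


def allowed(user_agent, url):
    """Return True if this parser thinks user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    home = "http://www.foobar.com/index.htm"  # hypothetical page
    private = "http://www.foobar.com/memberinfo/criminalrecords.htm"
    for agent in ("Googlebot-Image", "Googlebot"):
        print(agent, allowed(agent, home), allowed(agent, private))
```

In this parser the image bot is shut out of everything, while plain Googlebot falls through to the "User-agent: *" record and is blocked only from the two listed directories.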

jdMorgan




msg:1527969
 4:28 am on Oct 30, 2003 (gmt 0)

jdancing,

An issue? What, with respect to 'regular' Googlebot? No, that's not an issue.

But you do want your specific, per-robot stuff first, and then either allow or disallow the rest with the "User-agent: *" record at the end.

Also, since you say you have about 100 bad-bot disallows, you might want to peruse this old thread: [webmasterworld.com...]

Jim

closed




msg:1527970
 7:52 pm on Oct 31, 2003 (gmt 0)

Remember that good robots will obey the first record containing either a match on their user-agent name or "*" whichever comes first.

Do you have a source you can quote for that, Jim? The only time I remember where the order matters in a manner like that (general versus specific) in robots.txt is in the Disallow/Allow statements.
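
As one data point on the ordering question, the original robots exclusion standard describes the "*" record as the default policy for any robot that has not matched another record, which does not depend on position. The sketch below checks how Python's standard urllib.robotparser handles the two orderings. It says nothing about what any particular crawler did in 2003, and the http:// URLs and index.htm page are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

SPECIFIC = "User-agent: WebStripper\nDisallow: /\n"
CATCH_ALL = "User-agent: *\nDisallow: /memberinfo/\n"


def verdicts(robots_txt):
    """Map (agent, path) to the parser's allow/deny decision."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        (agent, path): parser.can_fetch(agent, "http://www.foobar.com" + path)
        for agent in ("WebStripper", "Googlebot")
        for path in ("/index.htm", "/memberinfo/criminalrecords.htm")
    }


# Same two records, in both possible orders.
specific_first = verdicts(SPECIFIC + "\n" + CATCH_ALL)
catch_all_first = verdicts(CATCH_ALL + "\n" + SPECIFIC)

if __name__ == "__main__":
    print(specific_first == catch_all_first)
```

This parser treats "*" as a fallback wherever it appears. Other robots may have been written differently, so keeping "User-agent: *" last remains the cautious layout.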


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved