robots.txt syntax: How to block a directory
jdancing msg:1527965 12:48 am on Oct 30, 2003 (gmt 0)
Below is a piece of the robots.txt file in my site's root directory. Something must be wrong, because pages from /myfinancialhistory/ and /memberinfo/ (made-up names for comic relief :-O ) are showing up in the Google index.
User-agent: Black Hole
And about 100 more blocked robots, then:
User-agent: *
Disallow: /myfinancialhistory/
Disallow: /memberinfo/
This is the site structure:
I used Google's remove option a few days ago, and the pages are still there. Is something wrong with my robots.txt syntax? Should I move the "User-agent: *" record to the top?
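One way to test the syntax outside of any search engine is to feed the rules to a parser and ask what a given crawler may fetch. The sketch below uses Python's standard-library `urllib.robotparser`, which is one implementation of the robots.txt conventions and not necessarily how Googlebot behaves. It assumes the two Disallow lines sit in a closing `User-agent: *` record (as the thread implies) and that the Black Hole record uses `Disallow: /`; the other ~100 bad-bot records are omitted, and the page paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the post. ASSUMPTION: the "Black Hole" record's rule is
# "Disallow: /"; the ~100 other bad-bot records are left out.
ROBOTS_TXT = """\
User-agent: Black Hole
Disallow: /

User-agent: *
Disallow: /myfinancialhistory/
Disallow: /memberinfo/
"""


def check(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical page paths, used only for illustration.
    for path in ("/myfinancialhistory/page.html", "/memberinfo/page.html", "/index.html"):
        verdict = "allowed" if check("Googlebot", path) else "blocked"
        print(f"Googlebot {path}: {verdict}")
```

If a check like this reports both directories as blocked, the record layout itself is probably not the problem, and stray characters in the real file become the more likely suspect.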
jdMorgan msg:1527966 2:08 am on Oct 30, 2003 (gmt 0)
You probably want to keep that "User-agent: *" at the end. Remember that good robots will obey the first record containing either a match on their user-agent name or "*", whichever comes first.
Check your file for extraneous characters, such as spaces at the ends of lines.
Learn: [ ...] robotstxt.org Validate: [ ...] searchengineworld.com
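A quick way to look for stray characters of the kind mentioned above is a small scanner. This is a sketch, not a validator: `find_suspicious_chars` is a hypothetical helper, and its three checks (trailing whitespace, non-ASCII characters such as a byte-order mark, and field lines missing a colon) are illustrative assumptions about what commonly goes wrong, not a complete list.

```python
from __future__ import annotations


def find_suspicious_chars(text: str) -> list[tuple[int, str]]:
    """Return (line_number, problem) pairs for characters that often trip up
    robots.txt parsers. The checks are illustrative, not exhaustive."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Spaces or tabs after the value, e.g. "Disallow: /memberinfo/  ".
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
        # Catches a UTF-8 byte-order mark or pasted "smart" punctuation.
        if any(ord(ch) > 127 for ch in line):
            problems.append((number, "non-ASCII character"))
        # Ignore comments; a non-blank field line should contain a colon.
        field = line.split("#", 1)[0].strip()
        if field and ":" not in field:
            problems.append((number, "field without a colon"))
    return problems


if __name__ == "__main__":
    sample = "User-agent: *  \nDisallow /memberinfo/\n"
    print(find_suspicious_chars(sample))
    # [(1, 'trailing whitespace'), (2, 'field without a colon')]
```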
jdancing msg:1527967 4:11 am on Oct 30, 2003 (gmt 0)
So could a
early on be an issue?
BlueSky msg:1527968 4:27 am on Oct 30, 2003 (gmt 0)
That entry shouldn't affect Googlebot because he and Googlebot-Image are two different bots.
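The snippet jdancing asked about did not survive in this thread, so the sketch below assumes it was a `User-agent: Googlebot-Image` record with `Disallow: /`. Using Python's `urllib.robotparser` as a stand-in parser (not Google's own code), a record naming Googlebot-Image does not apply to a crawler identifying as plain Googlebot, which falls through to the `*` record instead.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: the entry under discussion looked like the first record below;
# the original snippet is missing from the thread.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /memberinfo/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Plain Googlebot skips the Googlebot-Image record and uses the "*" record.
print(parser.can_fetch("Googlebot", "/index.html"))        # True
# Googlebot-Image matches its own record, which blocks everything.
print(parser.can_fetch("Googlebot-Image", "/index.html"))  # False
```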
If you cannot find a problem in your robots.txt, then I recommend you write to the company at email@example.com. Send them a copy of the entries in your logs showing where he did not follow your robots.txt directives, plus the URL of your site. They'll check out what happened with their bot.
jdMorgan msg:1527969 4:28 am on Oct 30, 2003 (gmt 0)
An issue? What, with respect to 'regular' GoogleBot? No, that's not an issue.
But you do want your specific, per-robot stuff first, and then either allow or disallow the rest with the "User-agent: *" record at the end.
Also, since you say you have about 100 bad-bot disallows, you might want to peruse this old thread: [
closed msg:1527970 7:52 pm on Oct 31, 2003 (gmt 0)
Remember that good robots will obey the first record containing either a match on their user-agent name or "*", whichever comes first.
Do you have a source you can quote for that, Jim? The only place I remember order mattering in that way (general versus specific) in robots.txt is in the Disallow/Allow statements.
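Python's standard-library `urllib.robotparser` gives one concrete data point for both questions; it is only one parser, and other crawlers may behave differently. In its implementation, the `*` record is set aside and consulted only when no named record matches, so moving it to the top or bottom does not change the result. Within a single record, however, it returns the verdict of the first Allow/Disallow line whose path is a prefix of the URL, so line order does matter there. `BadBot` and the `/memberinfo/public/` paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Two files with the same records in opposite order ("BadBot" is hypothetical).
SPECIFIC_FIRST = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /memberinfo/
"""

WILDCARD_FIRST = """\
User-agent: *
Disallow: /memberinfo/

User-agent: BadBot
Disallow: /
"""

# Same record, Allow/Disallow lines in opposite order (paths are hypothetical).
ALLOW_FIRST = "User-agent: *\nAllow: /memberinfo/public/\nDisallow: /memberinfo/\n"
DISALLOW_FIRST = "User-agent: *\nDisallow: /memberinfo/\nAllow: /memberinfo/public/\n"


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Parse `robots_txt` and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Record order: this parser treats "*" as a fallback, so both print False.
    print(allowed(SPECIFIC_FIRST, "BadBot", "/index.html"))
    print(allowed(WILDCARD_FIRST, "BadBot", "/index.html"))
    # Line order within a record: the first matching line wins here.
    print(allowed(ALLOW_FIRST, "Googlebot", "/memberinfo/public/a.html"))     # True
    print(allowed(DISALLOW_FIRST, "Googlebot", "/memberinfo/public/a.html"))  # False
```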