robots.txt syntax: How to block a directory
jdancing
Below is a piece of my robots.txt file in the root directory of my site. Something must be wrong, because pages from the /myfinancialhistory/ and /memberinfo/ directories (made-up names for comic relief :-O ) are in the Google index.
User-agent: Black Hole
And about 100 blocked robots........... then:
Disallow: /myfinancialhistory/
Disallow: /memberinfo/
This is the site structure:
I used the Google remove option a few days ago and the pages are still there. Is there something wrong with the robots.txt syntax? Should I move the User-agent: * stuff to the top?
You probably want to keep that "User-agent: *" at the end. Remember that good robots will obey the first record containing either a match on their user-agent name or "*", whichever comes first.
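To make that rule concrete, here is a minimal Python sketch of "first record that matches the robot's name or *, whichever comes first". It only models the rule as stated in this post; real crawlers may pick their record differently. The record list, the "Disallow: /" for Black Hole, and the function name are made up for illustration.

```python
def first_matching_record(records, robot_name):
    """Return the first record, in file order, that names this robot or "*"."""
    name = robot_name.lower()
    for agents, disallows in records:
        for agent in agents:
            # A record applies if it names the robot or uses the wildcard.
            if agent == "*" or agent.lower() in name:
                return agents, disallows
    return None


# Hypothetical records in file order: (user-agent names, disallowed prefixes).
records = [
    (["Black Hole"], ["/"]),  # assumed rule for the bad bot
    (["*"], ["/myfinancialhistory/", "/memberinfo/"]),
]

print(first_matching_record(records, "Googlebot"))
print(first_matching_record(records, "Black Hole"))

# With the wildcard record moved to the top, every robot stops there.
print(first_matching_record(list(reversed(records)), "Black Hole"))
```

Under this rule, moving the "*" record to the top would mean the per-robot records after it are never reached.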
Check your file for extraneous characters - such as spaces at the end of lines, etc.
Learn: [ ...] robotstxt.org Validate: [ ...] searchengineworld.com
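If you would rather check for stray characters yourself, something like the following Python sketch could flag them. The function name and the sample text are made up for illustration, and what counts as "extraneous" here (trailing whitespace, control characters, non-ASCII characters) is an assumption, not an official rule.

```python
def find_extraneous_characters(text):
    """Return (line_number, problem) pairs for lines that look suspicious."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        # Ignore the carriage return left over from Windows line endings.
        if line.endswith("\r"):
            line = line[:-1]
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
        # Flag control characters (except tab) and anything outside ASCII.
        if any(ord(ch) > 126 or (ord(ch) < 32 and ch != "\t") for ch in line):
            problems.append((number, "non-printable or non-ASCII character"))
    return problems


# Hypothetical sample: the second line ends with a space.
sample = "User-agent: *\nDisallow: /memberinfo/ \nDisallow: /myfinancialhistory/\n"
print(find_extraneous_characters(sample))
```

An empty list means nothing suspicious was found by these particular checks.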
So could a
early on be an issue?
That entry shouldn't affect Googlebot because he and Googlebot-Image are two different bots.
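One way to sanity-check this locally is Python's standard urllib.robotparser module. The file below is a hypothetical stand-in: the "Disallow: /" under Googlebot-Image, the "User-agent: *" record, and the example.com URLs are assumptions, and Python's parser is not Googlebot, so treat the result as a hint rather than proof.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the rule under Googlebot-Image is assumed.
robots_txt = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /myfinancialhistory/
Disallow: /memberinfo/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

base = "http://www.example.com"  # placeholder site
googlebot_home = parser.can_fetch("Googlebot", base + "/index.html")
googlebot_private = parser.can_fetch("Googlebot", base + "/memberinfo/page.html")
image_bot_home = parser.can_fetch("Googlebot-Image", base + "/index.html")

print(googlebot_home, googlebot_private, image_bot_home)
```

With this parser, the Googlebot-Image record should only apply to Googlebot-Image; Googlebot falls through to the "*" record.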
If you cannot find a problem in your robots.txt, then I recommend you write to the company at firstname.lastname@example.org . Send them a copy of the entries in your logs showing where he did not follow your robots.txt directives plus the URL of your site. They'll check out what happened with their bot.
An issue? What, with respect to 'regular' GoogleBot? No, that's not an issue.
But you do want your specific, per-robot stuff first, and then either allow or disallow the rest with the "User-agent: *" record at the end.
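With around 100 records it is easy to lose track of where the catch-all sits. Here is a rough Python sketch that checks whether the "User-agent: *" record is the last one in the file. It assumes records are separated by blank lines; the function name and the sample files are made up for illustration.

```python
def wildcard_record_is_last(text):
    """Return True if the only "User-agent: *" record is the final record."""
    records, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = []
            continue
        content = line.split("#", 1)[0].strip()  # drop comments
        if content:
            current.append(content)
    if current:
        records.append(current)

    def names_everyone(record):
        for entry in record:
            field, _, value = entry.partition(":")
            if field.strip().lower() == "user-agent" and value.strip() == "*":
                return True
        return False

    positions = [i for i, rec in enumerate(records) if names_everyone(rec)]
    return bool(positions) and positions == [len(records) - 1]


# Hypothetical files: same two records, in opposite order.
good = "User-agent: Black Hole\nDisallow: /\n\nUser-agent: *\nDisallow: /memberinfo/\n"
bad = "User-agent: *\nDisallow: /memberinfo/\n\nUser-agent: Black Hole\nDisallow: /\n"
print(wildcard_record_is_last(good), wildcard_record_is_last(bad))
```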
Also, since you say you have about 100 bad-bot disallows, you might want to peruse this old thread: [
Remember that good robots will obey the first record containing either a match on their user-agent name or "*" whichever comes first.
Do you have a source you can quote for that, Jim? The only time I remember where the order matters in a manner like that (general versus specific) in robots.txt is in the Disallow/Allow statements.