User-agent: Black Hole
Disallow: /
User-agent: Titan
Disallow: /
User-agent: WebStripper
Disallow: /
...and so on for about 100 blocked robots, then:
User-agent: *
Disallow: /myfinancialhistory/
Disallow: /memberinfo/
This is the site structure:
www.foobar.com/myfinancialhistory/bankaccounts.htm
www.foobar.com/memberinfo/criminalrecords.htm
I used the Google remove option a few days ago and the pages are still there. Is there something wrong with the robots.txt syntax? Should I move the User-agent: * stuff to the top?
Thanks
You probably want to keep that "User-agent: *" record at the end -- remember that good robots will obey the first record containing either a match on their user-agent name or "*", whichever comes first.
Check your file for extraneous characters - such as spaces at the end of lines, etc.
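If it helps, here's a minimal Python sketch for spotting that sort of thing in a local copy of the file -- it assumes the file has been saved as "robots.txt" in the current directory, so adjust the name to suit:

# Quick sanity check on a local copy of robots.txt for stray characters.
# Assumes the file is saved as "robots.txt" in the current directory.
with open("robots.txt", "rb") as f:
    raw = f.read()

if raw.startswith(b"\xef\xbb\xbf"):
    print("warning: file starts with a UTF-8 byte-order mark")

for num, line in enumerate(raw.decode("utf-8", errors="replace").splitlines(), 1):
    if line != line.rstrip():
        print(f"line {num}: trailing whitespace")
    for ch in line:
        if ord(ch) < 32 or ord(ch) > 126:
            print(f"line {num}: unexpected character {ch!r}")
            break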
More info:
Learn: [robotstxt.org...]
Validate: [searchengineworld.com...]
Jim
If you cannot find a problem in your robots.txt, then I recommend you write to the company at googlebot@google.com. Send them a copy of the entries in your logs showing where it did not follow your robots.txt directives, plus the URL of your site. They'll check out what happened with their bot.
An issue? What, with respect to 'regular' GoogleBot? No, that's not an issue.
But you do want your specific, per-robot stuff first, and then either allow or disallow the rest with the "User-agent: *" record at the end.
Also, since you say you have about 100 bad-bot disallows, you might want to peruse this old thread: [webmasterworld.com...]
Jim
Remember that good robots will obey the first record containing either a match on their user-agent name or "*", whichever comes first.
Do you have a source you can quote for that, Jim? The only place I remember order mattering like that (general versus specific) in robots.txt is in the Disallow/Allow statements.
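For what it's worth, here's a rough Python sketch of the selection behaviour being debated: a robot looks for a record that names its own user-agent and only falls back to the "*" record otherwise. The data is just the example from this thread, and this isn't meant to describe what any particular bot actually does:

# Rough illustration only: pick which robots.txt record a given robot would use.
def pick_record(records, robot_name):
    fallback = None
    for name, disallows in records:
        if name == "*":
            if fallback is None:
                fallback = disallows          # remember the catch-all record
        elif name.lower() in robot_name.lower():
            return disallows                  # a record naming this robot wins
    return fallback if fallback is not None else []

records = [
    ("WebStripper", ["/"]),
    ("*", ["/myfinancialhistory/", "/memberinfo/"]),
]
print(pick_record(records, "WebStripper"))  # ['/']
print(pick_record(records, "Googlebot"))    # ['/myfinancialhistory/', '/memberinfo/']

Under that reading, where the "*" record sits in the file doesn't change the outcome; under a strict first-match reading, it would.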