Forum Moderators: goodroi
No Robots.txt at all on 25% of all hosts.
30% of the time, when a Robots.txt file was found, it contained nothing but HTML code — no Robots.txt syntax at all.
Nearly 40% of those that had the file, and whose file didn't contain HTML, were still so badly written they couldn't be followed.
This is over a few years now, thousands of hosts. So please, make and check your Robots.txt file.
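One easy way to check your own file is with the robots.txt parser in Python's standard library. The rules below are an invented example, not anyone's real file:

```python
from urllib.robotparser import RobotFileParser

# A minimal, valid robots.txt: one record, one wildcard User-agent.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A well-behaved spider must skip the disallowed paths...
print(rp.can_fetch("*", "/private/data.html"))  # False
# ...and is free to fetch everything else.
print(rp.can_fetch("*", "/index.html"))         # True
```

If the parser can't make sense of your file, a spider probably can't either.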
Are you referring to 'hosts' (ISPs) or webmasters?
The webmaster determines the contents of robots.txt and
the host simply stores whatever is sent up if I have that right.
Before I learned much of anything, I had a robots.txt
file that read something like
flakes, weirdos, new-age types ..
I've since put up a better one thanks to the good advice on this forum.
ps: AlucardSpyderWriter, like the handle. Just keep your eye out for rellikeripmaveht lurking around here.
Can we all agree that the asterisk (*) never belongs in this file other than in a "User-agent: *" line?
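For example (the paths here are made up), under the original robotstxt.org standard:

```
# Correct: * is only defined as a User-agent value, meaning "all robots"
User-agent: *
Disallow: /cgi-bin/

# Wrong: the original standard defines no wildcards in Disallow paths
User-agent: *
Disallow: /*.gif
```

A spider following the standard to the letter would treat that second Disallow as a literal path beginning with "/*.gif", which matches nothing.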
Also, I feel like I'm getting picky here, what with everything else webmasters have to worry about. But this file is just about all a good spider (and good spider writer) has to go by, other than the robot METAs.
Also, I feel like I'm getting picky here, what with everything else webmasters have to worry about
Yes, you are being picky. You may understand how it works, but a lot of others don't, especially the newbies, and maybe that's why you have seen so many mistakes made with the coding in this file.
You don't have to have the file at all, or you can have just a blank robots.txt file. The reason for the blank file is to eliminate the 404s you would otherwise get in your log file whenever the file was requested but not available to be served.
The robots.txt file only works on well-behaved spiders; rogue spiders just walk all over it and don't obey, so you have to implement other means to stop those.
It's not the lack of it I don't get, and I understand all kinds of simple mistakes..
But when my spiders get more links out of the HTML in a Robots.txt than the HTML of a page itself, I just remember how silly I first thought it was to include code to check Robots.txt for HREFs.
So, there is frustration on both ends, and the solution is not to complain, but to inform.
Standard for Robot Exclusion [robotstxt.org]
robots.txt syntax checker [searchengineworld.com]
30% of the time, when a Robots.txt file was found, it contained nothing but HTML code.
Possibly some of these are caused by missing robots.txt files. If the robots file is missing, some servers return a custom 404 error page but with a 200 status code, so a spider reads the HTML of the error page as if it were the robots.txt file.
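A spider can catch both problems (HTML dumped into robots.txt, and a 404 error page served with a 200 code) by sniffing the body for markup before trusting it. A rough sketch of such a heuristic:

```python
def looks_like_html(body: str) -> bool:
    """Heuristic check: a real robots.txt holds plain directives, never markup."""
    head = body.lstrip().lower()
    # Typical openings of an HTML document or error page.
    return head.startswith(("<!doctype", "<html", "<head", "<body"))

# A well-formed robots.txt passes the check...
print(looks_like_html("User-agent: *\nDisallow: /cgi-bin/"))        # False
# ...but a custom 404 page served with a 200 status does not.
print(looks_like_html("<html><body>Page not found</body></html>"))  # True
```

It's only a heuristic, but any file that trips it should be treated as "no robots.txt found" rather than parsed for directives.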