If you want to be listed then the simplest way is to have no /robots.txt or robots META tag. If you want to disallow other bots in your /robots.txt then you should allow Googlebot there.
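For anyone who wants to see what "disallow other bots but allow Googlebot" looks like in practice, here is a minimal sketch. The rules in ROBOTS_TXT and the helper name allowed() are illustrative assumptions, not taken from any real site, and the check uses Python's standard-library parser, which may not match how every crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything,
# every other robot is asked to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Calling allowed(ROBOTS_TXT, "Googlebot", "http://example.com/page.html") and then the same with a different agent name shows the two groups being applied separately. An empty Disallow: line means "nothing is disallowed" for that user agent.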
There is no need to put
<META NAME="Robots" CONTENT="index,follow">
on your pages. The tag is intended more for allowing indexing while excluding link following, or vice versa, i.e.
<META NAME="Robots" CONTENT="index,nofollow">
or
<META NAME="Robots" CONTENT="noindex,follow">
Hopefully search engine spiders will index and follow links without you having to tell them to do it!
Essentially robots.txt is used to exclude search engine spiders, so if you don't want to exclude any then there is no need for a robots.txt file.
Technically this may be correct. However, I'm pretty sure that my host was delivering an error page in response to a request for robots.txt. Furthermore, as far as I could tell, that error page was delivered without a 404 HTTP status code, so the error page would have been treated as a robots.txt file.
I am inclined to believe that a blank robots.txt file is better than none at all.
Kaled.
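A rough way to check for the situation described above (an error page served for /robots.txt without a 404 status) is sketched below. The function names fetch() and looks_like_robots_txt() and the heuristics inside them are assumptions made for illustration; real crawlers may apply different rules.

```python
import urllib.error
import urllib.request


def fetch(url):
    """Return (status, content_type, body) for a URL, including error responses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status, resp.headers.get("Content-Type", ""), body
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; the status code is still available.
        return err.code, err.headers.get("Content-Type", ""), ""


def looks_like_robots_txt(status, content_type, body):
    """Heuristic: does this response plausibly contain a real robots.txt?"""
    if status != 200:
        return False  # a genuine 404 (or other error) is not a robots.txt
    if not content_type.lower().startswith("text/plain"):
        return False  # error pages are usually served as text/html
    # An HTML error page mislabelled as text/plain would still contain markup.
    return "<html" not in body.lower()
```

Passing the result of fetch("http://your-site.example/robots.txt") (a placeholder address) into looks_like_robots_txt() returns False when the host answers with a 200 status and an HTML error page, which is the case worth fixing with a real file.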
If Google were to stop crawling sites where /robots.txt is missing, they would miss a lot of Web sites.
There were problems with some servers erroneously returning 403 instead of 404, and Google used to treat that as "don't index". While that made sense, too many people had problems, so a 403 for /robots.txt doesn't prevent crawling any more.
So I created a new one that disallows all search engines from crawling a nonexistent directory. Seems to work for me...
User-agent: *
Disallow: /DontLookHere/
Apparently some robots were treating the blank robots.txt file as Disallowing the entire site. Ack!
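To compare the two approaches, here is a small sketch that feeds both a blank file and the dummy-directory rule to Python's standard-library parser. The helper can_crawl(), the agent name ExampleBot and the example.com host are made up for illustration. This only shows how one well-behaved parser reads the files; it cannot reproduce the robots that reportedly misread a blank file.

```python
from urllib.robotparser import RobotFileParser

BLANK = ""  # an empty robots.txt

# The workaround from the post: disallow a directory that does not exist.
DUMMY_RULE = """\
User-agent: *
Disallow: /DontLookHere/
"""


def can_crawl(robots_txt, path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://example.com" + path)
```

Under this parser both files leave /index.html crawlable, and only the dummy directory is blocked by the second one. The advantage of the dummy rule is that it gives a stricter robot an explicit record to read instead of an empty file.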