Msg#: 3316367 posted 1:11 am on Apr 20, 2007 (gmt 0)
Concerning the robots.txt file: it is by far the #1 error page in my stats, I guess because the file doesn't exist. I haven't decided to block any bots yet, so I didn't create it. Should I have this file nevertheless? If so, is this the correct code for not blocking any bots?
Msg#: 3316367 posted 1:32 am on Apr 20, 2007 (gmt 0)
Yes, you should have this standard resource on your site -- and for the precise reason you mention. Robots will request it, and if it's not there, their requests may pollute your access and error logs to the point of marginal usability. The error log file should have only real, unexpected errors in it, not be filled with errors that are easy to prevent.
These errors may also skew the results of your 'stats' program, if you use one.
favicon.ico, w3c/p3p.xml, and labels.rdf are three more standard resources you might consider providing.
The code you posted looks fine. Put a blank line after the "Disallow:" line for maximum compatibility. (Every "record" in a robots.txt file should be followed by a blank line, and there was one (European?) 'bot a few years ago that insisted on its presence, even after the last record.)
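For reference, here is a rough sketch of an allow-all file and a quick way to sanity-check it. It assumes the usual two-line form (a wildcard User-agent line followed by an empty "Disallow:" line, then the blank line); the helper name `allows`, the bot name "ExampleBot", and the example.com URL are made up for illustration. The check uses Python's standard urllib.robotparser, which is only one parser and won't necessarily behave like every 'bot out there.

```python
from urllib.robotparser import RobotFileParser

# Assumed allow-all robots.txt: wildcard user-agent, empty Disallow,
# and a trailing blank line to close the record.
ALLOW_ALL = "User-agent: *\nDisallow:\n\n"

# For contrast: "Disallow: /" asks every robot to stay out entirely.
BLOCK_ALL = "User-agent: *\nDisallow: /\n\n"


def allows(robots_txt, agent, url):
    """Return True if this robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://example.com/page.html"  # hypothetical URL
    print(allows(ALLOW_ALL, "ExampleBot", url))
    print(allows(BLOCK_ALL, "ExampleBot", url))
```

The only difference between the two files is the slash: an empty "Disallow:" value blocks nothing, while "Disallow: /" blocks everything, so it is worth double-checking that line before uploading.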
There have also been unconfirmed reports that having a robots.txt file increases the number of pages spidered by MSNbot on your site. So far, not enough data has been collected for me to conclude that this is true.
Msg#: 3316367 posted 5:52 am on Apr 20, 2007 (gmt 0)
Thanks for your advice. Yes, there have been a few favicon.ico errors too. Don't know why that is, because favicon.ico should be linked on all my pages and resides in the root directory. Maybe I'm missing one or two. I'll check.
One other question if you will permit. A large number of errors (over 100 each in last two months) are requests for "mysite.com/index.htm/" and "mysite.com/defaultsite". Surely an error 404 page is served in these instances because the pages don't exist. But should I direct the bots not to look for them?