Forum Moderators: open
This Googlebot is constantly showing up in my error logs requesting robots.txt incorrectly:
[Wed Apr 9 04:16:18 2003] [error] [client 64.68.82.5] File does not exist: /www/path/my_account/htdocs/index.htmlrobots.txt
What could be causing this?
Have you tried using Brett's server header checker [webmasterworld.com] to request your robots.txt manually?
You might have a funky redirect, or it may just be a Googlebot error...
Jim
Is anyone else seeing this problem? Maybe related?
Jim - checked out OK:
HTTP/1.1 200 OK
Date: Thu, 10 Apr 2003 00:26:12 GMT
Server: Apache/1.3.27 (Unix) FrontPage/5.0.2.2510 mod_ssl/2.8.12 OpenSSL/0.9.6g
Last-Modified: Mon, 07 Apr 2003 08:45:34 GMT
ETag: "682401-6b1-3e913aae"
Accept-Ranges: bytes
Content-Length: 1713
Connection: close
Content-Type: text/plain
Server response time: less than 1 second
[my_domain.com...]
korkus - I'm not familiar with all the Googlebot IPs, but I think it is just this one. It's been going on for about a month.
Are you saying the bot placed/replaced the robots.txt at the end like that?
www.blahblah.com/whatever-else-you-got-goin-on/robots.txt
or,
www.blahblah.com/whatever-else-you-got-goin-on/index.html
Both can't be there, that's for sure.
Is there anything particular about your host server we need to know?
Pendanticist.
[Wed Apr 9 19:46:32 2003] [error] [client 4.61.195.48] File does not exist: /www/path/my_account/htdocs/index.htmlrobots.txt
So I will remove:
RewriteRule !^robots\.txt$ - [F]
...since I first noticed the problem immediately after I added this rule,
and replace it with:
RewriteRule ^.* - [F]
I seriously doubt that the Rule you modified above has anything to do with this, since it specifically allows access to robots.txt, even by banned user-agents. I've got exactly that same line in my .htaccess, and it hasn't caused any trouble for years. If nothing changes after you change that Rule, I'd suggest you change it back. Otherwise, you will have no way to test if an unknown suspicious user-agent will obey robots.txt.
Try Xenu Link Sleuth to test HitProf's idea - it's a good one.
Jim
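For what it's worth, the exemption Jim describes usually sits right next to the user-agent ban itself. A rough sketch (the blocked agent name "BadBot" is a placeholder, not anything from this thread):

```apache
# Ban a given user-agent, but still let it fetch robots.txt so you
# can observe whether it actually obeys the disallows.
# "BadBot" is a placeholder name.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BadBot [NC]
RewriteRule !^robots\.txt$ - [F]
```

The negated pattern (`!^robots\.txt$`) forbids every request from the matched agent except robots.txt itself, which is exactly why replacing it with `^.*` would also block robots.txt.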
I have a distinct feeling the issue lies with a link from another webpage - most likely from someone new to coding. I constantly find incorrectly written links in my error logs, and since I run a bi-monthly Xenu link check, I don't think it's a bad link on my end.
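One way a badly written link or crawler could produce exactly that log entry is naive string concatenation - appending "robots.txt" to the page path without stripping the filename or adding a separator. A sketch using the paths from the error log as placeholders:

```shell
# Reproduce the malformed path from the error log by naive append,
# then show the correct resolution against the page's directory.
page="/www/path/my_account/htdocs/index.html"

bad="${page}robots.txt"              # naive append, no separator
good="$(dirname "$page")/robots.txt" # resolve against the directory

echo "$bad"    # /www/path/my_account/htdocs/index.htmlrobots.txt
echo "$good"   # /www/path/my_account/htdocs/robots.txt
```

A crawler doing the first kind of append while walking someone's page would explain requests for `index.htmlrobots.txt` regardless of anything in your own site's configuration.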
On a daily basis, dozens of MSOffice and FrontPage errors appear in my error logs from desktop downloaders, along with the occasional page hijacker failing to resolve my relative links.
Since Google freshbot crawls a hundred or so of my pages 3 or 4 times a week, I'm guessing the issue isn't coming from my end, or at least it is not standing in the way of fresh crawls. My only concern would be if the bots weren't reading my directory disallows. Guess I'll find out after the Google update, whenever that happens.
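For the record, directory disallows of the sort mentioned would look roughly like this in robots.txt (the directory names here are placeholders, not taken from the thread):

```
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
```

If the bot is reading robots.txt correctly, those directories should stop appearing in crawl requests after the next fetch of the file.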
Thanks for the help.