Google error requesting robots.txt

         

keyplyr

11:16 pm on Apr 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I raised this problem in the Tracking and Logging forum, but got no solutions.

This Google bot keeps showing up in my error logs for requesting robots.txt incorrectly:

[Wed Apr 9 04:16:18 2003] [error] [client 64.68.82.5] File does not exist: /www/path/my_account/htdocs/index.htmlrobots.txt

What could be causing this?

pendanticist

11:20 pm on Apr 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>www/path/my_account/htdocs/index.htmlrobots.txt

Isn't there something erroneous there?

Pendanticist.

jdMorgan

11:32 pm on Apr 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



keyplyr,

Have you tried using Brett's server header checker [webmasterworld.com] to request your robots.txt manually?
You might have a funky redirect, or it may just be a Googlebot error...

Jim

korkus2000

11:35 pm on Apr 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How long has it been happening? Is it just freshbot, or deepbot too?

TXathlete

12:28 am on Apr 10, 2003 (gmt 0)



Could this be why I am seeing a site whose clearly defined robots.txt file limits access to multiple directories, yet Google's robots are indexing the off-limits content anyway?

Is anyone else seeing this problem? Maybe related?

keyplyr

12:29 am on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



pendanticist - that's what I'm referring to

Jim - checked out OK:

HTTP/1.1 200 OK
Date: Thu, 10 Apr 2003 00:26:12 GMT
Server: Apache/1.3.27 (Unix) FrontPage/5.0.2.2510 mod_ssl/2.8.12 OpenSSL/0.9.6g
Last-Modified: Mon, 07 Apr 2003 08:45:34 GMT
ETag: "682401-6b1-3e913aae"
Accept-Ranges: bytes
Content-Length: 1713
Connection: close
Content-Type: text/plain

Server response time: less than 1 second
[my_domain.com...]

korkus - I'm not familiar with all the Googlebot IPs, but I think it is just this one. It has been going on for about a month.

speda1

1:34 am on Apr 10, 2003 (gmt 0)

10+ Year Member



What does your .htaccess file look like?

pendanticist

1:43 am on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>pendanticist - that's what I'm referring to

Are you saying the bot placed/replaced the robots.txt at the end like that?

www.blahblah.com/whatever-else-you-got-goin-on/robots.txt

or,

www.blahblah.com/whatever-else-you-got-goin-on/index.html

Both can't be there, that's for sure.

Is there anything particular about your host server we need to know?

Pendanticist.

killroy

1:59 am on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Looks like a URL rewrite mess, since Google only requests /robots.txt, i.e. from the root of the domain.
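
To illustrate the point about the root of the domain, here is a minimal Python sketch. The page URL and both function names are hypothetical, and the naive concatenation is only a guess at how a path like the one in the error log could come about, not a confirmed cause.

```python
from urllib.parse import urljoin, urlsplit


def robots_url(page_url: str) -> str:
    # Resolve against the host root: the leading slash discards the page path.
    return urljoin(page_url, "/robots.txt")


def naive_robots_url(page_url: str) -> str:
    # Hypothetical buggy variant: glue the file name onto the page URL.
    return page_url + "robots.txt"


page = "http://www.example.com/index.html"  # hypothetical page URL

print(robots_url(page))
print(urlsplit(naive_robots_url(page)).path)
```

The first call yields the robots.txt URL at the root of the host no matter which page it starts from. The second shows that plain string concatenation produces a path ending in index.htmlrobots.txt, the same shape as the one in the log.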

SN

keyplyr

2:37 am on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for the help. My htaccess is temporarily linked to my profile.

jdMorgan

2:49 am on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



keyplyr,

I don't see anything in there that would cause this problem.
Your redirect for iaea.org may be a little funky, but it won't cause the problem you're having.

Jim

keyplyr

2:53 am on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks Jim - but wasn't it you that gave me that funky iaea.org redirect? LOL

<edit> htaccess removed from profile </edit>

jdMorgan

4:52 am on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nah, I block iaea.org referrers too, but I block them from all pages using .*

You might want to see if you can get in touch with Google's crawl support. And please post when/if you get this resolved - it is very strange.

Jim

keyplyr

7:25 am on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just found that GTE bots are now also getting the same error:

[Wed Apr 9 19:46:32 2003] [error] [client 4.61.195.48] File does not exist: /www/path/my_account/htdocs/index.htmlrobots.txt

So I will remove:

RewriteRule !^robots\.txt$ - [F]

...since I first noticed the problem immediately after I added this rule,
and replace it with:

RewriteRule ^.* - [F]

HitProf

8:57 am on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



keyplyr,

Have you updated pages recently? Perhaps it's a *link* they try to follow that you've put in somewhere accidentally.

edit: didn't see your last post :)

jdMorgan

3:50 pm on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



keyplyr,

I seriously doubt that the Rule you modified above has anything to do with this, since it specifically allows access to robots.txt, even by banned user-agents. I've got exactly that same line in my .htaccess, and it hasn't caused any trouble for years. If nothing changes after you change that Rule, I'd suggest you change it back. Otherwise, you will have no way to test if an unknown suspicious user-agent will obey robots.txt.
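
The difference between the two rules can be sketched in Python. This is only an illustration under assumptions the thread does not state: that the rule sits in a per-directory .htaccess at the document root (so the path is matched without its leading slash), and that RewriteCond lines, not shown here, limit the rule to banned user-agents. The function names are made up for the example, and Python's `re` stands in for Apache's regex engine, which should behave the same for patterns this simple.

```python
import re

# Patterns taken from the two RewriteRule lines quoted in this thread.
ROBOTS_ONLY = re.compile(r"^robots\.txt$")  # used negated in the rule: !^robots\.txt$
ANYTHING = re.compile(r"^.*")


def forbidden_by_original_rule(path: str) -> bool:
    # "RewriteRule !^robots\.txt$ - [F]": forbid unless the path is exactly robots.txt.
    return ROBOTS_ONLY.search(path) is None


def forbidden_by_replacement_rule(path: str) -> bool:
    # "RewriteRule ^.* - [F]": the pattern matches every path, so all are forbidden.
    return ANYTHING.search(path) is not None


for path in ("robots.txt", "index.html", "index.htmlrobots.txt"):
    print(path, forbidden_by_original_rule(path), forbidden_by_replacement_rule(path))
```

Under these assumptions, the original rule leaves robots.txt reachable for a banned user-agent and forbids everything else, while the replacement forbids robots.txt as well.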

Try Xenu Link Sleuth to test HitProf's idea - it's a good one.

Jim

keyplyr

6:11 pm on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have a distinct feeling the issue lies with a link from another webpage - most likely someone new to coding. I constantly find incorrectly written links in my error pages, and I do run a bi-monthly Xenu link check so I don't think it's a bad link on my end.

On a daily basis, dozens of MSOffice and FrontPage errors appear in my error logs due to desktop downloaders, as well as the occasional page hijacker not being able to access my relative links.

Since Google freshbot crawls a hundred or so of my pages 3 or 4 times a week, I'm guessing the issue isn't coming from my end, or at least it is not standing in the way of fresh crawls. My only concern would be if the bots weren't reading my directory disallows. Guess I'll find out after the Google update, whenever that happens.

Thanks for the help.