Inktomi, any comment on why this is happening? Or does the spider not understand a 404? :)
Though just so I understand, could this be caused by the web host, or is it strictly a spidering issue?
Excerpt from my log file:
66.196.65.12 - - [11/Sep/2002:06:52:44 -0500] "GET /robots.txt HTTP/1.0" 404 133 "-" "Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]
66.196.65.14 - - [11/Sep/2002:07:03:22 -0500] "GET /robots.txt HTTP/1.0" 404 133 "-" "Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]
66.196.65.11 - - [11/Sep/2002:07:05:06 -0500] "GET /robots.txt HTTP/1.0" 404 133 "-" "Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]
66.196.65.28 - - [11/Sep/2002:07:07:56 -0500] "GET /robots.txt HTTP/1.0" 404 133 "-" "Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]
66.196.65.24 - - [11/Sep/2002:07:10:28 -0500] "GET /robots.txt HTTP/1.0" 404 133 "-" "Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]
66.196.65.22 - - [11/Sep/2002:07:11:01 -0500] "GET /robots.txt HTTP/1.0" 404 133 "-" "Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]
66.196.65.13 - - [11/Sep/2002:07:13:51 -0500] "GET /robots.txt HTTP/1.0" 404 133 "-" "Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]
66.196.65.23 - - [11/Sep/2002:07:16:19 -0500] "GET /robots.txt HTTP/1.0" 404 133 "-" "Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]
66.196.65.21 - - [11/Sep/2002:07:18:07 -0500] "GET /robots.txt HTTP/1.0" 404 133 "-" "Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]
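For anyone who wants to check their own logs for the same pattern, here is a rough Python sketch that tallies robots.txt requests by status code, protocol version, and client IP. It assumes the common/combined log format shown in the excerpt above; the function name `summarize_robots_requests` is made up for illustration, and the two sample lines are copied from the excerpt with the user-agent left truncated as posted.

```python
import re
from collections import Counter

# Matches the leading fields of a common/combined format log line.
# The referrer and user-agent are ignored because the excerpt truncates them.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

def summarize_robots_requests(lines):
    """Count /robots.txt requests per (status, protocol) and per client IP."""
    by_status = Counter()
    by_ip = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("path") != "/robots.txt":
            continue
        by_status[(int(m.group("status")), m.group("proto"))] += 1
        by_ip[m.group("ip")] += 1
    return by_status, by_ip

# Two lines taken from the excerpt above.
sample = [
    '66.196.65.12 - - [11/Sep/2002:06:52:44 -0500] "GET /robots.txt HTTP/1.0" 404 133 "-" "Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]',
    '66.196.65.14 - - [11/Sep/2002:07:03:22 -0500] "GET /robots.txt HTTP/1.0" 404 133 "-" "Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]',
]
by_status, by_ip = summarize_robots_requests(sample)
```

If every robots.txt hit from the crawler shows up under `(404, "HTTP/1.0")` while browser requests for the same file return 200, that points toward the protocol version rather than a missing file.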
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved <A HREF="http://www.genesis(.*?).com/www.domain-in-your-profile.com/robots.txt">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.22 Server at www.genesis(.*?).com Port 80</ADDRESS>
</BODY></HTML>
I get the correct robots.txt when I use HTTP/1.1
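One possible explanation, offered as a guess rather than a diagnosis: HTTP/1.1 requires a Host header, while HTTP/1.0 does not, and a name-based virtual host cannot tell which site is wanted when the header is missing. The Python sketch below builds both kinds of request by hand so the two responses can be compared. Whether Slurp actually omits the Host header is an assumption, and `www.example.com` in the usage note is a placeholder, not a real site from this thread.

```python
import socket

def build_request(path, version="1.1", host=None):
    """Return the raw bytes of a minimal GET request."""
    if version == "1.1" and host is None:
        raise ValueError("HTTP/1.1 requires a Host header")
    lines = [f"GET {path} HTTP/{version}"]
    if host is not None:
        lines.append(f"Host: {host}")
    lines.append("Connection: close")
    # Header lines end with CRLF; a blank line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def fetch_status_line(server, request, port=80, timeout=10):
    """Send a raw request and return the first line of the response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(request)
        data = b""
        while b"\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.split(b"\r\n", 1)[0].decode("latin-1")
```

To compare, call `fetch_status_line("www.example.com", build_request("/robots.txt", "1.0"))` and then the same with `host="www.example.com"` added. If only the request without a Host header gets the 302 or 404, the host's virtual host setup is the more likely culprit than the spider.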
I don't think Slurp understands 404s... It's been trying to request a file on one of my sites that has been gone for more than a year. My server responds with 404-Not Found. A few hours, days, or weeks later, Slurp comes back and tries again. While I appreciate that they don't drop files instantly, allowing for outages and occasional webmaster errors, I figure after a week, maybe they should drop it.
And before anyone pounces on me, there are no links to this page on the web anywhere that I have been able to find. It always was linked only from another deep page of my site, and has always had at least a <meta robots noindex> on it.
I've thought about trying a 410-Gone to break the logjam...
Jim
<edited for typo>
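For reference, the 410 idea could be tried along these lines. This is only a Python sketch of the logic, using the standard library's `http.server`; on Apache the same effect would normally come from server configuration instead. The path `/old-page.html` and the names `GONE_PATHS`, `status_for`, and `GoneHandler` are hypothetical, and whether Slurp treats a 410 any differently from a 404 is not something this thread establishes.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical set of URLs that were removed on purpose.
GONE_PATHS = {"/old-page.html"}

def status_for(path, gone_paths=GONE_PATHS):
    """Return 410 for deliberately removed paths, 404 for anything else."""
    return 410 if path in gone_paths else 404

class GoneHandler(BaseHTTPRequestHandler):
    """Toy handler that serves no content, only 410 or 404 responses."""

    def do_GET(self):
        # send_error writes the status line plus a short HTML error body.
        self.send_error(status_for(self.path))
```

Passing `GoneHandler` to `http.server.HTTPServer` and watching the access log after the next crawl would show whether the repeat requests stop. A 410 says the removal is intentional and expected to be permanent, which is a stronger statement than a 404.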