Interesting visit by AskJeeves

Making a dead page cleanup pass?


jdMorgan

7:25 pm on Oct 10, 2002 (gmt 0)

I had an interesting visit from AskJeeves REMOTE_HOST egspd404.directhit.com a couple of hours ago. It requested a number of dead pages, and no others. The UA was "Mozilla/2.0 (compatible; Ask Jeeves)" which, in itself, is unremarkable.

AskJeeves has frequently requested these dead pages over the past year, paying no attention to the 404 Not Found response returned each time; it would simply come back later and request them again. The name of this REMOTE_HOST and its behaviour lead me to believe that this is a cleanup pass, and that perhaps AskJeeves will no longer request these pages.

Has anyone else seen this kind of thing before?

Jim

Finder

7:05 am on Oct 11, 2002 (gmt 0)

"It requested a number of dead pages, and no others."

Just noticed the same thing in my logs. Requested a bunch of pages that haven't been live since June, but nothing else.

Romeo

11:30 am on Oct 12, 2002 (gmt 0)

I saw it too (egspd406.directhit.com = 65.214.36.156), requesting both dead and new pages. It started spidering without first looking for my robots.txt. This was on 2002-10-10.

carfac

3:44 pm on Oct 12, 2002 (gmt 0)

I have noticed EXTREMELY RUDE behavior from Jeeves recently: not reading (or ignoring) robots.txt, an excessive hit rate in the middle of the day for an extended period, and then a loop requesting a non-existent page (from a non-existent directory) 5-10 times a second for 30 minutes, until I caught it and cut it off.

Sent an e-mail to Jeeves; no response.

dave
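A loop like the one dave describes shows up in an Apache-style access log as many hits on the same second. Here is a minimal sketch of how to count requests per second from one host with standard tools; the log file and its lines are synthetic examples made up for illustration, and only the address 65.214.36.156 (egspd406.directhit.com, from Romeo's post above) is taken from this thread:

```shell
#!/bin/sh
# Sketch: count requests per second from one host in an Apache-style
# access log, to spot a request loop. The log below is a synthetic
# example; only the IP 65.214.36.156 comes from this thread.
LOG=/tmp/access_sample.log
cat > "$LOG" <<'EOF'
65.214.36.156 - - [12/Oct/2002:15:44:01 +0000] "GET /old/page.html HTTP/1.0" 404 210
65.214.36.156 - - [12/Oct/2002:15:44:01 +0000] "GET /old/page.html HTTP/1.0" 404 210
65.214.36.156 - - [12/Oct/2002:15:44:01 +0000] "GET /old/page.html HTTP/1.0" 404 210
65.214.36.156 - - [12/Oct/2002:15:44:02 +0000] "GET /old/page.html HTTP/1.0" 404 210
EOF
# Pull the timestamp field (between [ and ]) for that host, then
# count how many requests landed in each one-second bucket.
grep '^65\.214\.36\.156 ' "$LOG" \
  | awk -F'[][]' '{print $2}' \
  | sort | uniq -c | sort -rn
# The top line of the output shows the busiest second for that host.
```

A real cleanup would point LOG at the live access log instead of a synthetic file; anything sustained at several hits per second on one URL is a candidate for a block.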

Romeo

7:17 pm on Oct 13, 2002 (gmt 0)

I have written them about ignoring my /robots.txt and got the following reply yesterday:
"We last checked [...mydomain...] for a robots.txt file on September 23 ...
We are working to increase the frequency with which we recheck the
robots.txt files."
So they don't check /robots.txt at the start of each crawl; they read it once and cache it for several days or weeks ... strange.
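The practical upshot of that caching is that a new rule added to /robots.txt may be ignored for weeks, until the crawler's next recheck. A fragment like the sketch below would eventually take effect; note the "Ask Jeeves" user-agent token and the /old/ path are assumptions for illustration (the UA string quoted earlier is "Mozilla/2.0 (compatible; Ask Jeeves)", but the exact robots.txt token the crawler matches on is not stated in this thread):

```
# Hypothetical robots.txt fragment -- token and path are examples only
User-agent: Ask Jeeves
Disallow: /old/
```

Until the cached copy expires, the crawler will keep behaving as if this rule did not exist.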