Slurp trying to take pages that never existed

Silly Slurp!

Dayo_UK

8:05 am on Sep 20, 2003 (gmt 0)



Hope people don't mind me posting this. I can't see that the extract from the logs is against the TOS, as the page does not belong to my site.

Slurp tried to fetch the following page from my site this morning. However, this page has never existed, so Slurp got a 404. Slurp also attempted to fetch a few more pages that never existed. Why would this happen?

66.196.73.82 - - [20/Sep/2003:05:09:30 +0200] "GET /fast_pay_day_loan/football.htm HTTP/1.0" 404 1985 www.my-site.co.uk "-" "Mozilla/5.0 (Slurp/cat; slurp@inktomi.com; http://www.inktomi.com/slurp.html)" "-"
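To collect every nonexistent path Slurp has asked for, rather than spotting them one at a time, a small script can filter the access log. This is a minimal sketch in Python, assuming the log format in the line above (Apache's combined format with the virtual host inserted after the byte count). The regex and the `slurp_404s` and `report` helpers are hypothetical and would need adjusting for a different `LogFormat`. Reporting the requested host name next to each path can help show whether the request was really aimed at your domain.

```python
import re

# Assumed layout, taken from the log line above:
# ip ident user [time] "METHOD path PROTO" status bytes vhost "referer" "agent" ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ (?P<vhost>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def slurp_404s(lines):
    """Yield (ip, vhost, path) for every 404 served to a user agent containing 'Slurp'."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if match["status"] == "404" and "Slurp" in match["agent"]:
            yield match["ip"], match["vhost"], match["path"]


def report(log_path):
    """Print each Slurp 404 found in the log file at log_path (hypothetical helper)."""
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for ip, vhost, path in slurp_404s(log):
            print(ip, vhost, path)
```

Calling `report("access.log")` on a log containing the line above would print the Inktomi IP, `www.my-site.co.uk` and `/fast_pay_day_loan/football.htm`.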

jeremy goodrich

10:11 pm on Sep 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've seen this happen for a number of reasons:

1) the domain belonged to somebody else, who previously *did* have a file / path of that name.

2) Shared hosting, where you share the same IP address with another bloke who has a file / folder of that name. The host goofs up, and then the spider requests somebody else's path from your site.

3) Old domain on your non-shared IP address. One host I used long ago forgot to remove the domain name previously allocated to the IP address that was now used for *my* domain. So, even though it wasn't a shared hosting environment, the spiders went ahead requesting /otherstuff.html, which I never had.

For some reason, Slurp, Ask Jeeves, etc. have a much longer memory for outdated files that they shouldn't request any more (at least in my experience).

The best thing you can do is return a 410 'Gone' status for the file and then serve up your error page.
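A minimal sketch of that idea using Python's standard-library `http.server`, assuming you keep a hand-maintained list of stale paths. `PAGES`, `GONE_PATHS`, the page bodies and the helper names are hypothetical placeholders. Known stale URLs get a 410, anything else unknown still gets a 404, and both responses carry the same error page as the body.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical content: pages that really exist on the site.
PAGES = {"/": b"<html><body><h1>Home</h1></body></html>"}

# Hypothetical paths that spiders keep requesting and that should be reported as gone.
GONE_PATHS = {"/fast_pay_day_loan/football.htm", "/otherstuff.html"}

# The same error page is sent as the body of both 404 and 410 responses.
ERROR_PAGE = b"<html><body><h1>Sorry, that page is not here.</h1></body></html>"


def response_for(path):
    """Return (status, body) for a request path."""
    if path in PAGES:
        return 200, PAGES[path]
    if path in GONE_PATHS:
        return 410, ERROR_PAGE  # tells the spider the URL is gone for good
    return 404, ERROR_PAGE


class GoneAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = self.path.split("?", 1)[0]  # ignore any query string
        status, body = response_for(path)
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Serve on localhost until interrupted (hypothetical helper for trying it out)."""
    HTTPServer(("127.0.0.1", port), GoneAwareHandler).serve_forever()
```

After calling `run()`, a request for `/otherstuff.html` should come back as 410 with the error page as its body. On Apache, the same effect is usually achieved with mod_alias, e.g. `Redirect gone /otherstuff.html`, plus an `ErrorDocument 410` line pointing at your error page.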