Can anyone help me with this?
BTW, my site used to be in 5 frames... yes, 5! :S When I made it I didn't have a clue, not that I have one now.
Somewhere Slurp got it in his head that you had those pages. Where he learned it is irrelevant.
Now he visits /MyOldPage.html and gets a 404. He says, "Huh, no page here. I don't know where else to go. I'll leave."
If he gets a sitemap when he asks for /MyOldPage.html he'll at least know where else he can go. He ought to spider the site based on that.
I don't know what he'd do with a redirect. Someone else may have some experience with that.
He also has issues spidering CGI and other dynamic links, treating them as if they were duplicate content. I'm sure the Ink staff is working to fix this, but in the meantime, it sure is annoying.
Slurp also has some serious issues with indexing old stuff and not getting rid of expired links/pages. I too have noticed this. I've got pages that have been gone for over a year still showing up in the SERPs. It's so frustrating! Ink's bot is, in my opinion, the hardest to deal with and the most buggy... but that's just my opinion... what do I know?
[edited by: kanetrain at 6:01 pm (utc) on Jan. 9, 2004]
What do you mean by "404 not found header"?
When a web page is sent, a set of HTTP response headers is sent before the HTML of the page (these are completely different from the HTML head). One of the things sent is a status code: e.g. if a document is found, the status code will be 200, meaning the web server found the page. If the page is not found, the server should return status code 404 so that the search engine, browser, or whatever else made the request knows the page could not be found. A common fault with custom 404 pages is that they return 200 (page found) instead of 404 (not found).
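To make the "stuff sent before the HTML" concrete, here is a rough sketch that takes a made-up raw response and pulls the status code out of it. The response text and the splitResponse name are just for illustration, not something any particular server or library gives you.

```php
<?php
// Hypothetical raw response, as a server might send it for a missing page.
$raw = "HTTP/1.1 404 Not Found\r\n"
     . "Content-Type: text/html\r\n"
     . "\r\n"
     . "<html><body>Sorry, that page is gone.</body></html>";

// Split a raw response into status code, header lines, and body.
function splitResponse(string $raw): array
{
    // Headers end at the first blank line; everything after it is the body.
    $parts = explode("\r\n\r\n", $raw, 2);
    $lines = explode("\r\n", $parts[0]);
    $statusLine = array_shift($lines);          // e.g. "HTTP/1.1 404 Not Found"
    $code = (int) explode(' ', $statusLine)[1]; // second field is the status code
    return ['code' => $code, 'headers' => $lines, 'body' => $parts[1] ?? ''];
}

$response = splitResponse($raw);
echo $response['code'], "\n"; // prints 404
```

The spider only needs that first line to decide the page is gone. The friendly HTML underneath is for human visitors.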
If you are using PHP then put this at the very top of the error page...
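<?php
// Must run before any output (even whitespace), or the headers are already sent.
// "Status: 404 Not Found" is only honoured when PHP runs as CGI/FastCGI;
// setting the HTTP status line directly works under other setups as well.
header("HTTP/1.1 404 Not Found");
?>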
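If you want to check that the error page really sends a 404, something along these lines should do it. It is only a sketch: finalStatusCode is a made-up helper name, the example.com URL is a placeholder, and the commented-out usage assumes allow_url_fopen is enabled so that PHP's get_headers() can fetch the page.

```php
<?php
// Return the final status code from a list of header lines, or null if none is found.
function finalStatusCode(array $headerLines): ?int
{
    $code = null;
    foreach ($headerLines as $line) {
        // Redirects add one status line per hop; keep the last one seen.
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Usage (hypothetical URL, needs network access and allow_url_fopen):
// $headers = get_headers('http://www.example.com/MyOldPage.html');
// if ($headers !== false && finalStatusCode($headers) !== 404) {
//     echo "Custom error page is not sending a 404\n";
// }
```

Request a page you know does not exist. If the result is 200, the custom error page is telling spiders the old page is still there.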