Do search engines "start over" on each spider run?

Or do they keep old broken URLs forever?

NewSkool

2:49 pm on Apr 1, 2006 (gmt 0)



Well, what the topic says.

NewSkool

3:34 pm on Apr 1, 2006 (gmt 0)



Boy, you really hate me.

Conard

3:54 pm on Apr 1, 2006 (gmt 0)

10+ Year Member



I'll say that they keep old broken URLs for much longer than they should.

I've seen them drop some links for a while, and then a year later the links pop back up again.

NewSkool

9:48 pm on Apr 1, 2006 (gmt 0)



Why would they do this? Wouldn't it be smarter to just refresh each time?

MichaelBluejay

7:35 pm on Apr 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The engines don't want to discard pages when a server might be down temporarily. But eventually dead pages get dropped. How often do you click on dead links in the SERPs? Not often, I bet.
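A minimal sketch of that grace period, assuming a simple model: the index drops a URL only after it has failed on several consecutive crawls, and any successful fetch resets the count. The threshold `MAX_CONSECUTIVE_FAILURES` and the `Index`/`IndexedPage` names are hypothetical; no engine publishes its actual rules.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: consecutive failed fetches allowed before a URL
# is dropped. Real engines don't publish this; 3 is an arbitrary example.
MAX_CONSECUTIVE_FAILURES = 3


@dataclass
class IndexedPage:
    url: str
    consecutive_failures: int = 0


@dataclass
class Index:
    pages: dict[str, IndexedPage] = field(default_factory=dict)

    def record_fetch(self, url: str, ok: bool) -> None:
        """Update the index after one crawl attempt of `url`."""
        page = self.pages.get(url)
        if ok:
            if page is None:
                self.pages[url] = IndexedPage(url)  # newly found page
            else:
                # Success resets the counter: earlier failures are
                # treated as a temporary outage.
                page.consecutive_failures = 0
            return
        if page is None:
            return  # never indexed, so nothing to drop
        page.consecutive_failures += 1
        if page.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
            # Only repeated failures get the page treated as dead.
            del self.pages[url]
```

With this policy a single outage never removes a page; only a run of failures does, which matches "eventually dead pages get dropped" while leaving room for a server that is briefly down.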

pageoneresults

7:48 pm on Apr 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Broken URIs?

Spiders are very forgiving when it comes to receiving a 404 error. All that means is the page cannot be found.

If a document has been permanently removed, the server should probably return a 410 Gone status as opposed to a 404 Not Found.
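As a rough illustration of the 404/410 distinction on the server side, here is a sketch using Python's standard `http.server`. The paths in `REMOVED_PATHS` and `LIVE_PAGES` are hypothetical placeholders; on a real site the same rule would normally live in the web server or CMS configuration.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical: pages that were deliberately and permanently removed.
REMOVED_PATHS = {"/old-page.html", "/discontinued/widget.html"}

# Hypothetical: pages that still exist, with their bodies.
LIVE_PAGES = {"/": b"<h1>Home</h1>"}


def status_for(path: str) -> HTTPStatus:
    """Pick the status code for a request path (query strings not handled)."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK
    if path in REMOVED_PATHS:
        return HTTPStatus.GONE  # 410: removed on purpose, for good
    return HTTPStatus.NOT_FOUND  # 404: unknown, may or may not come back


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status = status_for(self.path)
        # Live pages get their content; errors get the status phrase.
        body = LIVE_PAGES.get(self.path, status.phrase.encode())
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the example server locally until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` and requesting `/old-page.html` should return 410 Gone, while a path that never existed returns 404 Not Found.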

There are specific issues to contend with when removing a page from the web. If there are inbound links to that page (which there most likely are), those links need to be removed completely if you want the page to be removed from the indices. If those links are being followed (which they most likely will be), the spider will continue to get a 404. The spider does not know that the page is Gone forever; it only knows that the page didn't exist on that visit. A 404 Not Found can be returned for a variety of reasons: the server being down, the page being moved, the page being removed, etc.

Here's the big problem. Many people just don't know (I didn't) the intricacies of dealing with pages that have been moved and/or removed. It could take years to clear out a page that has been permanently removed if it is returning a 404 instead of a 410.

MichaelBluejay

9:56 pm on Apr 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, the engines may continue to follow a link that returns a 404 instead of a 410, but that doesn't mean they will *list it in the index*. If there's no data there, the SEs don't care much about that address.
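One way to picture the difference between following a URL and listing it, assuming a hypothetical engine that keeps its crawl queue separate from its searchable index: a 404 drops the URL from the results but keeps it in the queue for rechecking, while a 410 drops it from both and stops inbound links from re-queuing it. The `SearchEngine` class and this policy are illustrative assumptions, not how any particular engine works.

```python
from dataclasses import dataclass, field


@dataclass
class SearchEngine:
    # URLs the crawler keeps revisiting (fed by inbound links).
    crawl_queue: set[str] = field(default_factory=set)
    # URLs that can actually appear in search results.
    index: set[str] = field(default_factory=set)
    # URLs the server has declared permanently gone (410).
    gone: set[str] = field(default_factory=set)

    def discover_link(self, url: str) -> None:
        """Called when another page links to `url`."""
        if url not in self.gone:
            self.crawl_queue.add(url)

    def handle_response(self, url: str, status: int) -> None:
        """Update queue and index after fetching `url`."""
        if status == 200:
            self.gone.discard(url)  # page is back, whatever it was before
            self.crawl_queue.add(url)
            self.index.add(url)
        elif status == 404:
            # No content to list, but keep rechecking in case it returns.
            self.index.discard(url)
            self.crawl_queue.add(url)
        elif status == 410:
            # Gone for good: stop listing it and stop recrawling it,
            # even if other pages still link to it.
            self.index.discard(url)
            self.crawl_queue.discard(url)
            self.gone.add(url)
        # Other statuses (e.g. 5xx) leave the state unchanged here.
```

In this model a 404 URL never shows up in results, yet it keeps consuming crawl visits for as long as links point to it; only the 410 ends that loop.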

moltar

10:57 pm on Apr 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yahoo kept non-existent URLs for over a year until the issue was indirectly brought to the attention of one of the techies at Yahoo.

By indirectly I mean I was having problems with the Yahoo Search API that I used on my website, and one of the techies was helping me solve them. He saw the broken URLs being returned and fixed them, though he never admitted it :) But they had been in the index for a year, and they disappeared the same moment he was helping me. Too much of a coincidence, don't you think?

MichaelBluejay

1:02 am on Apr 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Okay, but for all practical purposes, when a user does a search, they don't get a broken URL in the SERPs, unless the URL went dead very recently.