Having multiple 404 errors could cause Google to stop indexing your site
I would really like to know whether anyone has any documentation or authoritative blog posts on this subject. My site is currently returning about 10,000 404 pages because Google refuses to drop, or stop crawling, pages that have been removed from my site over the past two years (i.e. pages that are now supplemental).
It would make sense for algos to subtract points for excessive link rot, so it's likely a safe assumption that if they don't now, sooner or later they will try.
Looking at things from another direction, broken links are a missed opportunity to reinforce your site's theme(s). They'll also alienate human visitors, some of whom might have linked to you... another missed opportunity.
If this conjecture is true (and I'm not convinced, but I am suspicious), it would be because your pages contain an excessive number of links that return 404, not just because your server gets a lot of 404 requests. Otherwise a competitor could knock you out just by posting pages of bad links to your domain!
404 PAGE NOT FOUND (it may exist but we can't find it and we don't know why or we're just not telling - the cheque is in the mail, try again later)
410 GONE (page no longer exists - it used to exist but it has been deleted now, you should remove it from your cache)
If Google knows a page exists (because it is in its cache), and when it goes to update the cache it gets a 404, what should it do? Remove the page from the index? No, because it may be a temporary glitch, so it keeps trying.
How long should it keep trying? Days, months, years?
If the site is full of 404s and Google has been trying for months to update the cache but keeps getting 404s, would this reflect badly on PR? I would think so. By HTTP standards, it shows a poorly maintained site.
If you remove pages and want them gone, then 410 is the status you should give them, not 404.
Absolutely, but plan on leaving the 410s in place for many, many months. From what I've seen so far, the bots for G, Y and M have a bit of trouble with 410 and apparently have to see it (far too) many times before they will stop calling.
I hope this is true, because on our server I don't seem to have the ability to create a custom 410 page, but I do have a custom 404 page that returns the correct header etc.
How does one create a custom 410 page that works correctly?
What if a customer mistypes the URL for a certain page and gets a 410 instead of a 404? Will this make any difference in the scheme of things?
I agree. I don't think 404s should get you a penalty. I get 404s all the time where some scraper site only picks up part of the URL. I'd rather return a 404 than point it somewhere else.
I'm not sure why folks are wondering about 410s; they are created the same way 404s are. I use them, but I reserve 410 for pages that existed at one time and were subsequently removed.
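As a rough illustration of the point that a 410 is served the same way as a 404 (a status line plus whatever custom body you like), here is a minimal sketch using Python's standard-library http.server. The page lists, the HTML bodies and the serve() helper are hypothetical. Only paths you deliberately removed get a 410; anything else that doesn't exist, including a visitor's typo, still gets the ordinary 404.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical site content: live pages, and pages that were deliberately removed.
LIVE_PAGES = {"/": b"<html><body><h1>Home</h1></body></html>"}
REMOVED_PATHS = {"/old-product.html", "/2004/archive.html"}

# Custom bodies for humans; crawlers act on the status code, not on this text.
GONE_PAGE = b"<html><body><h1>410 Gone</h1><p>This page was removed permanently.</p></body></html>"
NOT_FOUND_PAGE = b"<html><body><h1>404 Not Found</h1><p>Please check the address.</p></body></html>"


def choose_response(raw_path):
    """Return (status, body) for a request path, ignoring any query string."""
    path = urlsplit(raw_path).path
    if path in LIVE_PAGES:
        return HTTPStatus.OK, LIVE_PAGES[path]
    if path in REMOVED_PATHS:
        return HTTPStatus.GONE, GONE_PAGE          # known to be gone for good
    return HTTPStatus.NOT_FOUND, NOT_FOUND_PAGE    # typos and anything unknown


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = choose_response(self.path)
        self.send_response(status)                 # e.g. "HTTP/1.0 410 Gone"
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the sketch locally until interrupted (Ctrl+C)."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Call serve() and request http://127.0.0.1:8000/old-product.html to see the 410 with its custom page. On a real host you would normally do the equivalent in the web server's own configuration; this only shows the mechanics.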
404 means the page may or may not exist, and this condition may or may not be permanent. File not found at this time, for whatever reason, and no other status code applies.
This is purely a cache control issue. Not about spam penalties.
Look at it this way: when you are checking your outbound links (one of those rare occasions) and you find a 404, what do you do? Normally I would check it again the next day to see if it is back to 200 (maybe the server was temporarily down, maybe there was a glitch somewhere).
But if I find a 410, I know right away that this is a permanent condition and I can deal with the outbound link immediately. It's not a glitch; the page is gone.
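The link-checking routine described above (recheck a 404 later, act on a 410 immediately) might look something like this minimal Python sketch. The action names, and treating 5xx responses and network errors as "recheck", are assumptions for illustration, not the behaviour of any particular tool.

```python
import urllib.error
import urllib.request


def action_for_status(status):
    """Map an HTTP status from a link check to what to do with the outbound link."""
    if 200 <= status < 300:
        return "keep"
    if status == 410:
        return "remove"      # Gone: permanent, deal with the link right away
    if status == 404 or status >= 500:
        return "recheck"     # may be a temporary glitch; try again tomorrow
    return "review"          # anything else (403 and so on): look at it by hand


def check_link(url, timeout=10.0):
    """Send a HEAD request to url and return (status, action)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code                 # 4xx/5xx arrive as HTTPError
    except OSError:
        return None, "recheck"            # DNS failure, refused connection, timeout
    return status, action_for_status(status)
```

It uses HEAD requests to keep things light; some servers handle HEAD badly, so falling back to GET is a reasonable tweak.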
Now think of Google, with literally billions of pages cached. That cache has to be maintained and kept up to date. The bot goes out to see whether each page has been modified since it was last cached, and if so, it updates it.
If it gets a 410, it knows right away that the page has been removed and no longer exists. But if it gets a 404, then what? Is the server having temporary problems? Is the connection good? Was there a glitch? Does the page still exist? Is this a permanent or temporary condition?
It's not about the index and penalties at all; it's about proper cache control. So the Google cache-control guys are looking at the cache and saying: here are 200,000,000 404 pages sitting in our cache, which we don't like because we want to return relevant results, not 404 pages. So what do we do?
Well, let's put the 404s into the supplemental index. If they come back good later, they can earn their way back into the regular index, but if they persist for x crawls or x days, they will be treated as 410 (no longer exists).
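To make the speculative policy above concrete, here is a toy Python model of it. Everything in it is hypothetical: MAX_404_CRAWLS stands in for the unknown "x crawls" threshold, and the "main"/"supplemental"/"dropped" states are a simplification of how the poster imagines the index working, not anything Google has documented.

```python
from dataclasses import dataclass

# Hypothetical stand-in for "x crawls"; the real value (if any) is unknown.
MAX_404_CRAWLS = 5


@dataclass
class CachedPage:
    url: str
    index: str = "main"          # "main", "supplemental" or "dropped"
    consecutive_404s: int = 0


def record_crawl(page, status):
    """Update a cached page after one crawl, following the speculative policy."""
    if status == 410:
        page.index = "dropped"           # Gone: no need to keep asking
        page.consecutive_404s = 0
    elif status == 404:
        page.consecutive_404s += 1
        if page.consecutive_404s >= MAX_404_CRAWLS:
            page.index = "dropped"       # a persistent 404 is treated like a 410
        else:
            page.index = "supplemental"  # parked until it recovers or times out
    elif status == 200:
        page.consecutive_404s = 0
        page.index = "main"              # the page "earns its way back"
    # Any other status leaves the page unchanged in this toy model.
    return page
```

The point of the model is the asymmetry: a single 410 settles the question, while a 404 only does so after repeated crawls.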
Now, in some other office, another Google exec is deciding how to assign PR. OK, what criteria do we use? Well, some pages have minor spam penalties and are sent to the supplemental index, so let's say one criterion is how many pages from this site are in the supplemental index. That's an indication of site quality, since the lower-quality pages end up there.
So there is a possible indirect connection between 404s and penalties. But it's good SEO practice to help these giant search engines keep their indexes healthy by returning the proper status for your pages. 404 is just the default (NO OTHER STATUS APPLIES); it is effectively the catch-all error status code. It is an error (or not).
I would also be interested to know whether you can create a custom 410 page. I'm going to try it now anyway, but if anyone knows of some good docs on the subject, I'd appreciate it.