Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Having too many 404's stops you being indexed?
internetheaven




msg:771031
 10:49 am on May 13, 2006 (gmt 0)

Saw this as a passing comment in an alternative thread:

Having multiple 404 errors could cause Google to stop indexing your site

and I would really like to know if anyone has any documentation or authoritative blog posts on this subject. My site is currently returning about 10,000 404 pages because Google refuses to drop, or stop crawling, pages that have been removed from my site (i.e. now supplemented) over the past two years.

 

tedster




msg:771032
 4:21 pm on May 13, 2006 (gmt 0)

If this conjecture is true, (and I'm not convinced, but I am suspicious) it would be because your pages have an excessive amount of links that are 404, not just because your server gets a lot of 404 requests. Otherwise competition could knock you out just by posting pages of bad links to your domain!

buckworks




msg:771033
 4:54 pm on May 13, 2006 (gmt 0)

Anyone can have a dud link now and then, but too many broken links on a site would definitely undermine its "signals of quality".

It would make sense for algos to subtract points for excessive link rot, so it's likely a safe assumption that if they don't now, sooner or later they will try.

Looking at things from another direction, broken links are a missed opportunity to reinforce your site's theme(s). They'll also alienate human visitors, some of whom might have linked to you... another missed opportunity.

Asia_Expat




msg:771034
 6:54 pm on May 13, 2006 (gmt 0)

I'm producing loads of error pages right now because of an unavoidable directory change. Because PHP sends you straight back to the main index rather than giving an error, I had tens of thousands of URLs pointing to the index and never being dropped... very bad.
I decided the only solution was to use htaccess to configure a 404 response for those old nonexistent pages. If the above is true, I'm screwed.
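
A minimal sketch of the kind of .htaccess rule described here, assuming Apache with mod_alias enabled. The directory name /old-directory/ is hypothetical; the post does not say what the real paths were.

```apache
# Assumption: every URL under /old-directory/ has been removed.
# With a status outside 300-399, mod_alias takes no target URL.
RedirectMatch 404 ^/old-directory/
```

With a rule like this the server answers 404 for those URLs itself, rather than letting the application send the visitor back to the index page.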

Reid




msg:771035
 1:01 pm on May 14, 2006 (gmt 0)

Tedster wrote:
If this conjecture is true, (and I'm not convinced, but I am suspicious) it would be because your pages have an excessive amount of links that are 404, not just because your server gets a lot of 404 requests. Otherwise competition could knock you out just by posting pages of bad links to your domain!

I don't think that would work, because the competition would only hurt themselves by posting pages of rotten links. The 404's would be attributed to the page that contains them, not the server serving them. However, a page that WAS indexed by Google and now returns 404 is a different thing.

404 PAGE NOT FOUND (it may exist but we can't find it and we don't know why or we're just not telling - the cheque is in the mail, try again later)
410 GONE (page no longer exists - it used to exist but it has been deleted now, you should remove it from your cache)

If Google knows a page exists (because it is in their cache) and they go to update their cache and get a 404, what should they do? Remove it from the index? No, because it may be a temporary glitch, so they keep trying.
How long should they keep trying? Days, months, years?
If the site is full of 404's and they have been trying for months to update the cache but keep getting 404's, would this reflect badly on PR? I would think so. It would show a site that is poorly maintained by HTTP standards.
If you remove pages and want them gone, then 410 is the status you should give them, not 404.

[w3.org...]

kevinpate




msg:771036
 1:25 pm on May 14, 2006 (gmt 0)

> If you remove pages and want them gone then
> 410 is the status you should give it, not 404.

Absolutely, but plan on leaving the 410's in place for many, many months. From what I've seen so far, the bots for G, Y and M have a bit of trouble with 410 and apparently have to see it (far too) many times before they will stop calling.

dmje




msg:771037
 5:53 pm on May 14, 2006 (gmt 0)

I seem to remember reading in Cutts' blog that Google was going to treat 404 pages the same as 410's.

I hope this is true, because on our server I don't seem to have the ability to create a custom 410 page, but I do have a custom 404 page that returns the correct header etc.

How does one create a custom 410 page that works correctly?

What if a customer mistypes the url for a certain page and gets the 410 instead of 404, will this make any difference in the scheme of things?

Asia_Expat




msg:771038
 5:59 pm on May 14, 2006 (gmt 0)

Yes please... I also can't seem to produce 410 pages.

BillyS




msg:771039
 6:04 pm on May 14, 2006 (gmt 0)

>>Otherwise competition could knock you out just by posting pages of bad links to your domain!

I agree. I don't think 404s should get you a penalty. I get 404s all the time where some scraper site only gets part of the URL. I'd rather return a 404 than point it somewhere else.

I'm not sure why folks are wondering about 410s - they are created the same way 404s are. I use them, but reserve 410 for pages that existed at one time and were subsequently removed.

Reid




msg:771040
 4:51 am on May 15, 2006 (gmt 0)

404's don't get you a penalty and 410 cannot be generated by a mistyped URL.
410 is generated by a specific URL that used to exist and was removed. So mistyping the URL will not produce a 410; only the correct URL (for the page that was removed) will generate a 410. A 410 means that the page did exist at one time but was removed.
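
One way to get this behaviour on Apache, sketched under the assumption that mod_alias is available. The file name /removed-page.html is hypothetical and stands for a page that once existed.

```apache
# Answer 410 Gone for this one URL only (the pattern is anchored at both ends).
RedirectMatch 410 ^/removed-page\.html$
```

A mistyped address such as /removed-pgae.html does not match the pattern, so it falls through to the server's ordinary 404 handling.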

404 means the page may or may not exist, and this condition may or may not be permanent. The file was not found at this time, for whatever reason, and no other status code applies.

This is purely a cache control issue. Not about spam penalties.
Look at it this way: when you are checking your outbound links (one of those rare occasions) and you find a 404, what do you do? Normally I would check it again the next day to see if it is back to 200 (maybe the server was temporarily down, maybe there was a glitch somewhere).
But if I find a 410 then I know right away that this is a permanent condition, and I can deal with the outbound link immediately. It's not a glitch; the page is gone.

Now think of Google with literally billions of pages cached. That cache has to be maintained and kept updated. The bot goes out to see if the page has been modified since the time it was last cached and, if so, it updates it.
It gets a 410 and knows right away the page has been removed and no longer exists. But if it gets a 404 response, then what? Is the server having temporary problems? Is the connection good? Was there a glitch? Does the page still exist? Is this a permanent or temporary condition?
It's not at all about the index and penalties, but it is proper cache control. So the Google cache-control guys are looking at the cache and saying: here are 200,000,000 404 pages sitting in our cache, which we don't like because we want to return relevant results, not 404 pages. So what do we do?
Well, let's put the 404's into the supplemental index, and if they come back good later then they can earn their way back into the regular index, but if they persist for x crawls or x days then they will be treated as 410 (no longer exist).
Now in some other office another Google exec is deciding how to assign PR. OK, what criteria do we use? Well, some pages have minor spam penalties and are sent to the supplemental index, so let's say one criterion is how many pages from this site are in the supplemental index. That's an indication of site quality, since the lower quality pages end up there.
So there is a possible indirect connection between 404 and penalties. But it's a good SEO practice to help these giant indexes maintain a healthy index by returning proper status on pages. 404 is just a default (NO OTHER STATUS APPLIES) which is actually the only error status code. It is an error (or not).

internetheaven




msg:771041
 9:32 am on May 15, 2006 (gmt 0)

I don't seem to have the ability to create a custom 410 page, but I do have a custom 404 page that returns the correct header etc.

I also would be interested to know if you can create a custom 410 page, I'm going to try it now anyway but if anyone knows of some good docs on the subject I'd appreciate it.

trinorthlighting




msg:771042
 2:57 pm on May 15, 2006 (gmt 0)

I try to keep my site pretty clean. Once a week I do a site:mysite.com search to check on the pages indexed. If I notice any 404 pages, I remove them with the Google URL removal tool or I 301 redirect them.

That way my customers do not get annoyed by a page not found and leave my site.
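
For the 301 option, a minimal sketch assuming Apache with mod_alias. Both URLs are hypothetical and are not taken from the post.

```apache
# Permanently redirect a retired page to its closest replacement.
Redirect 301 /old-widgets.html http://www.example.com/widgets.html
```

This only makes sense when a real replacement page exists; a page that is simply gone is better served by a 404 or 410, as discussed above.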

trillianjedi




msg:771043
 3:33 pm on May 15, 2006 (gmt 0)

Custom 410 page

Should be as simple as:-

ErrorDocument 410 /path/to/mycustom410page.html

.... in your .htaccess file (assuming Apache).
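
A slightly fuller sketch of how the pieces might fit together in .htaccess, assuming Apache with mod_alias. The removed page name is hypothetical. ErrorDocument should point at a local path: given a full http:// URL, Apache redirects the client to it and the original 410 status is lost.

```apache
# Hypothetical removed page: answer 410 Gone for this URL only.
RedirectMatch 410 ^/discontinued-product\.html$

# Serve a custom page as the body of every 410 response.
# A local path keeps the 410 status code intact.
ErrorDocument 410 /path/to/mycustom410page.html
```

The ErrorDocument line only sets what the visitor sees. It is the RedirectMatch line (or some equivalent rule) that decides which URLs return 410 in the first place.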

TJ

Reid




msg:771044
 3:53 am on May 16, 2006 (gmt 0)

Thanks TJ - I guess a custom 410 page would be a good idea; put a link to home on it.
Using the removal tool is a good idea (as long as you don't accidentally remove your web site - then it is a very bad idea). But you still get a lot of stray requests for the page from other SE's and other caches, so a 410 on the URL is still a good idea. It is just like posting an announcement to all caches that the file is gone. Anyone who sets up a cache ought to know what 410 means. Maybe Google is slow to process 410s, but they do have the removal tool and it will remove any file that returns 404 or 410.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved