How long does Google take to remove 404 pages from its index?
I have been observing this for the last 6 months.
One of our sites generated 8000+ duplicate (canonical) copies of its pages because of mod_rewrite bugs, so on 3 Dec 2009 the site got filtered.
From 3 Dec 2009 to 6 March 2010, I saw Googlebot crawling all the 404 pages and marking them as deleted: their SERP results lost the "Cached" link.
From 6 March onwards, Google started its automatic 404 cleanup, and we noticed a significant reduction in the number of those 404 pages in Google's index.
Today, 6 months later, I can still find 400+ of those orphaned (non-cached) copies in the SERPs, and our rankings are suffering because of them.
So it has now been more than 180 days since Googlebot started getting a 404 status for those 8000 copies.
At most, how long does Google need to clean up all of these 404 garbage canonical copies?
It can take months and months - but once Google spiders the URL and verifies the 404, it is not hurting your rankings.
Actually, every "example.com/categories/" page got 400+ canonical copies at the garbage canonical URL "example.com/categories/page/[\d+]/item-url-slug.html".
So "example.com/categories/page/[\d+]/item-url-slug.html" got higher priority than "example.com/categories/".
Then, on 6 Dec 2009, we deleted (404ed) all of those canonical URLs, and traffic dropped to 5-8%.
I thought we would be back in the SERPs within 6 months, but still no luck...
It's the longest penalty I have ever seen for internal canonical issues.
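Because the duplicates follow one URL shape, they can be recognized mechanically and mapped back to the page they duplicate. The sketch below is a minimal illustration, assuming that "categories" is a single path segment (a category name), that "[\d+]" stands for a numeric page number, and that every duplicate should collapse to its category root. `DUPLICATE_RE` and `canonical_for` are hypothetical names, not taken from the site's actual mod_rewrite rules.

```python
import re
from typing import Optional

# Hypothetical pattern for the garbage URLs described above: assumes the
# first segment is a category name and "[\d+]" is a numeric page number.
DUPLICATE_RE = re.compile(r"^/(?P<category>[^/]+)/page/\d+/[^/]+\.html$")


def canonical_for(path: str) -> Optional[str]:
    """Return the category root a duplicate URL should collapse to,
    or None if the path is not one of the duplicates."""
    match = DUPLICATE_RE.match(path)
    if match is None:
        return None
    return f"/{match.group('category')}/"


if __name__ == "__main__":
    for p in ["/categories/page/3/item-url-slug.html", "/categories/"]:
        print(p, "->", canonical_for(p))
```

Having one function that names the canonical target makes it easy to apply the same mapping consistently, whether the chosen fix is a redirect, a canonical link, or a removal.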
I think one plan would be to 301 instead of 404.
Another plan would be to use a canonical link rel instead of a 404.
A third plan would be to 410 instead of 404.
Personally, I would use a 301 or a canonical link relationship on the pages rather than removing the content. IOW: I would try to combine rather than remove, and if I had to remove, I would use a 410 (permanently, purposely removed) rather than a 404 (not found, could be either temporary or permanent).
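To make the three plans concrete, here is a rough sketch of what a server could send for one of the duplicate URLs under each of them. It assumes the same hypothetical URL pattern as in the question, a placeholder origin `http://example.com`, and hypothetical names (`respond_to_duplicate`, `SITE`); it only builds the status line, headers, and body rather than wiring into any particular server.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical duplicate-URL pattern (assumed, not the site's real rules).
DUPLICATE_RE = re.compile(r"^/(?P<category>[^/]+)/page/\d+/[^/]+\.html$")
SITE = "http://example.com"  # placeholder origin

Response = Tuple[str, List[Tuple[str, str]], str]


def respond_to_duplicate(path: str, strategy: str) -> Optional[Response]:
    """Build (status, headers, body) for a duplicate URL under one of the
    three plans: "301", "canonical", or "410". Returns None for other paths."""
    match = DUPLICATE_RE.match(path)
    if match is None:
        return None  # not a duplicate; the site's normal handler serves it
    target = f"{SITE}/{match.group('category')}/"

    if strategy == "301":
        # Permanent redirect: consolidates the duplicate into the category page.
        return ("301 Moved Permanently", [("Location", target)], "")
    if strategy == "canonical":
        # Keep serving the page (200) but point search engines at the original.
        body = f'<html><head><link rel="canonical" href="{target}"></head></html>'
        return ("200 OK", [("Content-Type", "text/html")], body)
    if strategy == "410":
        # Gone: states the removal is deliberate and permanent, unlike a 404.
        return ("410 Gone", [("Content-Type", "text/plain")], "Gone")
    raise ValueError(f"unknown strategy: {strategy!r}")
```

In a WSGI application, for example, the returned status and headers would be passed to `start_response` and the body returned as bytes. The 301 and canonical options keep whatever signals the duplicates accumulated pointing at the real page, which is why combining is preferred over removing.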
A little update on this.
The affected domain is a subdomain of my site.
- It has now been over 7 months.
- Remaining 404 copies left in the SERPs: approx. 130-145.
- Googlebot completely stopped crawling those 404 pages after 15 June 2010 (just over 6 months after we deleted them).
Whenever I query "site:example.com", all my subdomains except this penalized one are listed near the top, while this subdomain only appears beyond position 900. But whenever I try the following queries on the root domain, the affected subdomain shows up in the top 10:
site:example.com *** -asdf
site:example.com **** -asssdsd
That suggests that, because of the massive number of 404s, Google has put my entire subdomain into the supplemental results.
Now I need answers from experts.
So now that Googlebot is no longer crawling those 404 pages, will the subdomain come out of the supplemental results after some timeout (e.g. 8-9 months)? Or will Google re-rank the entire site as if it were a new site?