Forum Moderators: open
What does Googlebot do with a 404 when it gets one?
Does it store the page in its memory and crawl it the next time? Or does it delete it automatically?
I am curious because websites can be down, which would affect a crawl, and pages that used to be main pages may no longer be visibly linked anywhere when it returns.
The reason I ask is that I recently moved a whole load of pages to a new directory, so there are a lot of links out there that are invalid. I am now doing 301's to redirect them, but as there are so many pages it is taking a long time.
Because of this I am curious what Google does with the 404's when it tries to visit page X. Will it keep that page in its DB and check it on the next crawl to see if it has come back, or will it be deleted altogether?
If you've intentionally deleted the page, 404 is the wrong status code. You're not getting what you expect because you expect the wrong thing.
If you permanently delete a page, make your server send out a status 410 ("Gone"), or a 301 ("Moved Permanently") if you've got a replacement page with a different name. Google will get the hint in one churn of the database.
If you can't make your server send out a 410 (it's one line in .htaccess for Apache users!), the problem isn't with Google.
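For reference, the one-liners mentioned above might look like this in .htaccess (Apache with mod_alias; the paths are made-up examples, not anything from this thread):

```apache
# Tell crawlers a deleted page is permanently gone (410)
Redirect gone /old-page.html

# Or send a permanent redirect (301) to a replacement page
Redirect permanent /old-page.html /new-page.html
```

With the `gone` status, `Redirect` takes no destination URL; with `permanent`, the second path is where the old URL now lives.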
(Many engines recheck 404's a couple of times, to make sure they're not deleting the index entry because of a temporary error.)
The point is, I transferred a few thousand pages to a new directory; I had to do this. However, manually doing the 301's is taking longer than I thought, so we end up with a lot of 404's, mainly when engines crawl.
This is why I am asking what Google does with a 404: does it drop it immediately, does it keep it and try crawling it another x number of times, or what?
It is actually quite important for people trying to redesign the layout and structure of a site. Next time I have to do this I will prepare all the 301's before I move the pages. I did not do that, so now I am wondering how Googlebot will handle it.
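If the old URLs map onto the new directory in a predictable way, one pattern-based rule can stand in for thousands of manual 301's. A sketch for Apache's .htaccess, assuming hypothetical directory names /old/ and /new/ with filenames unchanged:

```apache
# Permanently redirect everything under /old/ to the same path under /new/
RedirectMatch permanent ^/old/(.*)$ /new/$1
```

`RedirectMatch` (mod_alias) matches the request path against a regular expression and substitutes the captured part into the target, so each old URL 301's to its counterpart without a per-page rule.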
It seems now that the general consensus is two months / two crawls.