Forum Moderators: open
Is it possible that Google is attempting to verify its database this time around, and finally cleaning out the 404 pages? It sure seems like that to us. We are also seeing a high degree of Slurp activity, which, as we noted earlier, seems to be a confirmation pass over its database as well. Slurp's activity during the past month has been even greater than Google's.
Comments from those who follow this more closely than I do?
On a funny side note, check out [homestarrunner.com...]. That's the funniest 404 error page I've found so far :)
Jim
April 11, 2003:
Have to say no at this time. I've commented on this before and am convinced that this is now a function of freshbot. It seems the deep bot crawls, and then during the following month freshbot confirms the 404 errors. We've seen this cycle over the past couple of months, and it tends to take 8-10 weeks after the deep crawl for the database to shed all the old links. On the other hand, those links may simply not be displayed; freshbot seems to be one cycle behind the deep crawl (maybe even 2? cycles) as to which database it is confirming 404 errors against. Just our observations on what we have been seeing in our logs after a major site update last year.

Also noticed that Slurp has been rehitting the 404's extensively over the last 12 weeks. Yes, our 404 page is correctly serving up a 404 code. Since the announcement with Yahoo, this activity has become repetitive and frequent. Not quite sure what that means yet, but no doubt both engines are trying to get their databases current. Just my 2 cents worth.
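For anyone who wants to double-check what their error page actually returns (plenty of sites serve a "soft" 200 for missing pages without realizing it), here is a minimal sketch in Python; the host and path in the usage comment are placeholders, not real URLs:

```python
import http.client

def fetch_status(host, path):
    """Return the HTTP status code a server sends for the given path."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()

# Hypothetical usage -- a truly missing page should report 404 (or 410),
# not a "soft" 200 from a friendly-looking error page:
#   fetch_status("www.example.com", "/no-such-page.html")
```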
May 20, 2003:
Sounds pretty familiar to what we are reading about Dominic? See the questions about a 2-month-old database being used. Perhaps we just didn't realize what it meant at the time. From what I have seen, IMHO Google was testing out the system beforehand and is finally implementing it. Results were improving as far as dead pages at that time (read: a better database), so perhaps Dominic is just an extension of this? If so, this is a planned event, not a "screwup" as is being implied in a number of the threads.
So just some food for thought.
All,
Most engines will do a lot better job of removing your obsolete pages if you tell them the pages are gone with a 410-Gone server response. 404 errors are defined very vaguely, and most search engines assume that there might just be a temporary problem with your site if they get a 404; they'll keep trying for a few weeks or months. This is a reasonable assumption, if you read the 404 server code description in the HTTP/1.1 RFC. So don't be mad at the engines for trying to give you a break. If you remove a resource intentionally and want it de-listed faster, then set up a 410-Gone response for it.
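To make the 404-vs-410 distinction concrete, here is a minimal sketch of a handler that answers 410 Gone for pages you removed on purpose and a plain 404 for everything else. The paths listed are hypothetical examples, not from the discussion above:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Pages we removed on purpose; anything else missing gets a plain 404.
GONE_PATHS = {"/old-catalog.html", "/press-1999.html"}  # hypothetical paths

class GoneAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in GONE_PATHS:
            self.send_error(410, "Gone")       # tells crawlers: removed for good
        else:
            self.send_error(404, "Not Found")  # unknown path: may be temporary

    def log_message(self, *args):
        pass  # keep the demo quiet
```

On Apache the same effect comes from mod_alias in .htaccess, e.g. `Redirect gone /old-catalog.html`.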
Another advantage of returning correct server headers is that your 404 error log will not be cluttered up with junk. A 404 error will once again be a call to immediate action, because it will indicate a real problem.
Jim