

What does Google do with a 404?

404's and Google

         

Visit Thailand

9:24 am on Jul 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I hope this has not been posted before. I did a search, but I am only just realising how large this site is.

What does Googlebot do with a 404 when it gets one?

Does it keep the page in its memory and crawl it again the next time? Or does it delete it automatically?

I am curious because websites can be down, etc., which would affect a crawl, and some pages that were main pages before may no longer be visibly linked anywhere when it returns.

nicco

11:10 am on Jul 5, 2002 (gmt 0)

10+ Year Member



I've seen in my logs that my site, recently rebuilt with JSP technology, returns a lot of 404 codes to Googlebot and the Google crawlers.
I've also seen that it has recrawled the site every day for 3 days: first with the Googlebot spider/robot, and immediately afterwards with crawl10.google, crawl11.google, etc.
It's really a problem for us in site promotion.
Bye

Visit Thailand

2:43 am on Jul 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So you are saying you think that Googlebot sees there is an error and stays with you for around 24 hours or so to see whether it is a server outage, a temporary error, or whatever?

The reason I ask is that I recently moved a whole load of pages to a new directory, so there are a lot of links out there that are invalid. I am now doing 301's to redirect, but as there are so many pages it is taking a long time.

Because of this I am curious as to what Google does with the 404's when it tries to visit page X. Will it keep that page in its DB and crawl it on the next crawl just to check whether it has come back, or will it be deleted altogether?

Net_Wizard

6:09 am on Jul 6, 2002 (gmt 0)



After probably 2 updates, old 404s and 301s are removed from the db.

jaytierney

6:15 am on Jul 6, 2002 (gmt 0)

10+ Year Member



For reference, I took down a page on my site that was listed in google and it took nearly two months before it was finally eliminated.

mbauser2

6:49 am on Jul 6, 2002 (gmt 0)

10+ Year Member



For crying out loud, people.

If you've intentionally deleted the page, 404 is the wrong status code. You're not getting what you expect because you expect the wrong thing.

If you permanently delete a page, make your server send out a status 410 ("Gone") (or a 301 ("Moved") if you've got a replacement page with a different name). Google will get the hint in one churn of the database.

If you can't make your server send out a 410 (it's one line in .htaccess for Apache users!), the problem isn't with Google.

(Many engines recheck 404's a couple of times, to make sure they're not deleting the index entry because of a temporary error.)
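For Apache users, the one-line 410 mentioned above could look roughly like this. It is only a sketch, assuming mod_alias is enabled and .htaccess overrides are allowed; /old-page.html is a hypothetical path, not one from this thread.

```apache
# .htaccess sketch (mod_alias).
# /old-page.html is a hypothetical path for a page that was deleted on purpose.
# "gone" makes the server answer 410 instead of 404; no target URL is given.
Redirect gone /old-page.html
```

Requests for that path should then get a 410 response rather than a 404.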

Visit Thailand

6:56 am on Jul 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



mbauser2, I do not think you understood what this thread is about.

The point is that I had to transfer a few thousand pages to a new directory. However, manually doing the 301's is taking longer than I thought, so we end up with a lot of 404's, mainly when engines crawl.

This is why I am asking what Google does with a 404: does it drop the page immediately, does it keep it and try crawling it another x number of times, or what?

It is actually quite important for people trying to redesign the layout and structure of a site. Next time I have to do this I will prepare all the 301's before I move the pages. I did not do this, and so am now wondering how Googlebot will handle it.

It seems now that the general consensus is two months / two crawls.

mbauser2

8:45 am on Jul 6, 2002 (gmt 0)

10+ Year Member



If the filenames haven't changed, Apache can redirect the entire set with one RedirectMatch directive: [httpd.apache.org...]
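As a rough sketch of that RedirectMatch approach, assuming the pages moved from one directory to another with the filenames unchanged (the directory names /old-dir/ and /new-dir/ and the host www.example.com are placeholders, not taken from this thread):

```apache
# Server config or .htaccess sketch (mod_alias).
# /old-dir/, /new-dir/ and www.example.com are hypothetical placeholders.
# The regex captures everything after /old-dir/ and appends it to the new
# location, so a single line redirects the whole set with a 301.
RedirectMatch 301 ^/old-dir/(.*)$ http://www.example.com/new-dir/$1
```

With a rule like this in place, a request for /old-dir/page.html would be answered with a 301 pointing at /new-dir/page.html on the same host, so the redirects do not have to be written one by one.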

nicco

9:26 am on Jul 6, 2002 (gmt 0)

10+ Year Member



For me it's impossible to send out 301s right now, because we're using only the Tomcat web server and it does not have all the functions of Apache.
However, Google came back to my pages for 3 days, as I said. The pages with redirects are still in Google's db, but I've had a big drop in positions. So I think that when Google finds redirects (mine are JavaScript code with a setTimeout of 10 sec.), it respiders those pages for a few days and pushes them down in position. Maybe in the next month or two those pages will be removed.