Forum Moderators: Robert Charlton & goodroi


Prevent Google indexing while my DB is down?


sweethilit

10:08 pm on Oct 27, 2006 (gmt 0)

10+ Year Member



Hi,
Recently I had an annoying incident:
my DB went down, and when that happens the site shows a default "could not connect to..." message. While it was down, Googlebot indexed some pages with this message! One of my major pages dropped from No. 5 to around No. 50 in the rankings (the cached page is, of course, the "no connection" message).

So what I was thinking of doing is to tell Google not to index the site while there is no DB connection, by outputting a meta tag or whatever when the connection fails.
I Googled whether and how to do this, with no success.

Suggestions will be appreciated!

jdMorgan

2:38 am on Oct 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you can detect the DB failure programmatically, you can try returning a 503-Service Unavailable [w3.org] response to robot requests for your pages. Note that you should also provide the Retry-After header mentioned below with this 503 response if possible.

From the HTTP/1.1 specification [w3.org]:

The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.

Note: The existence of the 503 status code does not imply that a server must use it when becoming overloaded. Some servers may wish to simply refuse the connection.
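A minimal sketch of this approach in Python (the `check_db` helper and the one-hour retry delay are assumptions for illustration, not anything from the thread):

```python
# Sketch: choose the HTTP status based on whether the database is reachable.
# check_db() is a hypothetical stand-in for whatever connectivity test the
# application actually performs (e.g. connecting and running SELECT 1).

def check_db():
    """Placeholder for a real DB connectivity test."""
    return False  # pretend the DB is down for this demonstration

def build_response():
    """Return (status_line, extra_headers) for the current DB state."""
    if check_db():
        return "200 OK", {}
    # DB is down: tell crawlers the outage is temporary and when to retry.
    return "503 Service Unavailable", {"Retry-After": "3600"}  # seconds

status, headers = build_response()
print(status)   # 503 Service Unavailable
print(headers)  # {'Retry-After': '3600'}
```

The key point is that the error page must carry the 503 status line, not a 200 OK with an error message in the body, which is what got the pages indexed in the first place.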

Jim

sweethilit

7:35 am on Oct 28, 2006 (gmt 0)

10+ Year Member



Thanks, Jim! That's exactly what I was looking for!

sweethilit

12:52 pm on Oct 28, 2006 (gmt 0)

10+ Year Member



Just to be sure: is using a 503 header a tested and known way to stop Google from indexing the page while it's down? Or would it be better to send a 404?

I'm curious how everyone else deals with these DB problems.

Many thanks!

g1smd

2:56 pm on Oct 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would not use a 404.

404 says it is not found rather than it is unavailable.

If Google doesn't handle 503 then they should.

jdMorgan

3:33 pm on Oct 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One thing that helps in understanding HTTP response headers [webmasterworld.com] is to realize that they are hierarchical: that is, if a client does not understand a 503 response, it should treat it as a 500 response. With 500-Server Error being the 'generic' server-error response, the robot should know to try again later.

Full disclosure: it's impossible to know whether all search engine robots will treat all responses properly all the time. All that we as Webmasters can do is take care of our end on the server, and hope that any client that does not currently handle all responses properly will be upgraded over time to do so. Most of the major search engines do handle responses correctly, at least in the most general way. For example, they are programmed to come back and retry requests that previously resulted in a 404-Not Found many, many times, sometimes to the annoyance of the Webmaster, in order to avoid dropping pages from their indexes due to temporary errors.

In the case described here, a 503-Service Unavailable is the correct response.
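The fallback rule Jim describes can be sketched as a tiny status-mapping function (the set of "known" codes here is a hypothetical client's, chosen only for illustration):

```python
# Sketch of hierarchical status handling: a client that does not recognise
# a specific status code should fall back to the generic x00 code of the
# same class (503 -> 500, an unknown 4xx -> 400, and so on).

KNOWN = {200, 301, 302, 404, 500}  # codes this hypothetical client understands

def effective_status(code):
    """Map an unrecognised status code to the generic code of its class."""
    if code in KNOWN:
        return code
    return (code // 100) * 100  # e.g. 503 -> 500, 418 -> 400

print(effective_status(503))  # 500 -- handled like a generic server error
print(effective_status(404))  # 404 -- understood directly
```

So even a crawler that predates the 503 code would still see a server error rather than a real page, and should retry rather than index the error message.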

Jim

sweethilit

7:04 am on Oct 29, 2006 (gmt 0)

10+ Year Member



Thanks again, Jim and g1smd. I've already changed the 200 OK carrying the "could not connect..." message to a 503; I just wanted to be sure.

Now I'm curious to see whether the pages will return to the same rankings. My hunch is that they won't, as there seems to be a kind of "inertia" factor in Google's algorithm: in order to get back to No. 5, for example, you have to outrank whoever holds No. 5 now, even if that spot was yours a few days ago...

Hope I was making sense there :)

sweethilit

7:47 am on Oct 29, 2006 (gmt 0)

10+ Year Member



Well, in case it interests anyone: the rankings came back, and really, really fast! Google crawled the site on the 24th (the "could not connect" problem), and some pages were crawled again on the 27th. One of my major pages that had dropped from 5th to around 50th now ranks 7th on some DCs.

motorhaven

2:17 am on Oct 31, 2006 (gmt 0)

10+ Year Member Top Contributors Of The Month



We send a 503 header each night while doing a mysqldump to backup the server. The database is large and it takes 10-15 minutes to dump.

The technique works well. We generally see Google attempt to refetch the page later.
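One common way to implement this nightly-backup pattern is a maintenance flag file: create it before the dump, serve 503 while it exists, and remove it afterwards. The sketch below makes several assumptions (the flag path, the Retry-After value, and the `mysqldump` arguments are all illustrative, not motorhaven's actual setup):

```python
# Sketch: serve 503 during a scheduled mysqldump via a maintenance flag file.

import os
import subprocess
import tempfile

FLAG = os.path.join(tempfile.gettempdir(), "maintenance.flag")

def status_for_request():
    """Front controller checks the flag on every request."""
    if os.path.exists(FLAG):
        # Dump takes 10-15 minutes, so suggest retrying in 15 (900 s).
        return 503, {"Retry-After": "900"}
    return 200, {}

def run_backup():
    """Nightly job: raise the flag, dump, then lower the flag."""
    open(FLAG, "w").close()  # enter maintenance mode
    try:
        subprocess.run(["mysqldump", "--all-databases"], check=True)
    finally:
        os.remove(FLAG)      # always exit maintenance mode, even on failure
```

The `try`/`finally` matters: if the dump fails, the flag still comes down, so the site doesn't serve 503 indefinitely.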