Forum Moderators: Robert Charlton & goodroi

Pages indexed with our temporary "site update" message

         

phill2000star

11:51 pm on Jan 22, 2008 (gmt 0)

10+ Year Member



Hi all.

Recently, a website I maintain went down: the account was frozen and every HTTP request returned a 403 page. The site is on a shared host, and one evening someone requested and successfully downloaded a 620 KB file 4,126 times, which comes to roughly 2,500 MB of data transfer. The server was using 90% of its web resources on this, so the provider suspended the account.

A few hours after I reported it, the 403 page was lifted and the site returned to its original state. I then had to make some modifications to prevent mass downloading of files from a particular IP address (which my ISP provided).

While I was doing this, I put up a "site update" page and directed customers to it using a header() call in a PHP file. That PHP file was called at the start of every page on the site, so every URL showed a page saying "Sorry, we are currently updating our site. Please come back in an hour or so".

Since then I have checked our listings in Google, and it seems to have indexed about 100 additional pages (which normally exist), all saying "we are currently updating our site".

These are pages it had scheduled to crawl after finding their links within our site. So now what? Are they going to stay like that? If not, how long will they stay that way? And what do I do to prevent this happening again?

Many thanks guys!

tedster

2:54 am on Jan 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google will eventually recrawl those URLs and change what's in the index. But in the future, make sure that the temporary message is delivered with the proper HTTP status code in the server header. It sounds like you delivered either a "302 Temporary Redirect" or maybe even a "200 OK" status.

phill2000star

2:23 pm on Jan 23, 2008 (gmt 0)

10+ Year Member



Thanks Tedster, and thank you for moving this into the correct forum.

Just out of curiosity, which status code should be sent in the header?

404 - page not found (obviously not this)
301 - permanent redirect?
302 - temporary redirect?

What are my other options?

Many thanks!

Asia_Expat

4:21 pm on Jan 23, 2008 (gmt 0)

10+ Year Member



[w3.org...]

503 response should be used to indicate server down for maintenance.
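
For the PHP setup described in the original post, a 503 could be sent along the lines of the sketch below. This is only an illustration under assumptions: the function names maintenance_headers() and send_maintenance_page() are hypothetical, and the one-hour Retry-After value simply mirrors the "come back in an hour or so" message.

```php
<?php
// Hypothetical helper: builds the header lines for a maintenance response.
// Retry-After is given in seconds and hints when clients may try again.
function maintenance_headers(int $retryAfterSeconds = 3600): array
{
    return [
        'HTTP/1.1 503 Service Unavailable',
        'Retry-After: ' . $retryAfterSeconds,
    ];
}

// Hypothetical helper: sends the headers, then the visitor-facing message.
function send_maintenance_page(int $retryAfterSeconds = 3600): void
{
    foreach (maintenance_headers($retryAfterSeconds) as $line) {
        header($line);
    }
    echo 'Sorry, we are currently updating our site. Please come back in an hour or so.';
}

// At the top of each page while the site is being updated:
// send_maintenance_page();
// exit;
```

Because the message is served at the requested URL with a 503 status, there is no redirect involved and the URL itself is reported as temporarily unavailable.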

g1smd

8:43 pm on Jan 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Has anyone tested what the various search engine bots do when they see that response?

phill2000star

9:41 am on Jan 25, 2008 (gmt 0)

10+ Year Member



Is a 503 really correct, or would a 302 be better? Any definite answers on this?

Robert Charlton

10:31 am on Jan 25, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



503 is what should be used.

See my answer in this recent thread for a more complete discussion, with references to both Google and the W3C:

Google and Website Downtime - what affect on rankings
[webmasterworld.com...]

g1smd

10:07 pm on Jan 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Usage of 302 is very dangerous most of the time.

jd01

6:03 pm on Jan 27, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In a similar situation I successfully used a 307 (HTTP/1.1 Temporary Redirect) to a 'Hey it's broken' page, with a robots meta tag of 'noindex,nofollow,noarchive' on the 'It's broken' page.

The affected pages temporarily dropped from the index, then within a few days of putting them back up and removing the redirect they returned to where they were previously. It's the method I would use again.

I think one of the keys to success, no matter which method you decide on, is to do one of two things. EITHER *redirect* (do not rewrite) to the temporary page using a 302 (Found), 303 (See Other) or 307 (Temporary Redirect), which all basically mean "temporary" (AFAIK a 302 is now actually handled by SEs according to the 303 standard). OR serve a custom error page. In either case you can keep the temporary page from being indexed by putting a robots meta tag of 'noindex,nofollow,noarchive' on the target page (critical IMO).

Justin

ErrorDocument example (place in your .htaccess to prevent a recurrence):
ErrorDocument 403 /forbidden.html

Note: The ErrorDocument URL *must* be a local path beginning with a slash (no http://www.example.com/). If you give a full URL, the server sends a 302 Found redirect instead of the anticipated error code, which may be why your pages were all indexed with the error message rather than being dropped as they should have been.

<added>
Just re-read the OP:
Make sure you set a status code in the PHP when you redirect, or it will be considered 302 Found.

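// Set the status line first; a bare Location header defaults to 302 Found.
$uri = "http://www.example.com/its-broken.html";
header("HTTP/1.1 307 Temporary Redirect");
header("Location: $uri");
exit; // stop the rest of the page from running after the redirect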
</added>

jd01

6:32 pm on Jan 27, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think I should highlight the main difference between some form of a temporary redirect and a permanent redirect, and the reason for noindexing the temporary location:

A Permanent Redirect tells a compliant UA to request the information from the new location from now on. So when this type of redirect is implemented (e.g. from /old-page.html to /new-page.html), a SE will request /new-page.html directly on the next 'spidering' of the site.

A Temporary Redirect (any version) tells a compliant UA to (basically) keep requesting the information from the original location. So when this type of redirect is implemented (e.g. from /old-page.html to /new-page.html), a SE will request /old-page.html on all subsequent 'spiderings', and as long as the redirect is still in place the contents of the target page (/new-page.html) will be considered the information associated with the original page (/old-page.html). This will be the case until the redirect is removed and the contents of /old-page.html are restored.

So, to keep from having duplicate pages indexed when multiple temporary redirects point to a single location (e.g. /its-broken.html), it is imperative to noindex the target location of the temporary redirects.
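
As a rough sketch of such a noindexed target page, something like the following could generate it in PHP. The function name broken_page_html(), the page title and the message text are assumptions for illustration; only the robots meta tag value comes from the discussion above.

```php
<?php
// Hypothetical helper: returns the HTML for the temporary "it's broken" page.
// The robots meta tag asks search engines not to index, follow or archive it.
function broken_page_html(string $message): string
{
    // Escape the message so it is safe to place inside the HTML body.
    $safe = htmlspecialchars($message, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html>\n<head>\n"
        . "<meta name=\"robots\" content=\"noindex,nofollow,noarchive\">\n"
        . "<title>Temporarily unavailable</title>\n"
        . "</head>\n<body>\n<p>{$safe}</p>\n</body>\n</html>\n";
}

// Example use in the script that serves the temporary page:
// echo broken_page_html('Sorry, we are currently updating our site.');
```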

Justin