|
Pages indexed with our temporary "site update" message
|
phill2000star
#:3555162
| 11:51 pm on Jan. 22, 2008 (utc 0) |
Hi all. Recently a website I maintain went down: the account was frozen and all HTTP requests ended in a 403 page. The site is hosted on a shared host, and one evening someone successfully requested and downloaded a 620 KB file 4,126 times, equating to about 2,500 MB of data transfer. The server was utilising 90% of its web resources for this, so the provider halted the account. A few hours after I reported it, the site changed from a 403 page back to its original state.

I had to make some modifications to prevent mass downloading of files from a certain IP (which my ISP provided). Whilst I was doing this I put a "site update" page up and directed customers to it using a header() command in a PHP file. This PHP file was called at the start of each page on the site, so every URL showed a page saying "sorry, we are currently updating our site. Please come back in an hour or so".

Since then I have checked our listings in Google, and it seems to have indexed about 100 additional pages (which normally exist), all saying "we are currently updating our site". These are pages it had scheduled to crawl after finding their links within our site.

So now what? Are they going to stay like that? If not, how long will they take to recover? And what do I do to prevent this happening again? Many thanks guys!
|
tedster
#:3555304
| 2:54 am on Jan. 23, 2008 (utc 0) |
Google will eventually recrawl those URLs and change what's in the index. But in the future, make sure that the temporary message is delivered with the proper HTTP status code in the server header. It sounds like you delivered either a "302 Found" (a temporary redirect) or maybe even a "200 OK" status.
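
One way to see which status a page really sends is to request it without following redirects and look at the raw code. The sketch below is in Python rather than the PHP used on the site. The function names `fetch_status` and `describe_status` are invented for this illustration, and the verdicts only restate the advice given in this thread. They are not a description of how any particular search engine behaves.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Send a HEAD request and return (status, Location header).

    http.client never follows redirects, so the status is the one
    the server sent for this exact URL.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe_status(status):
    """Summarise how suitable a status code is for a temporary notice."""
    if status == 503:
        return "ok: 503 Service Unavailable signals a temporary outage"
    if status == 200:
        return "risky: 200 OK lets the notice be indexed as the page content"
    if status in (302, 303, 307):
        return "risky: temporary redirect, the target may be indexed under this URL"
    if status == 301:
        return "wrong: 301 says the move is permanent"
    return "other: status %d" % status
```

For example, `status, location = fetch_status("http://www.example.com/")` followed by `print(status, describe_status(status))` reports what a bot would receive for that URL. Some servers answer HEAD differently from GET, so treat the result as a first check only.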
|
phill2000star
#:3555815
| 2:23 pm on Jan. 23, 2008 (utc 0) |
Thanks Tedster, and thank you for moving this into the correct forum. Just out of curiosity, which status code should be sent?

404 - page not found (obviously not this)
301 - permanent redirect?
302 - temporary redirect?

What are my other options? Many thanks!
|
Asia_Expat
#:3555958
| 4:21 pm on Jan. 23, 2008 (utc 0) |
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

A 503 response should be used to indicate that the server is down for maintenance.
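
As a rough illustration of what such a response looks like, here is a minimal WSGI sketch in Python (not the PHP the site uses). It sends the 503 status together with a Retry-After header, which RFC 2616 defines for telling clients how long to wait. The name `maintenance_app` and the 3600-second value are assumptions made for this example; the value is based on the "hour or so" in the original notice.

```python
# Assumption: one hour, matching the "hour or so" in the site notice.
RETRY_AFTER_SECONDS = 3600


def maintenance_app(environ, start_response):
    """WSGI app that answers every request with a 503 maintenance notice."""
    body = (b"Sorry, we are currently updating our site. "
            b"Please come back in an hour or so.")
    start_response("503 Service Unavailable", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Delay in seconds before a client should try again.
        ("Retry-After", str(RETRY_AFTER_SECONDS)),
    ])
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, maintenance_app).serve_forever()` will serve it on port 8000 (the port is arbitrary). Every URL then returns the notice with a 503 and no redirect.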
|
g1smd
#:3556193
| 8:43 pm on Jan. 23, 2008 (utc 0) |
Has anyone tested what the various search engine bots do when they see that response?
|
phill2000star
#:3557683
| 9:41 am on Jan. 25, 2008 (utc 0) |
Is a 503 really correct, or would a 302 be better? Any definite answers on this?
|
Robert Charlton
#:3557725
| 10:31 am on Jan. 25, 2008 (utc 0) |
503 is what should be used. See my answer in this recent thread for a more complete discussion and references both to Google and w3c.org.... Google and Website Downtime - what affect on rankings http://www.webmasterworld.com/google/3553474.htm
|
g1smd
#:3558220
| 10:07 pm on Jan. 25, 2008 (utc 0) |
Usage of 302 is very dangerous most of the time.
|
jd01
#:3559168
| 6:03 pm on Jan. 27, 2008 (utc 0) |
In a similar situation I successfully used a 307 (the HTTP/1.1 Temporary Redirect) to a 'Hey it's broken' page, with a robots meta tag of 'noindex,nofollow,noarchive' on the 'It's broken' page. The affected pages temporarily dropped from the index, then within a few days of putting them back up and removing the redirect they returned to where they were previously. It's the method I would use again.

I think one of the keys to success, no matter which method you decide on, is to EITHER *redirect* (do not rewrite) to the temporary page using a 302 (Found), 303 (See Other) or 307 (Temporary Redirect), which all basically mean "temporary" (AFAIK a 302 is now actually handled by SEs according to the 303 standard), OR serve a custom error page. In either case you can keep the temporary page from being indexed by putting a robots meta tag of 'noindex,nofollow,noarchive' on the target page (critical IMO).

Justin

ErrorDocument example (place in your .htaccess to prevent a recurrence):

    ErrorDocument 403 /forbiden.html

Note: The ErrorDocument target *must* be a local path starting with a slash (no http://www.example.com/), or a 302 Found redirect will be served rather than the anticipated error code. That may be why your pages were all indexed with the error message rather than being dropped as they should have been.

<added>
Just re-read the OP: make sure you set a status code in the PHP when you redirect, or it will be sent as a 302 Found.

    $uri = "http://www.example.com/its-broken.html";
    header("HTTP/1.1 307 Temporary Redirect");
    header("Location: $uri");
    exit; // stop here so the rest of the page is not sent after the redirect
</added>
|
jd01
#:3559175
| 6:32 pm on Jan. 27, 2008 (utc 0) |
I think I should highlight the main difference between some form of a temporary redirect and a permanent redirect, and the reason for noindexing the temporary location:

A Permanent Redirect tells a compliant UA to request the information from the new location, so when this type of redirect is implemented (e.g. from /old-page.html to /new-page.html) a SE will request /new-page.html directly on the next 'spidering' of a site.

A Temporary Redirect (any version) tells a compliant UA to (basically) keep requesting the information from the original location, so when this type of redirect is implemented (e.g. from /old-page.html to /new-page.html) a SE will request /old-page.html on all subsequent 'spiderings'. As long as the redirect is still in place, the contents of the target page (/new-page.html) will be considered the information associated with the original page (/old-page.html). This will be the case until the redirect is removed and the contents of /old-page.html are restored.

So, to keep from having duplicate pages indexed when multiple temporary redirects point to a single location (e.g. /its-broken.html), it is imperative to noindex the target location of the temporary redirects.

Justin
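
The difference can be made concrete with a toy model. The Python sketch below only simulates the behaviour described in this post. It is not how any real search engine works, and `build_index` and its parameters are hypothetical names made up for the illustration.

```python
PERMANENT = {301}
TEMPORARY = {302, 303, 307}


def build_index(urls, redirects, contents, noindex=frozenset()):
    """Simulate one crawl and return {indexed_url: content}.

    urls      -- URLs the crawler already knows about
    redirects -- {url: (status, target)} for URLs that redirect
    contents  -- {url: body} for URLs that answer with 200
    noindex   -- URLs whose page carries a robots 'noindex' meta tag
    """
    index = {}
    for url in urls:
        status, target = redirects.get(url, (200, None))
        if status in PERMANENT or status in TEMPORARY:
            if target in noindex:
                continue  # target asks not to be indexed: nothing is stored
            # Permanent: file the content under the new URL.
            # Temporary: file the target's content under the original URL.
            key = target if status in PERMANENT else url
            index[key] = contents[target]
        elif url not in noindex:
            index[url] = contents[url]
    return index
```

If /a.html and /b.html both send a 307 to /its-broken.html, the model files the same "updating" text under both original URLs, which is the duplicate problem described above. Passing `noindex={"/its-broken.html"}` leaves the index empty for those URLs until the redirects are removed.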
|