Forum Moderators: Robert Charlton & goodroi
Recently, a website I maintain went down: the account was frozen and all HTTP requests ended in a 403 page. The reason for this is that the site is hosted on a shared host, and one evening someone successfully requested and downloaded a 620KB file 4,126 times, equating to roughly 2,500MB of data transfer. The server was utilising 90% of its web resources for this, so the provider halted it.
A few hours after I reported it, the site changed from a 403 page back to its original state. I had to make some modifications to prevent mass downloading of files from a certain IP (which my ISP provided).
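For anyone needing to do the same, the IP block itself can go straight into htaccess; a minimal sketch, assuming Apache, with a placeholder address (substitute the actual IP from your logs):

# Deny all requests from the abusive IP (192.0.2.10 is a placeholder)
Order Allow,Deny
Allow from all
Deny from 192.0.2.10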
Whilst I was doing this I put a "site update" page up and directed customers to it using a header() call in a PHP file. This file was included at the start of each page on the site, so every page said "sorry, we are currently updating our site. Please come back in an hour or so".
Since then I have checked our listings in Google, and it seems to have indexed an additional 100 or so pages (which normally exist) all saying "we are currently updating our site".
Now these pages are ones it had scheduled to crawl after finding their links within our site. So now what? Are they going to stay like that? If not, how long for? And what do I do to prevent this happening again?
Many thanks guys!
See my answer in this recent thread for a more complete discussion and references both to Google and w3c.org....
Google and Website Downtime - what affect on rankings
[webmasterworld.com...]
The pages affected temporarily dropped from the index; then, within a few days of putting them back up and removing the redirect, they returned to where they were previously. It's the method I would use again.
I think one of the keys to success, no matter which method you decide on, is to EITHER *redirect* (do not rewrite) to the temporary page, using a 302 (Found), 303 (See Other), or 307 (Temporary Redirect), which all basically = temporary (a 302 is now actually handled by SEs according to the 303 standard, AFAIK), OR serve a custom error page. In either case you can then keep the temporary page from being indexed using a robots meta tag of 'noindex,nofollow,noarchive' on the target page (critical IMO).
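For reference, the tag itself goes in the <head> of the temporary/target page (whatever you have named your maintenance page):

<meta name="robots" content="noindex,nofollow,noarchive">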
Justin
ErrorDocument Example (place in your htaccess to prevent a recurrence):
ErrorDocument 403 /forbidden.html
Note: The ErrorDocument URL *must* be a relative URL (no http://www.example.com/) or a 302 Found will be served rather than the anticipated error code, which may be why your pages were all indexed with the error message rather than being dropped as they should.
<added>
Just re-read the OP:
Make sure you set a status code in the PHP when you redirect, or it will be considered 302 Found.
$uri = "http://www.example.com/its-broken.html";
header("HTTP/1.1 307 Temporary Redirect");
header("Location: $uri");
exit; // stop the script so nothing else is output after the redirect
</added>
A Permanent Redirect tells a compliant UA to request the information from the new location, so when this type of redirect is implemented (EG from /old-page.html to /new-page.html) a SE will request /new-page.html directly on the next 'spidering' of a site.
A Temporary Redirect (any version) tells a compliant UA to (basically) keep requesting the information from the original location. So when this type of redirect is implemented (EG from /old-page.html to /new-page.html), a SE will request /old-page.html on all subsequent 'spiderings', and as long as the redirect is still in place the contents of the target page (/new-page.html) will be considered the information associated with the original page (/old-page.html). This will be the case until the redirect is removed and the contents of /old-page.html are restored.
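In htaccess terms, a temporary redirect of this kind can be set up with the Redirect directive ('temp' issues a 302 Found; the paths here are examples only):

Redirect temp /old-page.html /new-page.html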
So, to keep from having duplicate pages indexed from the use of multiple temporary redirects to a single location (EG /its-broken.html) it is imperative to noindex the target location of the temporary redirects.
Justin