We have a family of affiliate sites that are driven by data feeds from a third-party aggregator. We hit the aggregator from the server side during the page construction process, and occasionally the aggregator fails to deliver, meaning we have no data to present on the page.
Of relevance is that we represent many vendors, and each vendor may have thousands of distinct static URLs on a site. So if our aggregator drops a vendor for a few minutes or hours, every one of those pages will produce the same error. Also, new pages will arise and old pages will drop off the map during the normal course of business as a vendor shuffles its feeds.
The question is, how can we indicate these errors with the least possible impact on SERPs?
It's probably also worth noting that the traffic dynamic we typically see is a long period of very little traffic, followed by a VERY fast increase over a few weeks, followed by a crash and then almost nothing again for months. We suspect that the way we handle these errors may be responsible for our traffic crashes, but I'd be interested in other suggestions.
Our initial solution was to deliver an error page (HTTP Response Code 200) that effectively said, "Wups! No data, please try again!" The effect of this was a raft of duplicate content and, eventually, a traffic crash. We served this error page for way too long.
Recently, after the latest crash, we tried something more sophisticated. Now a server-side data error still serves the same old 200 error page to a regular browser, but serves a 503 (try again in 1 minute) to search spiders. I know it's working because I can see the "Network Unreachable" errors pile up in GWT (Google Webmaster Tools).
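To make the current behavior concrete, here is a rough Python sketch of the branching logic. It is not our production code: the `SPIDER_TOKENS` list, the user-agent substring check, and the function names are all assumptions made up for illustration, and the 60-second `Retry-After` value is simply how I'd express "try again in 1 minute" in a header.

```python
from typing import Dict, Optional, Tuple

# Hypothetical, non-exhaustive list of crawler user-agent substrings.
SPIDER_TOKENS = ("googlebot", "bingbot", "slurp")


def is_spider(user_agent: str) -> bool:
    """Crude spider detection by user-agent substring (an assumption)."""
    ua = user_agent.lower()
    return any(token in ua for token in SPIDER_TOKENS)


def build_response(
    user_agent: str, feed_data: Optional[str]
) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for one page request.

    feed_data is None when the aggregator failed to deliver.
    """
    if feed_data is not None:
        # Normal case: the aggregator delivered, everyone gets the page.
        return 200, {"Content-Type": "text/html"}, feed_data
    if is_spider(user_agent):
        # Retry-After is given in seconds; 60 means "try again in 1 minute".
        return 503, {"Retry-After": "60"}, "Service temporarily unavailable"
    # Regular browsers still get the friendly 200 error page.
    return 200, {"Content-Type": "text/html"}, "Wups! No data, please try again!"
```

So a request from a spider with no feed data gets a 503 with `Retry-After: 60`, while a regular browser in the same situation gets the 200 "Wups!" page.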
Unfortunately, I am also seeing some de-listing. A month ago the site referenced above had no 503s and 30K pages in the Google index; now we have 266 reported 503 errors and only 6,500 pages in the index. So I question whether serving the 503s is a good idea.
Another option would be to serve a 404 to spiders, but since most of these errors are temporary I really don't want Google to de-list the page on the first error.
Eventually we plan to cache data to serve in the event of an error, but we aren't there yet. Any thoughts?
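In case it helps frame suggestions, the caching fallback we have in mind would look roughly like the sketch below. Nothing here exists yet: the `FeedCache` class, the `max_stale_seconds` limit, and the `fetch` callable standing in for the aggregator call are all hypothetical names I've invented for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FeedCache:
    """Serve the last good feed data when the aggregator fails (hypothetical)."""

    def __init__(
        self, max_stale_seconds: float, clock: Callable[[], float] = time.time
    ) -> None:
        self.max_stale_seconds = max_stale_seconds
        self.clock = clock
        # Maps page URL -> (time stored, feed data).
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str, fetch: Callable[[str], Optional[str]]) -> Optional[str]:
        """Try the aggregator first; fall back to cached data if it fails."""
        try:
            fresh = fetch(url)
        except Exception:
            # Treat any aggregator exception the same as "no data".
            fresh = None
        if fresh is not None:
            self._store[url] = (self.clock(), fresh)
            return fresh
        cached = self._store.get(url)
        if cached is not None and self.clock() - cached[0] <= self.max_stale_seconds:
            return cached[1]
        # No fresh data and nothing usable in the cache: the caller still
        # has to decide which error response to send.
        return None
```

The idea is that a short aggregator outage would be invisible to both visitors and spiders, and only an outage longer than `max_stale_seconds` would fall through to the error handling described above.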
[edited by: tedster at 5:56 pm (utc) on May 26, 2010]