|How to take care of stale or outdated pages?|
I need some direction on how to deal with stale or outdated content. I work in the informational and news area, where news and information quickly become outdated.
For example, some posts deal with old technologies that have been discontinued and that no one searches for anymore. If anyone does, it's probably not what they are looking for, which causes user experience issues.
I am hesitant to remove them outright because they may have had links in the past, and removing them would cause 404s and a potential loss of that past link juice.
Merging the outdated pages is pointless too, because they are outdated after all.
Redirecting seems bad too, simply because there are no good pages to redirect them to.
If I leave them as they are, I suspect that some of the old pages cause ranking issues due to staleness, and that people do not find them relevant.
What is the best way to deal with these pages? What are your thoughts?
Thanks in advance.
I work with one site that definitely has the kind of situation you describe, and we do nothing to articles as they age. Even after they stop getting search traffic, we just leave them untouched, and the site is THRIVING in Google Search. So I'm not sure there really is any problem to solve here.
Some people set up an "archive" folder and move old content there after a good while. I only did that once, many years ago, when the only alternative the owners offered was to delete the pages. The site still lost traffic for many months.
I would go with what tedster says in almost every case. The only exception I make, which doesn't seem to apply in your case, is when there may be new information on the topic later and substitute information exists now. Then, since the substitute is "not the right answer", I use a 303 See Other.
Other than some "edge cases", I think tedster's advice to leave them alone is the best I've seen.
An example of where I use a 303 is a business location that previously provided a service, doesn't provide it now, but may again in the future. In that specific situation, if the location starts providing the service again, I want to have communicated that "the page redirected to is not an alternate for the information" but is the "best answer I have right now". Other than that specific case, I'd go with what tedster says.
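A minimal sketch of the 303 approach described above, using only Python's standard library. The paths in `TEMPORARY_SUBSTITUTES` and the `serve()` helper are hypothetical, chosen for illustration; a real site would configure this in its own server or CMS. The point is the status code: 303 says "see this other page for now" without claiming the target permanently replaces the original, as a 301 Moved Permanently would.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical mapping: a page for a service that is paused at one location,
# pointed at the "best answer I have right now". Paths are illustrative only.
TEMPORARY_SUBSTITUTES = {
    "/locations/springfield/repairs": "/locations/springfield",
}


def resolve(request_path):
    """Return (status, headers) for a request path, ignoring any query string."""
    path = urlsplit(request_path).path
    target = TEMPORARY_SUBSTITUTES.get(path)
    if target is not None:
        # 303 See Other: point elsewhere without declaring a permanent move.
        return HTTPStatus.SEE_OTHER, {"Location": target, "Content-Length": "0"}
    return HTTPStatus.OK, {"Content-Type": "text/html; charset=utf-8"}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers = resolve(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        if status == HTTPStatus.OK:
            self.wfile.write(b"<p>Regular page content</p>")


def serve(port=8000):
    """Hypothetical helper: run a local test server until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` and requesting `/locations/springfield/repairs` with any HTTP client should return a 303 with a `Location` header. When the service resumes, you remove the mapping and the original page answers with 200 again.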
|I am hesitant to remove them outright because they may have had links in the past, and removing them would cause 404s and a potential loss of that past link juice.|
If you've removed them on purpose and serve the appropriate status code, it's not a 404. It's a 410 Gone. This makes no apparent difference to No.-2-We-Try-Harder, but it's very noticeable with Google.
But if there are active, desirable links to the pages, you wouldn't want to remove them anyway. Dump the ones that have no links to them, and only if they create unwanted clutter merely by existing. That's unlikely, unless your internal navigation is built on mandatory links: if a page exists, it must be linked to.
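The advice above combines two rules: retire a page only if it has no desirable inbound links *and* it creates clutter, and answer deliberately retired URLs with 410 rather than 404. A minimal sketch of that logic follows; the `INVENTORY` data and the function names are hypothetical, and the link counts would have to come from whatever link data you already have.

```python
from http import HTTPStatus
from urllib.parse import urlsplit

# Hypothetical inventory: path -> (active desirable inbound links, creates clutter?)
INVENTORY = {
    "/articles/8-track-repair": (0, True),
    "/articles/minidisc-history": (12, True),
    "/articles/current-guide": (30, False),
}


def should_retire(inbound_links, creates_clutter):
    """Retire only pages with no desirable inbound links that also add clutter."""
    return inbound_links == 0 and creates_clutter


RETIRED_PATHS = {
    path for path, (links, clutter) in INVENTORY.items() if should_retire(links, clutter)
}
LIVE_PATHS = set(INVENTORY) - RETIRED_PATHS


def status_for(request_path):
    """Pick the status code to serve for a request path."""
    path = urlsplit(request_path).path
    if path in LIVE_PATHS:
        return HTTPStatus.OK
    if path in RETIRED_PATHS:
        return HTTPStatus.GONE  # 410: removed on purpose
    return HTTPStatus.NOT_FOUND  # 404: unknown URL
```

Here the MiniDisc article stays live despite being cluttery because it still has links, which matches the "you wouldn't want to remove them anyway" point.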
Naturally I know nothing about your site, but if your content is likely to become outdated, I would always put the date of authorship on each article. That doesn't address your other issues, but visitors can make up their own minds about relevance.
Correct me if I'm wrong, but I think the content itself is not outdated; the product it describes is outdated, and people no longer search for it as they used to. The content is still valid for that product. I would also go with what tedster says and leave it untouched, as long as the content correctly describes the product as it exists now or as it last existed.
I'd be very hesitant to remove old content. I used to do SEO for some very big (and very old) newspaper sites, and we had the same debates. I always encouraged them not to delete old content. I didn't always win those battles (it's hard when an executive is saying "but we can save $x a month by reducing storage"), and I couldn't quite quantify the value of keeping it in the same way.
If you do keep it, one thing you can do is make it clear that the article is old (put "from the archives" on all pages older than a year, for example), and then provide a separate navigation path that makes it clear visitors are searching or browsing old articles.
Before Panda I would have said leave the pages as they are, but now I'm not sure: could these pages later be affected by the Panda algorithm as they become less and less popular? Unless you intend to keep supporting the outdated products, I am not sure you should keep those pages.
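A minimal sketch of the "from the archives" label and the separate archive path suggested above, which also shows the authorship date as recommended earlier in the thread. The one-year threshold follows the example given; the function names and the `(title, published)` article format are hypothetical.

```python
from datetime import date

ARCHIVE_AFTER_DAYS = 365  # "older than a year", per the example; adjust as needed


def archive_label(published, today):
    """Return an archive banner with the publication date, or "" for recent articles."""
    if (today - published).days > ARCHIVE_AFTER_DAYS:
        return f"From the archives: published {published.isoformat()}"
    return ""


def split_current_and_archive(articles, today):
    """Split (title, published) pairs into current and archive navigation lists."""
    current, archive = [], []
    for title, published in articles:
        target = archive if archive_label(published, today) else current
        target.append(title)
    return current, archive
```

For example, with `today = date(2012, 1, 1)`, an article published on 2010-01-01 gets the banner and goes into the archive list, while one from 2011-12-01 stays in the current list.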
|For example, some posts deal with old technologies that have been discontinued and that no one searches for anymore.|
I took this to mean things like the definitive article on how to repair your 8-track player. If it has any useful content, one of the big-name how-to sites scraped it long ago.
How can a web page be "stale" if it provides the best information on its topic?
For example, say you have a web page, created before 2000, that many other sites, including Wikipedia, have used as the foremost reference on its subject matter.
Now, what would you need to change about that web page, and for what purpose?
|How can a web page be "stale" if it provides the best information on its topic?|
That's a complicated question to answer. I think the best places to start are the patent applications, along with understanding that stale is not necessarily bad in all cases. Why a page may need to be changed, or how it could be "more stale" than other documents, has much more to do with "what's on the page and how long it's been unchanged", because freshness "cascades" like PageRank (for lack of a better way of explaining it).
Anyway, if you really want to know the answer to your question, the Hot Topics section [webmasterworld.com...], specifically the Historical Data Patent [patft.uspto.gov...], and the associated thread here [webmasterworld.com...] are probably good places to start.
My site was hit by Panda, and I had to go through every post.
I never realized that most of those "stale" posts actually contained a lot of outdated information and methods, and most of the time they could be fully rewritten to include new methods.
IMHO, informational websites probably need to update their "stale" pages, while other sites, like news sites, can get away with not doing so.
|IMHO, informational websites probably need to update their "stale" pages, while other sites, like news sites, can get away with not doing so.|
Ah, but how are a few thousand lines of code going to discriminate accurately when humans can read a page and still misinterpret it, especially when skim-reading?
When I say "a few thousand lines of code", I don't actually know how many lines of code Google uses, but a couple of years ago I wrote an application of 10,000 lines. Most of that code was designed to compensate for human error in inputs for scoreboards and overall percentages/ratings across a variety of usage scenarios, so I guess it could be similar. Google could be using 100,000 lines of code or more, with only 5-10% of it actually used in any one case, depending on which conditions are met.
So when it comes down to it, I think expecting Google's indexing to tell good grammar from bad is absurd. For example, is it going to check every single word against a dictionary to see whether it a) exists, b) is a proper noun, new terminology, an abbreviation, etc., c) is misspelled, or d) follows the spelling of one language or another? And after all that, is it then going to pedantically grade the literary composition?