I run a website that publishes large volumes of articles. Right now we have the rights to publish millions of articles indefinitely. Recently we have been given an opportunity to license a wonderful library of content, also in the millions of documents, but we may not be able to have that content on our site indefinitely.
My impression of the Web is that we all work under a basic assumption that disk space is cheap and that things placed on the web are more or less permanent. In fact, 404 is almost a bad word, a thing to be avoided at almost any cost.
This has left me concerned that, while it would be wonderful to make this huge library of content available on the web, I might create havoc with Google if I start returning massive numbers of 404s once the content needs to be pulled down. I am thinking that flighting the content on and off the site might help, but I still wonder whether any red flags will go up as the number of 404s goes up.
Anticipating a question: no, I don't think I can unpublish and then 301 to somewhere else where the content might reside permanently (because it doesn't).
Any thoughts or suggestions most welcome!
Thanks!
That would be a safeguard against some kind of massive 404 problem with Google, I think. You could even use a robots.txt disallow rule just before you remove the content, just to keep the spiders happy and stop them wasting their time.
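As a rough sketch of the robots.txt idea, the snippet below builds a disallow rule and checks it with Python's standard urllib.robotparser before it goes live. The /licensed/ prefix and the example.com URLs are assumptions for illustration only; nothing in this thread says how the licensed content would actually be laid out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical path prefix for the licensed library (an assumption, not from the thread).
LICENSED_PREFIX = "/licensed/"


def build_robots_txt(prefix: str) -> str:
    """Return a robots.txt body that asks all crawlers to skip one path prefix."""
    return f"User-agent: *\nDisallow: {prefix}\n"


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Check whether the given robots.txt text lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


robots_txt = build_robots_txt(LICENSED_PREFIX)

# The licensed section is blocked; the rest of the site stays open.
print(is_crawlable(robots_txt, "https://example.com/licensed/article-1"))
print(is_crawlable(robots_txt, "https://example.com/articles/own-1"))
```

One thing to weigh: a crawler that honours the disallow rule will not request those URLs, so it will not see the 404 responses either.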
You should certainly ensure that your 404 page is more than just "hard luck, mate!" - it needs to be a gateway to whatever you still have on your site ... and you need to start planning for that day as soon as possible, so that you can retain as many visitors as possible.
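A minimal sketch of such a gateway 404 page, using only Python's standard library, might look like this. The section links and the handler name are hypothetical placeholders; the point is only that the response keeps a genuine 404 status while the body steers visitors to what is still on the site.

```python
from html import escape
from http.server import BaseHTTPRequestHandler

# Hypothetical sections that remain on the site; placeholders only.
REMAINING_SECTIONS = [
    ("/articles/", "Browse all current articles"),
    ("/search", "Search the site"),
]


def build_not_found_page(requested_path: str) -> str:
    """Build an HTML body that explains the removal and links to remaining content."""
    links = "\n".join(
        f'<li><a href="{escape(href, quote=True)}">{escape(label)}</a></li>'
        for href, label in REMAINING_SECTIONS
    )
    return (
        "<!doctype html>\n"
        "<title>Article no longer available</title>\n"
        f"<p>The page {escape(requested_path)} is no longer available here.</p>\n"
        f"<ul>\n{links}\n</ul>\n"
    )


class GatewayNotFoundHandler(BaseHTTPRequestHandler):
    """Answers every GET with a real 404 status plus the gateway page."""

    def do_GET(self):
        body = build_not_found_page(self.path).encode("utf-8")
        self.send_response(404)  # the status stays honest; only the body is friendly
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, the handler can be passed to http.server.HTTPServer in the usual way. In a real deployment the same idea would normally live in the web server or CMS error-page configuration rather than in a standalone handler.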
And if you cannot negotiate to keep the content, maybe you can negotiate a fee to refer visitors to the new site?
My thinking is that if we make it a constant stream rather than fits and starts, it would be better and the engines would somehow adjust. There must be some precedent for this out there. At one time a lot of publications would push content live and then, after a period, move it to an "archived state" where you needed an account to access it. So I am sure the engines know how to handle and adjust for this, but I still worry whether it will harm my site overall.
BTW - I could probably still leave some sort of trail there, like some metadata, but I am not convinced that would be better than just 404ing the page so that it drops out of the search indices.
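To make the "constant stream" idea concrete, here is a small sketch that spreads a batch of removals evenly over a date range instead of pulling everything on one day. The function name, the URLs and the dates are all made up for illustration, and whether search engines actually treat a steady trickle of 404s more kindly is my assumption, not something I can confirm.

```python
from datetime import date, timedelta


def steady_removal_schedule(urls, start, end):
    """Spread removals as evenly as possible over the days from start to end, inclusive."""
    days = (end - start).days + 1
    if days <= 0:
        raise ValueError("end must not be earlier than start")
    # Every day gets `base` removals; the first `extra` days get one more.
    base, extra = divmod(len(urls), days)
    schedule = {}
    index = 0
    for offset in range(days):
        count = base + (1 if offset < extra else 0)
        schedule[start + timedelta(days=offset)] = urls[index:index + count]
        index += count
    return schedule


# Hypothetical example: ten licensed URLs retired over four days.
example_urls = [f"/licensed/article-{n}" for n in range(10)]
plan = steady_removal_schedule(example_urls, date(2024, 1, 1), date(2024, 1, 4))
for day, batch in plan.items():
    print(day, len(batch))
```

The daily batch sizes differ by at most one, so the number of newly missing pages stays roughly flat from day to day.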
Thanks for the feedback!