Forum Moderators: Robert Charlton & goodroi


Danger in publishing a large number of pages temporarily

         

cj94111

11:27 pm on Aug 22, 2008 (gmt 0)

10+ Year Member



Hi-

I run a website that publishes large volumes of articles. Right now we have the rights to publish millions of articles indefinitely. Recently we have been given an opportunity to license a wonderful library of content, also in the millions of documents, but we may not be able to have that content on our site indefinitely.

My impression of the Web is that we all work under a basic assumption that disk space is cheap and that things placed on the web are more or less permanent. In fact, 404 is almost a bad word, a thing to be avoided at almost all cost.

This has left me concerned that while it would be wonderful to be able to make this huge library of content available on the web, I might create havoc with Google should I start generating massive numbers of 404s once the content needs to be pulled down. I am thinking that flighting the content on and off the site might help, but still I wonder whether any red flags will go up as the number of 404s goes up.

Anticipating a question: no, I don't think I can unpublish and then 301 to somewhere else where the content might reside permanently (because it doesn't).

Any thoughts or suggestions most welcome!

Thanks!

tedster

12:00 am on Aug 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Can you publish this potentially disappearing content in a dedicated sub-directory or subdomain?

That would be a safeguard against some kind of massive 404 problem with Google, I think. You could even add a robots.txt disallow rule just before you remove the content, just to keep the spiders happy and not wasting their time.
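A minimal sketch of such a disallow rule, assuming the temporary content sits under a hypothetical /licensed/ directory on example.com (neither name comes from the thread). It uses Python's standard urllib.robotparser only to sanity-check that the rule blocks the temporary directory and nothing else before it goes live.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "/licensed/" is an assumed directory name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /licensed/
"""

def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # Temporary content is blocked, the permanent articles are not.
    print(is_crawlable("https://example.com/licensed/some-article.html"))
    print(is_crawlable("https://example.com/articles/some-article.html"))
```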

Quadrille

12:03 am on Aug 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There's certainly the potential for massive confusion, both for human visitors and for search engines. But if it has to go, then it has to go!

You should certainly ensure that your 404 page is more than just "hard luck, mate!" - it needs to be a gateway to whatever you still have on your site ... and you need to start planning for that day as soon as possible, so that you can retain as many visitors as possible.

And if you cannot negotiate to keep the content, maybe you can negotiate a fee to refer visitors to the new site?

pateluday

3:42 pm on Aug 25, 2008 (gmt 0)

10+ Year Member



Better to move those articles to a subdirectory so that a 404 is not required!

cj94111

6:21 pm on Aug 25, 2008 (gmt 0)

10+ Year Member



I could post this content in a special subdirectory, but I think the way it will work is a rolling import and publish, and therefore a rolling unpublish. In other words, let's say that I was pushing 2,000 articles per day live and they could stay up on the site for 90 days. On day 91 I would start unpublishing at a rate of 2,000 a day (while at the same time publishing another 2,000 new articles).
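To put rough numbers on that schedule, here is a small sketch using only the figures above (2,000 articles per day, 90 days of life). The function name and the day-counting convention (day 1 is the first publishing day) are assumptions made for illustration.

```python
def rolling_counts(day: int, per_day: int = 2000, lifetime_days: int = 90):
    """Return (live, removed) article totals at the end of the given day.

    Day 1 is the first publishing day. Each day's batch stays up for
    lifetime_days days, so removals begin on day lifetime_days + 1.
    """
    published = per_day * day
    removed = per_day * max(0, day - lifetime_days)
    return published - removed, removed

if __name__ == "__main__":
    for day in (1, 90, 91, 365):
        live, removed = rolling_counts(day)
        print(f"day {day}: {live} live, {removed} removed so far")
```

Under these assumptions the live set levels off at 180,000 articles from day 90 onward, while the pile of removed URLs keeps growing by 2,000 a day, reaching 550,000 by day 365.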

My thinking is that if we get it to be a constant stream rather than fits and starts, it is better and the engines would somehow adjust. There have to be some precedents for this out there. At one time a lot of publications would push content live and then after a period it would move to an "archived state" where you needed an account to access it. So I am sure the engines know how to handle and adjust for this, but I still worry whether it will harm my site overall.

BTW - I could probably still leave some sort of trail there, like some metadata, but I am not convinced that would be better than just 404ing the page so that it drops out of the search indices.

Thanks for the feedback!

Shaddows

8:45 am on Aug 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Could you have the date as part of the subdirectory?

example.com/articles/20080826/*
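A rough sketch of how that dated layout could drive the rolling unpublish, assuming the /articles/YYYYMMDD/ pattern above and the 90-day window mentioned earlier in the thread. The function names are hypothetical.

```python
from datetime import date, timedelta

def article_dir(published: date) -> str:
    """Directory for everything published on a given date."""
    return f"/articles/{published:%Y%m%d}/"

def expiring_dir(today: date, lifetime_days: int = 90) -> str:
    """Directory whose content reaches the end of its window today."""
    return article_dir(today - timedelta(days=lifetime_days))

if __name__ == "__main__":
    print(article_dir(date(2008, 8, 26)))
    print(expiring_dir(date(2008, 11, 24)))
```

With one directory per publishing day, each day's removal touches exactly one path prefix, which also makes it easy to write a matching robots.txt rule for that prefix.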

eltercerhombre

10:44 am on Aug 26, 2008 (gmt 0)

10+ Year Member



What about giving the unavailable_after meta a chance?

<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 25-Aug-2007 15:00:00 EST">

Then you can post your experience in the forum ;)
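For what it's worth, a sketch of how that tag could be generated per article, assuming a 90-day window and simply mirroring the date layout of the example above. The function name and the tz_label parameter are hypothetical, the month abbreviation relies on the default English/C locale, and whether Googlebot drops the page on schedule is something only a live test would show.

```python
from datetime import datetime, timedelta

def unavailable_after_tag(published: datetime,
                          lifetime_days: int = 90,
                          tz_label: str = "EST") -> str:
    """Build the GOOGLEBOT meta tag for an article's expiry time.

    tz_label is appended as plain text; no time zone conversion is done.
    """
    expires = published + timedelta(days=lifetime_days)
    stamp = expires.strftime("%d-%b-%Y %H:%M:%S")
    return (f'<META NAME="GOOGLEBOT" '
            f'CONTENT="unavailable_after: {stamp} {tz_label}">')

if __name__ == "__main__":
    print(unavailable_after_tag(datetime(2008, 8, 26, 15, 0, 0)))
```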

tedster

3:11 pm on Aug 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good idea! I forgot about that relatively new tag. It was announced a year ago in the Official Google Blog [googleblog.blogspot.com].