This all sounds like a project with some potential long-term consequences.
It is meant to be. There are no risk-free decisions left in Pandaland. While most members of this forum seem to have chosen to focus on downsizing and/or improving usability, my strategy is rapid growth.
Frankly, the content sounds like rubbish and I think you're in the minority of people who like to read press releases.
More people would read high quality press releases if finding them were as convenient as finding newspaper articles. Most scientific/technical/financial articles are just poorly rewritten press releases. Of course there are press releases like "We announce a press conference" or "We have a new manager", but we do a good job of filtering them out.
I admire your optimism, but how do you expect to get high quality links to press releases or tag pages?
This thread should be about indexing, not about questioning the quality of the links or press releases. But I don't mean to be rude:
Prerequisites:
Authoritative high quality niche
As much interaction with authoritative institutions as possible
Lots of up to date contact data
1. Getting the 2k authors involved: "We published 80% of your press releases and thank you for your work. Please check whether we have overlooked any outdated or factually wrong press releases. Your profile picture will be shown above every article and in Google search if you want: just upload a picture and your G+ account..."
2. Most of the press departments like to show how often their press release has been published and link out. They will be notified about all the old press releases and every single new one.
3. Sending the press departments a widget (with a link) which shows all of their press releases so they don't have to update their websites.
4. Offering very customizable press release widgets: if a website owner wants a newsfeed widget about, say, the Moon, the widget comes with a link to the Moon tag page, and so on. In addition to the 500+ institutions, I can contact 5k smaller non-commercial websites. Hundreds of them are already using my other widgets.
5. More.
Having a JS script sift through 200,000 press releases to determine whether they are related is probably going to crash many browsers.
This is a misunderstanding. Our server will handle the workload. The short term goal is testing different ways to show related articles (for the users) without totally confusing Google. Call it cloaking if you want to. My long term goal is making this additional content available to Google. If everything works well, this will happen in 1.5 years or so.
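To make the server-side idea a bit more concrete, here is a minimal sketch in Python. It assumes that "related" simply means "shares tags", which is only one possible definition and not necessarily the one we will end up using. The function names (`build_tag_index`, `related_articles`) and the data layout are made up for illustration. The point is that the server builds a tag index once and answers each lookup from it, so the browser never touches the full set of press releases.

```python
from collections import Counter, defaultdict


def build_tag_index(articles):
    """Map each tag to the set of article ids that carry it.

    `articles` is a hypothetical dict: article id -> set of tags.
    """
    index = defaultdict(set)
    for article_id, tags in articles.items():
        for tag in tags:
            index[tag].add(article_id)
    return index


def related_articles(article_id, articles, index, limit=5):
    """Return up to `limit` other article ids, ranked by shared tag count."""
    shared = Counter()
    for tag in articles[article_id]:
        for other_id in index[tag]:
            if other_id != article_id:
                shared[other_id] += 1
    # Most shared tags first; ties are broken by id so the order is stable.
    ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
    return [other_id for other_id, _ in ranked[:limit]]
```

The server would call `build_tag_index` once (or update it as new press releases arrive) and send only the handful of ids returned by `related_articles` to the page.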