Differentiating Duplicate vs. Serialized Content

I have been studying the way Google handles and indexes periodical content. In the case of our website, which consists of about 2/3 magazine format the other third static content, we have about a half dozen professionally written columns, by the same authors, with the same column name (e.g. The Widget Times presents - Joe Smith's Weekly News Column) which are replaced every week with current news. The template for each page (header, links, title (see below) ) are identical due to the mass of content we need to put out weekly, we've tried to automate as much as possible so non-html-programmers can ready the pages for publishing. No frames are ever used!
The previous week's column page (which begins as www.domain.com/directory/index.htm) is then renamed with a weekly designator such as js080203.htm and saved in the directory with a prev link, next link, domain.com home page, etc. The new week's one takes it's place as index and is set to point to the renamed one as it's prev. This way none need ever be changed once they are archived and always form a continuous chain. When someone offsite links it for refernce, it never goes away. There is also a mutually linked archive page which very briefly outlines each week's content and links to each page, for the last year's worth of content. Many of our regular readers access the past content regularly and have requested it, so we have made the archives available for searching and indexing. We even recently paid to add a sitewide search engine to help users retrieve this content since Google won't index it all.
Over the last few months I've noticed that while alltheweb and many other SEs find and index every single archive page Googlebot visits a few times a week but tends only to index the most recent issue of each column (title) and maybe a handful of past articles. At times they may even get every single column going back 1-2 months, but these later vanish and get replaced with the newer ones. Each column main page or current week's page is at least a PR4+.
I'm assuming some of this may be based on Google deciding that since the title and 95% of the links, and about 10% of the text (mostly near the top) of the page are the same, and the page has a number in the file name that it thinks they are intentional duplicate content.
Over the last few months we've tried changing each of the titles, descriptions and keyword metas to reflect and describe each week's unique content instead of having the same title as the column and author, and instead of just describing the general topic of the column, although it is significantly more work (when multiplied by the number we are doing each week), with only very minor success at getting more indexed.
I've noticed many of our competitors and daily newspapers, national magazines, etc. manage to have every single past news column indexed (even ones which incorporate a highly copied syndicated newsfeed) and searchable, even when all that changes in the title is a date. Some end up with 10's of thousands of backlinks and PR8 with many apparent duplicate columns.
I've also noticed a lot of intentional spam sites which simply list pages full of dictionary terms to reroute and even they routinely get indexed.

1. Should G be indexing archival content and doing a better job determining what is an important repeating content vs intentional spam?
2. Does anyone know of a better way of designating similar but different content (e.g. historical sports scores, stock market reports, recipes, similar products with a separate page for each, etc.) so that each page gets indexed, without starting over creating on every one? Is there maybe a Google registry for periodicals?
3. Does anyone have experience concerning the best interlinking structure to use in this scenario for the highest PR for both the most current column page and the domain home page without G thinking that one is simply spamming? Is it excessive to link 20-30 static navigation pages and other related news column home pages from every weekly article margin?
4. Some of our columns are syndicated in print and online although we are the original source. When someone else copies 90% of a page of content weekly either by permission or without. How does G decide who is the original source and who is copying?

Mike

Differentiating Duplicate vs. Serialized Content

How to avoid Google penalizing weekly columns and index archives?

MikeNoLastName

bilalak

eztrip

tribal

MikeNoLastName

Join The Conversation

Moderators and Top Contributors

Hot Threads This Week