Webwork - 5:17 pm on Oct 22, 2008 (gmt 0)
There is "the data", which isn't redundant and isn't in multiple locations, and then there's the bot, which blindly navigates by links. It starts from the premise that it must visit a link, unless no-followed, collect "the data found there", and then process that data to determine whether it has seen the same data "elsewhere". So, no-follow or, no, follow and figure it out?

A human might not have trouble "seeing" that archives, for example, can exist in a variety of settings (by author, by date, by topic) AND easily deal with that, i.e., not have duplicate content issues. But then there's the bot ...

When Google decides to grow up and enable authors, copywriters, content creators ... whatever ... to "submit their creation" (if they choose) for proof of initial authorship, then we will be a long way ahead in the resolution of the real duplicate content issue. AND, when the bot visits any given URL to "find" the data, it needn't choke on the fact that "it's elsewhere", since it - the bot - knows it needn't concern itself with "the other": that's only a matter of allowing visitors to access the same data by whatever path they prefer. Do I want to scope out the site by author, since some authors are better than others? Do I want to scope out the site by topic, since that's all I'm interested in? Etc.

So, as you said, the bot can (and to my understanding does) do a decent job of "seeing onsite duplicate content", but that's not - and should not be contextualized by anyone as - duplicate content to be worried about, as in a site being penalized "for duplicate content". On-site "duplicate content" should never be a penalty trigger, at least not in the case of popular CMSes that choose such an approach to data access.

I have no bones to pick with anyone - the millions - who will, by their own or any consensual wisdom, build with the bot in mind but, so far as my little peashooter of a brain is concerned, the bot needs to be able to sort things out.
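For anyone who wants to picture what "has the bot seen the same data elsewhere" might look like, here is a rough sketch. To be clear, this is NOT how Google does it; it is just one simple way to group URLs that serve identical text. The URLs, the page text, and the function names (`fingerprint`, `group_duplicates`) are all made up for illustration, and exact hashing after whitespace/case normalization is my assumption, not anything a search engine has documented.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing (an assumed normalization)."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Return groups of URLs whose body text is identical after normalization."""
    by_hash = defaultdict(list)
    for url, text in pages.items():
        by_hash[fingerprint(text)].append(url)
    # Only groups with more than one URL count as "seen elsewhere".
    return [sorted(urls) for urls in by_hash.values() if len(urls) > 1]


# Hypothetical crawl result: one article reachable through three archive paths.
pages = {
    "/archive/by-author/smith/post-1": "How to brew tea.\nBoil water first.",
    "/archive/by-date/2008/10/post-1": "How to brew tea.  Boil water first.",
    "/archive/by-topic/tea/post-1": "how to brew tea. boil water first.",
    "/about": "About this site.",
}

duplicate_groups = group_duplicates(pages)
for group in duplicate_groups:
    print("same data, different paths:", group)
```

The point of the sketch is only that grouping same-site paths that lead to the same data is cheap to do, so treating the group as one item reached by several routes (rather than as something to penalize) is a design choice, not a technical limitation.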
The world should forget about "designing for Google" and design according to whatever works for humans. IF shoddy human information architecture in popular CMSes is the order of the day, so be it, until such time as the CMS is redesigned, for no other or better reason than the function of the CMS itself.

The bot couldn't care less about you or me or our world view or reason or anything of our minds, design or creation. It unceremoniously tosses off sites, without explanation, every day - even "good ones".

So, I stand by the comment: Fiddlesticks. Build, as best you can, for your users.

Buckworks, on the other hand, will wisely build for the bot and benefit therefrom, until that moment when the bot, by some caprice or design, shows Buckworks how little all her honest, intelligent endeavors mean to the bot. By then, should such a misfortune arise, one can only hope that she will have built a stream of defensible traffic such that she, too, can say with the same unperturbed zest: "Fiddlesticks!" ;)

Sorry to digress. To the OP: Listen to Buckworks. I'm just brewing up a little tempest in a teacup of a revolution, in hopes that the monolith - Google - becomes just one of many entry points "to search" and becomes far less "important" in the scheme of things. I just see no good coming from this Google monoculture. YMMV.

[edited by: Webwork at 5:36 pm (utc) on Oct. 22, 2008]
It seems today is the day that my little thing with the Google monoculture has chosen to boil a bit out from under the lid of the pot where it has been simmering. ;)

pointlessly redundant pages ... identical content in multiple locations ...

No comment ...