One member stickied me today, worried that one of his sites had been banned for duplicate content. Here's the gist of the reply I sent.
With the state of flux at Google at the moment (cf. 5000+ posts in 2 weeks), I'd be very wary of leaping to conclusions or undertaking major work based on them. Yesterday my boss rang me at 11 in the morning (Sunday) to tell me how well we were showing for our main phrase. Apart from telling him to find another religion to worship on Sundays, I had to disabuse him of the idea that, as in the past, those results would be sticky, and sure enough we're back to normal this afternoon.
FWIW, my reading of GoogleGuy's recent posts (and God bless his perseverance) is that they've been trying out some new algos on an older set of data. I also suspect that this index is not just older, but possibly incomplete, especially in the area of backlinks.
The steps [webmasterworld.com] seem to be:
1) roll out the new algo, with old data, across all datacenters
2) bring in new data from more recent crawls
3) introduce new additions to the algo (hidden text, hidden links, etc.) designed to automate much of what was manual in the past.
I think we've seen the last of the monthly dance as we traditionally know it; the delay in the last two updates, combined with the introduction of freshbot, suggests that development priority at G has been focused on moving to an ongoing update/refinement [webmasterworld.com] of their SERPs.
Where does that leave PR? In the past, calculating it was the last act in the monthly update, before the update was pushed out to the surfers at www.google.com. I believe that it is yet to be factored, or factored in full, into the current SERPs, which (along with a possibly incomplete data set) is why so many are seeing "lost" pages and sites. I think it will be some days (weeks?) before they sign off on the new algo and factor the backlinks in, when there will be one last, and radical, shift in some SERPs. Thereafter? Who knows? They may continue to use freshbot to update the index on a daily basis, AS WELL AS introducing regular rolling changes in the algo, and calculate PR once a month to produce something like the dance we know, or they may have other ideas - GoogleGuy has told us many times we pay way too much attention to PR.
So what might have happened to your missing site/pages?
* You may have been penalised for dupe content. I'm not sure that Googlebot does that except in extreme cases. Look at the number of widget affiliate sites that have similar content on their widget pages and don't suffer. They survive and distinguish themselves by having unique content on the main and peripheral pages. Of course, if they HAVE unique content on the widget pages they do better [webmasterworld.com].
* You may have fallen foul of some new addition to the algo. Maybe your sites are too deeply cross-linked and those links have lost their value. Maybe hidden links that escaped up till now are being frowned on.
* For the reasons given above, maybe your site, or a sufficient amount of it, just doesn't feature in the current index, or the pages linking to you don't, or the pages linking to THEM don't, so the value of their link to you is less than it should be. All of these will iron out in the end.
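To put a rough number on that third possibility, here's a toy sketch in Python. It uses the simple textbook PageRank formula, PR(p) = (1 - d) + d * sum(PR(q) / outlinks(q)), which is certainly not what Google runs in production. The page names (s1, s2, s3, hub, you), the damping value and the `restrict` helper are all made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank: PR(p) = (1 - d) + d * sum(PR(q) / outlinks(q))."""
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    rank = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Each linking page passes on its rank split across its outlinks.
            inbound = sum(
                rank[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_rank[page] = (1 - damping) + damping * inbound
        rank = new_rank
    return rank


def restrict(links, indexed):
    """Keep only pages (and links between pages) present in the index."""
    return {
        src: [t for t in targets if t in indexed]
        for src, targets in links.items()
        if src in indexed
    }


# Hypothetical link graph: three small pages link to a hub, the hub links to you.
full_web = {
    "s1": ["hub"],
    "s2": ["hub"],
    "s3": ["hub"],
    "hub": ["you"],
    "you": ["hub"],
}

# Assumed incomplete index: the pages linking to the hub were not picked up.
partial_web = restrict(full_web, {"hub", "you"})

full_rank = pagerank(full_web)
partial_rank = pagerank(partial_web)

print("you, full index:   ", round(full_rank["you"], 3))
print("you, partial index:", round(partial_rank["you"], 3))
```

Nothing about "you" or the hub changed between the two runs. The only difference is that the pages linking to the hub dropped out of the data set, and your score falls with them. Once those pages are back in the index, the score comes back too.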
I'd say chill out. Go back to the daily tasks of good SEO. Take the opportunity to eliminate any duplicate content you do have and replace it with good original stuff. Add a few new pages. Get some more inbound links, preferably one-way. Check that your outbound links are working and bona fide. Go right through your site and make sure that EVERY page is well optimised for something that people are looking for.
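If you want a quick way to find duplicate content on your own site, here's a rough sketch, again in Python. It compares pages by overlapping three-word runs ("shingles") and Jaccard similarity, which is a common near-duplicate technique. I'm not claiming this is how Googlebot does it. The sample page texts, the function names and the 0.5 threshold are all assumptions to tune for your own site.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word runs of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets (0.0 to 1.0)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(pages, threshold=0.5):
    """List page pairs whose similarity meets the (assumed) threshold."""
    urls = sorted(pages)
    return [
        (u, v)
        for i, u in enumerate(urls)
        for v in urls[i + 1:]
        if similarity(pages[u], pages[v]) >= threshold
    ]


# Hypothetical page texts; in practice you would feed in your own body copy.
pages = {
    "/blue-widget": "Our blue widget ships free and comes with a two year warranty",
    "/red-widget": "Our red widget ships free and comes with a two year warranty",
    "/about": "We are a small family shop that has sold widgets since 1998",
}

for pair in near_duplicates(pages):
    print("too similar:", pair)
```

The two widget pages get flagged because they differ by a single word, while the about page does not. Those flagged pairs are the ones to rewrite first.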
Look at your other methods of marketing [webmasterworld.com]. If you don't have any, get some.
And if you need to survive in the meantime, spend next month's advertising budget because, if you do all the above, next month you won't need it. So if you've still got time to spare, take a holiday, to be ready for the rush on your return.
Just my reading of what I'm seeing. Usual disclaimers apply.