Since the domain is a few years old (it was parked for a while), the oldest inbound links point to domain.com/. Last year it got content, promotion and lots of inbound links, a fair amount of them from PR4 and PR5 pages, all pointing to www.domain.com/. DeepBot crawled domain.com and never www.domain.com. FreshBot visited daily, fetching new and updated pages from both www.domain.com and domain.com. Google didn't manage to combine the PR: domain.com/ and www.domain.com/ were ranked PR2 and PR3, while most subpages (without external inbound links) had PR4. In April '03 I redirected (permanent, 301) domain.com/* to www.domain.com/* for cosmetic reasons (the Google traffic was nice, but the low PR on the root index page bothered me).
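For anyone wanting to set up the same kind of canonical-host redirect, it's typically done with mod_rewrite in an .htaccess file. This is only a sketch, assuming Apache with mod_rewrite enabled; domain.com stands in for the real hostname:

```apache
# Permanently (301) redirect every request for domain.com/*
# to the same path on www.domain.com/*.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
```

The R=301 flag is what makes it a permanent redirect rather than Apache's default 302.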
DeepFreshBot continued to fetch known pages from domain.com every now and then, got the 301, and deleted the page from the index. New pages weren't crawled any more. Every second day DeepFreshBot fetches www.domain.com/robots.txt and disappears. www.domain.com/index.htm gets crawled weekly but is no longer indexed; it appears again in Esmeralda. Of hundreds of subpages, a dozen are left in Esmeralda, all with domain.com URLs. Apart from robots.txt and the root index page, DeepFreshBot didn't fetch any page from www.domain.com in the last 30 days, and she (BTW, is DeepFreshBot still Ms. Googlebot?) didn't visit domain.com at all.
So far that's not really new stuff. In the past, the process of replacing domain.com URLs with www.domain.com URLs after a permanent redirect took several months, though the dumping seems a little weird; maybe it's explainable by the recent changes, but it's still unusual.
What's interesting is that among the remaining pages in Esmeralda, some third-level pages (two clicks from the root index) still show PR4. The 2nd-level page linking to the 3rd-level page is not indexed (in both Dominic _and_ Esmeralda), but it obviously still passes PR to the 3rd-level page.
Esmeralda is not yet settled, so it's too early for firm observations, but two questions remain:
Does DeepFreshBot handle 301 redirects differently from DeepBot?
If so, this kind of redirecting has become dangerous and should be used on brand new domains only.
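One way to see exactly what any bot receives when it requests the old hostname is to look at the raw response status line and Location header. A minimal sketch in Python; the captured response below is made up for illustration, showing what domain.com should send after the redirect is in place:

```python
def parse_redirect(raw_response):
    """Parse a raw HTTP response head and return (status_code, location).

    location is None when the response carries no Location header.
    """
    head = raw_response.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    # Status line looks like "HTTP/1.1 301 Moved Permanently".
    status_code = int(lines[0].split()[1])
    location = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            location = value.strip()
    return status_code, location

# Hypothetical response head captured from domain.com:
raw = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://www.domain.com/page.htm\r\n"
    "\r\n"
)
print(parse_redirect(raw))  # (301, 'http://www.domain.com/page.htm')
```

If the server answers 302 instead of 301, the bots will treat the move as temporary, which could explain very different indexing behaviour.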
Why doesn't Google deliver 'phantom pages' (no cache, not shown as backlinks, not shown via site: or allinurl:), which are obviously known, on the SERPs - not even the URLs?
Probably because Google's map of links is not synchronized with the content index. DeepFreshBot is not as assiduous as DeepBot; she needs a cookie-and-cold-milk diet, since junk food results in laziness ;)
One of my sites (squeaky clean, of course) has been all but dropped from the new index. It was 301-redirected from a .co.uk to a .com last month too. The only pages left in the index are links to pages that are banned via robots.txt.
The only thing is - Googlebot has not actually looked at most of my pages this month (although it's possible there's an error in my stats system) - it has visited the default page several times a day, but skipped almost all the other pages.
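To rule out a stats-system error, one can count Googlebot requests per URL straight from the raw access log instead of trusting the stats package. A rough sketch, assuming Apache combined-format logs (the sample lines below are made up for illustration):

```python
import re
from collections import Counter

# Combined-log-format pieces we care about: the request line and the user agent.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<url>\S+) [^"]*" \d+ \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(log_lines):
    """Count requests per URL whose user-agent string mentions Googlebot."""
    hits = Counter()
    for line in log_lines:
        m = LOG_RE.search(line)
        if m and "googlebot" in m.group("agent").lower():
            hits[m.group("url")] += 1
    return hits

sample = [
    '66.249.64.1 - - [01/Jun/2003:03:14:07 +0000] "GET / HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '66.249.64.1 - - [01/Jun/2003:03:14:09 +0000] "GET /robots.txt HTTP/1.0" 200 120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '10.0.0.5 - - [01/Jun/2003:03:15:00 +0000] "GET /page.htm HTTP/1.0" 200 2048 "-" "Mozilla/4.0"',
]
print(googlebot_hits(sample))  # only the two Googlebot requests are counted
```

If the per-URL counts from the raw log match the stats package, the skipping is real and not a reporting glitch.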
Not much I can do about it though...