| 6:17 am on Jul 17, 2010 (gmt 0)|
One of my sites has about 50% of its pages carrying the "noindex,follow" robots attribute. They are not visible in the SERPs, but my experience is that link juice still happily flows. That may not be fully representative of links which "fall out" of the index, but Google's juice graph definitely consists of more than just the visible URLs.
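For anyone following along, the attribute being described is the standard robots meta tag with these two values (this is the generic form, not lammert's actual markup):

```html
<!-- Keeps the page out of the SERPs, but tells the bot it may still
     follow the links on the page, so juice can flow through it -->
<meta name="robots" content="noindex,follow">
```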
| 4:24 pm on Jul 17, 2010 (gmt 0)|
Thanks for that, lammert - it certainly does support my idea.
I started thinking in this direction because of baberjaved's question in another thread: Removing Low Ranked, Un-useful Pages - worth it? [webmasterworld.com] and Matt Cutts' frequent advice to keep googlebot crawling and indexing as open as you possibly can.
Add that to the difficulty in getting a stable number for indexed pages [webmasterworld.com], and it seems to me that what goes on with Google is a lot more than what we see, or even what we've guessed.
URLs that are in Google's back end but not publicly visible could be Google's own version of dark matter [en.wikipedia.org].
| 5:08 pm on Jul 17, 2010 (gmt 0)|
He, he, he. This is what you're thinking about at 2:05 in the AM!?!
As part of that dark matter I still firmly believe that the supplemental index still exists and that a URL that "falls out" of the main index has simply "gone supplemental" as we used to say. And following on lammert's comment I'll even posit a NOINDEX index. And a 404 index. And then a...whatever. Yeah, plenty of stuff we can't see.
Bottom line though, I think Google would have to consider all of the URLs it knows about that are capable of passing link juice in its iterations, simply because if it did not, it would be creating innumerable dead ends where the juice couldn't flow back into the system. I ain't no math guy, but after umpty-ump iterations wouldn't the PR of the entire web then be effectively reduced to zero?
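That intuition matches how the published PageRank formulation treats "dangling" pages (pages with no outlinks): their rank mass is redistributed each iteration instead of being discarded, so the totals don't leak away to zero. A toy power-iteration sketch, where the graph, damping factor, and even-redistribution scheme are illustrative assumptions rather than anything Google has confirmed:

```python
# Toy PageRank power iteration illustrating dead-end (dangling node)
# handling. Graph, damping factor, and redistribution scheme are
# assumptions for illustration, not Google's actual system.

DAMPING = 0.85

def pagerank(links, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Base "teleport" share every page receives.
        new = {p: (1.0 - DAMPING) / n for p in pages}
        # Rank held by dangling pages is spread evenly across the graph,
        # so it flows back into the system instead of vanishing.
        dangling = sum(rank[p] for p in pages if not links[p])
        for p in pages:
            new[p] += DAMPING * dangling / n
        # Each page splits its rank among its outlinks.
        for p in pages:
            out = links[p]
            for q in out:
                new[q] += DAMPING * rank[p] / len(out)
        rank = new
    return rank

# Page "c" is a dead end; its rank gets recycled every iteration.
graph = {"a": ["b", "c"], "b": ["c"], "c": []}
ranks = pagerank(graph)
print(ranks)
print(sum(ranks.values()))  # total rank mass stays ~1.0
```

Without that redistribution step, each pass would bleed the dangling pages' rank out of the system, and the total really would shrink toward zero over many iterations.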
Just yesterday I reread Saul Hansell's 2007 NYT article Google Keeps Tweaking Its Search Engine [nytimes.com]. I'm still quite astounded that search professionals and academics were astounded at what Google was doing back then. Guess my hope is actually not understanding G, but simply surviving in spite of not understanding it.
| 6:33 pm on Jul 17, 2010 (gmt 0)|
Even though such pages aren't in the SERPs index, they could still be part of the underlying set of data that is fed into Google's algorithm to calculate the SERPs. This underlying database may have even expanded after the transition to Caffeine.
| 9:39 am on Jul 19, 2010 (gmt 0)|
Google compartmentalizes everything. It all runs together but is stored separately, so I imagine PageRank is no different. In fact, I would be willing to bet that Google iterates several different versions of PageRank. Is your site's social media value on fire via Twitter/Facebook (all nofollow links), but beyond that you have no inbound links? No problem, influence has its own pagerank.
If more people's browsers visit your site and report to Google that they visited and stayed a while, your pagerank will likely increase even with zero inbound links (outside social sites using nofollow). I can't test it, but I know it's true. Popularity has value that may not equal the number/quality of inbound links; Google knows this, and it's weighed and measured accordingly.
One other thing to consider, Google serps fluctuate quite a bit on our end but they may not on their end. The various data centers our serps come from likely draw data from a main data center but at different intervals. If that's the case we can't see the current status of our sites webgraphs, at least not in real time.
| 2:39 pm on Jul 19, 2010 (gmt 0)|
|If more people's browsers visit your site and report to Google that they visited and stayed a while, your pagerank will likely increase even with zero inbound links (outside social sites using nofollow). I can't test it, but I know it's true. Popularity has value that may not equal the number/quality of inbound links; Google knows this, and it's weighed and measured accordingly. |
I don't think we should start muddling terms here.
Yes, there are different components of the algorithm and they all have different ways of assessing a document's value, but we shouldn't refer to them all as "PageRank," especially since Ted's question refers specifically to Google's web graph and how links are valued or not.
| 1:20 am on Jul 20, 2010 (gmt 0)|
I believe they do not fall out of the web graph.
We have pages that you can only get to from certain pages that have disappeared from Google, yet those pages further down still retain PR and rankings without any other external links.
So those pages that have been "ghosted" as we call them around here do still pass some form of juice and are being stored someplace inside G.
| 6:11 pm on Jul 20, 2010 (gmt 0)|
>One of my sites has about 50% of the pages with the "noindex,follow" robots attribute.
Google has said time and time again that "noindex" means they will not show that page in any search result, but that it will still be used in PR and ranking calculations. Matt Cutts said as much circa 2003. Of course links on pages - even pages NOT in the index - count. If Googlebot can download a page, then it is going into the algo calcs. The only way to keep it out entirely is to not let Googlebot see it.
A 404 link is an entirely different discussion.
| 11:42 am on Jul 21, 2010 (gmt 0)|
I tested recently and found that anchor text from 'noindex,follow' pages is certainly credited within the same site.