Matt talked about the "new" spider proxy, or cache, in Boston and wrote about it on his blog. Yet I have not seen any posts about it here... did I miss that thread?
I think it is an important change that comes along with the Bigdaddy infrastructure, and it may have an impact on the algo itself.
If the Mediabot has fetched a page, that fetch was normally triggered by a surfer looking at the page. I would say this might fill the cache FIRST, and preferentially, with pages that have AdSense on them. The question now is how the algo cruncher works with that data. If they simply ADD further pages to the cache whenever they find links to them from pages already in that proxy, the whole listing would shift (just a thought)...
However, IMHO the strict implementation of such a proxy is worth a lot of thought regarding the algo in the future!
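To make the shared-cache idea above concrete, here is a minimal sketch in Python. It is purely illustrative and is not Google's actual design: the class name CrawlCache, the max_age freshness window, and the bot labels are all assumptions made up for this example. It only shows the basic mechanism: whichever bot asks for a URL first fills the cache, and later bots are served the stored copy instead of hitting the site again.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CrawlCache:
    """Toy shared fetch cache for several crawlers (hypothetical design)."""

    def __init__(self, fetch: Callable[[str], str], max_age: float = 3600.0):
        self._fetch = fetch        # function that really downloads a URL
        self._max_age = max_age    # assumed freshness window, in seconds
        # url -> (time fetched, page body, name of the bot that filled it)
        self._store: Dict[str, Tuple[float, str, str]] = {}
        self.origin_hits = 0       # how often the real site was contacted

    def get(self, url: str, bot: str) -> str:
        """Return the page for `url`, fetching it only if no fresh copy exists."""
        now = time.time()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]        # fresh copy: no request to the site
        body = self._fetch(url)    # cache miss or stale copy: fetch for real
        self.origin_hits += 1
        self._store[url] = (now, body, bot)
        return body

    def filled_by(self, url: str) -> Optional[str]:
        """Name of the bot whose request put `url` into the cache, if any."""
        entry = self._store.get(url)
        return entry[2] if entry else None
```

In this sketch, if a "mediabot" request fetches a page and a "googlebot" request for the same URL arrives within the freshness window, origin_hits stays at 1 and filled_by reports "mediabot". That is the effect speculated about above: pages requested by the AdSense bot would land in the cache first.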
Looking back over the last month+, Googlebot has indexed about the same number of pages/day, but the number of unique URLs it indexes has increased, so from that perspective, the changes all look good to me.
However, that might just be on datacentres where the old version of the index is being phased out - I do see different actions and treatment of sites across the various DCs. Maybe some obvious patterns will become apparent soon?
Matt tried hard to point out that it is just a cache, but would YOU work differently on your computer if I removed all the caches from your system? The hard drive cache, the processor cache, etc.?
A cache, if used right (which I suspect is the case with Google), has a tremendous impact on the way data is handled and processed.