
Google SEO News and Discussion Forum

The crawl caching proxy
Did I miss a thread?

WebmasterWorld Senior Member 10+ Year Member

Msg#: 34239 posted 6:40 pm on May 9, 2006 (gmt 0)


Matt talked about the "new" spider proxy or cache in Boston and wrote about it on his blog. Yet I have not seen any posts about it here... did I miss that thread?

I think that is an important change coming along with the Bigdaddy infrastructure, and it may have an impact on the algo itself.

If Mediabot has fetched a page, that fetch was normally triggered by a surfer looking at the page. I would say this might fill the cache FIRST with pages that have AdSense on them. The question now is how the algo cruncher works with that data. If they just ADD pages to the cache whenever they find links to them from pages already in that proxy, the whole listing would shift (just a thought here)...

However, IMHO the strict implementation of such a proxy is worth a lot of thought regarding the algo in the future!




10+ Year Member

Msg#: 34239 posted 10:28 pm on May 9, 2006 (gmt 0)

I don't think this will affect the algo directly at all, in the sense that a page previously accessed only by Mediabot would now enter the main index. It's simply that if a page was previously getting fetched by both Mediabot and the main Googlebot, it will now only get fetched once. That's as I understand it, based solely on what Matt has said.
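To make the "fetched once" idea concrete, here is a minimal sketch of a shared crawl cache. It is only an illustration of the general technique, not Google's implementation: the class name `CrawlCachingProxy`, the `max_age_seconds` freshness window, the `fake_fetch` stand-in and the example URL are all assumptions made up for this example.

```python
import time
from typing import Callable, Dict, List, Tuple


class CrawlCachingProxy:
    """Hypothetical sketch: every crawler asks the proxy instead of the site."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        max_age_seconds: float = 86400.0,  # assumed freshness window, not a known value
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}  # url -> (fetched_at, body)
        self.origin_fetches = 0  # how often the real site was hit
        self.log: List[Tuple[str, str, bool]] = []  # (crawler, url, served_from_cache)

    def get(self, url: str, crawler: str) -> str:
        now = self._clock()
        entry = self._cache.get(url)
        if entry is not None and now - entry[0] < self._max_age:
            # A fresh copy exists, so the site is not contacted again.
            self.log.append((crawler, url, True))
            return entry[1]
        # No copy, or the copy is stale: fetch from the site and remember it.
        body = self._fetch(url)
        self.origin_fetches += 1
        self._cache[url] = (now, body)
        self.log.append((crawler, url, False))
        return body


def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP request, so the sketch runs offline.
    return f"<html>content of {url}</html>"


proxy = CrawlCachingProxy(fake_fetch)
first = proxy.get("http://example.com/page", crawler="Mediabot")
second = proxy.get("http://example.com/page", crawler="Googlebot")
print(proxy.origin_fetches)  # the site was contacted once for two crawler requests
```

Note that nothing in this sketch decides what gets indexed. The proxy only saves the second fetch; whether a page enters an index would be up to whichever crawler asked for it.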

Looking back over the last month+, Googlebot has indexed about the same number of pages/day, but the number of unique URLs it indexes has increased, so from that perspective, the changes all look good to me.



WebmasterWorld Senior Member g1smd us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

Msg#: 34239 posted 12:23 am on May 10, 2006 (gmt 0)

I think there is a disconnect between what is spidered and what is indexed. Some pages are indexed but then fail to be updated: newer information is available at the website, but it never seems to make it into the index.

However, that might just be on datacentres where the old version of the index is being phased out - I do see different actions and treatment of sites across the various DCs. Maybe some obvious patterns will become apparent soon?


WebmasterWorld Senior Member 10+ Year Member

Msg#: 34239 posted 12:33 am on May 10, 2006 (gmt 0)

I agree that the crawler cache is not the biggest news, but given that Google has made major infrastructure changes with Bigdaddy, I wonder whether that cache will have more impact on ranking in the long run than we all think.

Matt tried hard to point out that it is just a cache, but would YOU work differently on your computer if I removed all the caches from your system? The hard drive cache, the processor cache, etc.?

A cache, if used right (which I suspect is the case with Google), has a tremendous impact on the way data is handled and processed.

