Forum Moderators: open
But Domenic is being built differently - and for the first time we are getting an insight into how it is being built. As someone else said about 300 posts ago, "we are looking through a hole in the fence at a construction site."
Normally we see "THE NEXT INDEX" - but ONLY AFTER it has already been built - i.e. we only got to see it being replicated across datacentres, across www3 and www2, and eventually into www. I believe that we NEVER saw it getting built - we only saw the replication process.
This time we are seeing an index actually being built - ingredient by ingredient. I suspect the "new index recipe" is analogous to a group of master chefs - each with a speciality - cooking up the new 'Google rolling update' and passing the final 'cake' from datacentre to datacentre:
The first chef - at www-sj - takes a new algo and tests it on some two month old data, using an old ODP dump. Once he is happy with the tests, he then passes it to the master spam chef.
At www-fi, the master spam chef will probably test some new spam filters on the 'newly' algorithmically reranked old data from www-sj (which has less data), based on recent spam reports (get those javascript clowns, GoogleGuy!), and add some new filters of his own (probably the ones Matt Cutts told you about in Boston).
The www-fi chef then passes it to the master deepbot chef, who will probably add some of last month's deep crawls. He will then pass it to the freshbot chef, who will stir in the recent freshbot activity; maybe pass it to an up-and-coming spam apprentice, who will dial in some more promising spam filters to taste; maybe to the 'hidden text' chef for his additional comment; then to the backlink chef, who has the most important job: adding the most recently detected and calculated backlinks and PR results. Bake, test, add the hand-crafted permanent spam bans, and then the 'special sauce' GoogleGuy mentioned about 350 posts ago - and then we'll have "Domenic Update Completion", baked across 8 datacentres and live on www.
So my advice is to FORGET the backlink/spam/position tests for NOW - this is a major algo/strategy/process change.
www-sj has, I believe, already moved its 'work' into www-fi today - but not in the traditional 'replicate a new index' fashion - more like the chef handing his 'creation' on to the next specialist to add to. If you are focusing on www2 and www3, then have another lemonade. You are missing it.
Checking the IP addresses of the Google datacentres (not the nominal DNS records, but by doing a traceroute) indicates that www2, www3 and www-sj have been pointing at the same identical IP address for the past few days.
Do your own research, draw your own conclusions, CHECK THE IP addresses - and follow along! Maybe the order I've listed this process in is wrong - but I think the principle of a decentralised 'index building' process - rather than datacentres purely for delivery and data replication - is the key to a new, more scalable, and more powerful Google!
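For anyone who wants to follow along, here's a rough sketch of the DNS side of the check. The hostnames and IPs below are made up for illustration only - substitute the real datacentre hostnames yourself. And note this only does a simple lookup; as I said above, a traceroute tells you more, because the nominal DNS record can differ from where the traffic actually lands.

```python
import socket

def resolve(hosts):
    """Resolve each hostname to its first A-record IP address."""
    return {h: socket.gethostbyname(h) for h in hosts}

def grouped_by_ip(ip_map):
    """Group hostnames that share the same IP address - datacentres
    in the same group are serving from the same place."""
    groups = {}
    for host, ip in ip_map.items():
        groups.setdefault(ip, []).append(host)
    return groups

# Example with made-up IPs standing in for a real resolve() result:
sample = {
    "www2.google.com": "216.239.33.100",
    "www3.google.com": "216.239.33.100",
    "www-sj.google.com": "216.239.33.100",
    "www-fi.google.com": "216.239.35.100",
}
print(grouped_by_ip(sample))
```

Run `resolve()` on the live hostnames a few times a day and watch which ones fall into the same group - that's the pattern I described above.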
Chris_D
Still searching for more clues.
PS Hey GoogleGuy - if the index is now a "brave new world of continuous updates, occurring across multiple datacentres, each of which will add specialist skills" - will you please confirm this hypothesis so that I can get some sleep?
<Edited to fix obvious typos!>
Where exactly is the meat coming from? I mean the data they are using. You mentioned they are using roughly two month old data combined with some freshbot data - so why are some older websites, like mine, completely missing from the action? The only explanation I have now is that the spam filter chef has already cooked the meat (data).
What's your opinion?
I believe it will look the way we think it should.
My bet is sometime within the next 48hrs we will see it happen.
*yo big shout out to Googleguy!* - wassssuuuuup!
Glad that helped. If this theory is right, then the only way to do it is to start with an old version and add the more recent stuff back in later (e.g. the last deep crawl), so that it can be compared against the 'current' www. That would be the only way to see whether, say, new spam filters or a different backlink strategy (and let's face it - FAST has a much better backlink system) actually work on a full index, and how the result compares to the 'current' one.
dididudu - We are watching them at work in the 'kitchen'. My 'best guess' on the data we started seeing in www-sj (based on DMOZ entries, sites I know well, etc.) is that it was about 6-8 weeks old - that's not an absolute fact. But we know that it isn't all the data - we all know that pages from the last deep crawl aren't necessarily there yet. I think that as the data is 'similar but different' in www-fi, and yet the other half a dozen datacentres are just serving up www results, the data looks like it's moving from datacentre to datacentre, with each one applying its own 'added value'. As far as when the last deepbot/freshbot data gets 're-added' to the pot, and when links get correctly counted - your guess is as good as mine!
Best
Chris_D
Really nice post... You wrote a very good post the other day about this as well, and I complimented you on it. However, a lot of people don't seem very interested in analysing what is going on. They are too busy posting every 10 minutes on WW crying about how their keyword dropped or is missing. Oh, and don't forget the posts where people are yelling at GG for answers. Things for me on SJ are looking a bit negative; however, I'm trying to understand what is happening on SJ, and why, and how I can improve for the next update. Has anyone found out anything that is of use to us in the future? And thanks again Chris_D for that nice (long) post.
Catnip
:)