Forum Moderators: open

Message Too Old, No Replies

Chefs rolling the new update and index baking....

Extrapolation of the Domenic theory.......

         

Chris_D

2:04 pm on May 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Historically - we got to see monthly (approx) Google updates rollout.

But Domenic is being built differently - and for the first time - we are getting an insight into how it is being built. As someone else said about 300 posts ago - "we are looking through a hole in the fence at a construction site."

Normally - we see "THE NEXT INDEX" - but ONLY AFTER it has already been built - i.e. we only got to see it getting replicated across datacentres, across www3 and www2, and eventually into www. I believe that we NEVER saw it getting built - we only saw the replication process.

This time - we are seeing an index actually being built - ingredient by ingredient. I suspect the "new index recipe" is analogous to a group of master chefs - each with a speciality - cooking up the new 'Google rolling update' and passing the final 'cake' from datacentre to datacentre:

The first chef - at www-sj - takes a new algo and tests it on some two month old data, using an old ODP dump. Once he is happy with the tests - he then passes it to the master spam chef.

At www-fi, the master spam chef will probably test some new spam filters (on the 'newly' algorithmically re-ranked old data from www-sj - which has less data), based on recent spam reports (get those javascript clowns, GoogleGuy!) - probably the ones Matt Cutts told you about in Boston.

The www-fi chef then passes it to the master deepbot chef, who will probably add some of last month's deep crawls. He will then pass it to the freshbot chef, who will stir in the recent freshbot activity - maybe pass it to an up-and-coming spam apprentice, who will dial in some more promising spam filters to taste - maybe to the 'hidden text' chef for his additional comment - then to the backlink chef, who has the most important job: adding the most recently detected and calculated backlinks and PR results. Bake, test, and then add the hand-crafted permanent spam bans - and then the 'special sauce' GoogleGuy mentioned about 350 posts ago - and then we'll have "Domenic Update Completion" - baked across 8 datacentres - and live on www.

So - my advice is to FORGET the backlink / spam / position tests etc NOW - this is a major algo / strategy / process change.

www-sj has - I believe - already moved its 'work' into www-fi today - but not in the traditional 'replicate a new index' fashion - more like the chef handing on his 'creation' to the next specialist to add to. If you are focusing on www2 and www3 - then have another lemonade. You are missing it.

Checking the IP addresses of the Google datacentres (not the nominal DNS records - but by doing a traceroute) indicates that www2, www3 and www-sj have been pointing at the same identical IP address for the past few days.

Do your own research - draw your own conclusions, CHECK THE IP addresses - and follow along! Maybe the order I've listed this process in is wrong - but I think the principle of a decentralised 'index building' process - rather than datacentres purely for delivery and data replication - is the key to a new, more scalable, and more powerful Google!
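[Editor's note: the IP-comparison check described above can be sketched in a few lines of Python. This is purely illustrative - the hostnames are the ones mentioned in the thread, but the IP addresses below are made-up placeholders, not real Google addresses; in practice you would fill in the mapping by hand from your own traceroute output.]

```python
from collections import defaultdict

def group_by_ip(resolutions):
    """Group hostnames that were observed resolving to the same IP.

    `resolutions` maps hostname -> observed IP string (e.g. copied by
    hand from traceroute output, since the nominal DNS records may not
    tell the whole story). Returns {ip: sorted list of hostnames}.
    """
    groups = defaultdict(list)
    for host, ip in resolutions.items():
        groups[ip].append(host)
    return {ip: sorted(hosts) for ip, hosts in groups.items()}

# Hypothetical traceroute observations - placeholder IPs for illustration.
observed = {
    "www2.google.com": "216.239.51.99",
    "www3.google.com": "216.239.51.99",
    "www-sj.google.com": "216.239.51.99",
    "www-fi.google.com": "216.239.39.101",
}

shared = group_by_ip(observed)
# Any group with more than one hostname is a set of 'datacentres'
# currently answering from the same address.
```

With the placeholder data above, www2, www3 and www-sj fall into one group - the pattern Chris_D says he observed.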

Chris_D
Still searching for more clues.
PS Hey GoogleGuy - if the index is now a "brave new world of continuous updates, occurring across multiple datacentres, each of which will add specialist skills" - will you please confirm this hypothesis so that I can get some sleep?

<Edited to fix obvious typos!>

albert

2:30 pm on May 9, 2003 (gmt 0)

10+ Year Member



Chris_D,

really nice post, thank you.

So in the future we will have threads like
"current chef in duty: master Spam"

And if it's always cooked the same way - well, we might be able to forecast from the ingredients put in how it will finally taste.

dididudu

11:13 pm on May 9, 2003 (gmt 0)

10+ Year Member



LOL, this is the most interesting post I have read so far. Nicely written! I hope they can beat the Iron Chefs. :p

SlyGuy

11:19 pm on May 9, 2003 (gmt 0)

10+ Year Member



One thing is for sure, I'm hungry now.

Nice post, Chris.

- Chad

dididudu

11:22 pm on May 9, 2003 (gmt 0)

10+ Year Member



Thought over the cooking theory again, and I have a question for you Chris_D:

Where exactly is the meat coming from? I mean the data they are using. You mentioned they are using roughly two-month-old data combined with some freshbot data - so why are some older websites, like mine, completely missing from the action? The only explanation I have now is that the spam filter chef has already cooked the meat (data)?

What's your opinion?

jojojo

11:40 pm on May 9, 2003 (gmt 0)

10+ Year Member



After reading that article about how many people with PhDs work at Google, their last 7 profitable quarters, etc., I feel more comfortable swallowing your interpretation and dreaming about a magical end result that will look the way most here think it "SHOULD" look...

I believe it will look the way we think it should.

My bet is that sometime within the next 48 hours we will see it happen.

*yo big shout out to Googleguy!* - wassssuuuuup!

annej

2:35 am on May 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Chris, for the first time I see why it would make sense to start with an old index. Thanks

Chris_D

5:08 am on May 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Annej

Glad that helped. If this theory is right - then the only way to do it is to start with an old version, and add the more recent stuff back in later (eg the last deep crawl) - in order to be able to compare it to the 'current' www. That would be the only way to see whether eg new spam filters, or a different backlink strategy (and let's face it - FAST has a much better backlink system), work on a full index - and how that compares to the 'current' one.

dididudu - We are watching them at work in the 'kitchen'. My 'best guess' on the data we started seeing in www-sj (based on DMOZ entries, sites I know well, etc.) is that it was about 6-8 weeks old - that's not an absolute fact. But we know that it isn't all the data - we all know that pages from the last deep crawl aren't necessarily there yet. As the data is 'similar but different' in www-fi, and yet the other half a dozen datacentres are just serving up www results, the data looks like it's moving from datacentre to datacentre - with each one applying its own 'added value'. As for when the last deepbot/freshbot data gets re-added to the pot - and when links get correctly counted - your guess is as good as mine!

Best

Chris_D

Catnip

5:30 am on May 10, 2003 (gmt 0)

10+ Year Member



Chris_D,

Really nice post... You wrote a very good post the other day about this too, and I complimented you on it. However, a lot of people don't seem very interested in analysing what is going on. They are too busy posting every 10 minutes on WW, crying about how their keyword dropped or is missing. Oh, and don't forget the posts where people are yelling at GG for answers. Things for me on SJ are looking a bit negative; however, I'm trying to understand what is happening on SJ, why, and how I can improve for the next update. Has anyone found out anything that is of use to us in the future? And thanks again, Chris_D, for that nice (long) post.

Catnip
:)