Forum Moderators: open


SERIOUS Google update algo analysis thread (Dominic)

NO whining or cheering about how your site is doing in this one.

         

rfgdxm1

6:21 am on May 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is a continuation of an idea for a thread that I started a few updates back. The topicality is listed below, and the expectation is that this thread will be restricted to just that. In the case of this Dominic update, GG has stated that other aspects of the update will be rolled out as the update develops. Thus, for this update it is possible that the observations made early on will not hold true by the end of the update. This is OK, because if patterns like this hold true for later updates, members here can use the search feature to find this thread and see how past updates developed.

----

I'm starting this thread because another member suggested such would be a good idea because the main Google update thread is cluttered with posts like "OMG, I've been dropped in the new index!" and "Yippee, I'm now #1 on a key SERP". This thread is ONLY for serious, generic discussion of changes that you are observing with the new algo in this update. As in things like "Looks to me like PR is less important this month, and anchor text of inbound links counts more.", etc. How your site is doing has no relevance here unless you can explain why you think so in terms of a general algo update.

wackmaster

7:01 pm on May 7, 2003 (gmt 0)



Hmmm.

Been looking at five different categories ... -sj versus -fi ... three of the categories we're in, two we're not. Four of five show more consistent quality results in -sj. So from here, it looks like the spam filters are not yet fully on in -fi.

In one of these categories in particular, a large, quality site fell to page two in -fi; OK in -sj.

For what it's worth, we also see more resemblance to the current www in -sj. If Google is using www as some sort of benchmark, that would suggest -sj is still the place that best reflects G's direction.

But if G is not happy with the current SERPs, then perhaps -fi reflects the newer algo better?

Yikes.

stevegpan2

7:23 pm on May 7, 2003 (gmt 0)

10+ Year Member



I do not see any of my new links on www or sj or fi.

wackmaster

7:35 pm on May 7, 2003 (gmt 0)



Additional thought: Seems to me from what we're looking at so far that Google is trying harder to *interpret content* and factor it into their algos.

Others have noted that single keywords are dropping and multiple keywords rising and this might reflect such an attempt.

If so, quite an undertaking. IMHO, Google may need quite a bit more tweaking if "-fi" is any indication of such an attempt.

Thoughts anyone?

Anon27

2:16 am on May 8, 2003 (gmt 0)

10+ Year Member



I have 2 fresh sites which are now in www. They are 2-3 days old.

why2kit

2:44 am on May 8, 2003 (gmt 0)

10+ Year Member



anon27 - when did freshie find those sites?

Anon27

2:58 am on May 8, 2003 (gmt 0)

10+ Year Member



why2kit:

Stats are bad at this time, but it looks like either early yesterday (May 06) EST time or the night before. The server I am using for those sites might be on GMT, EST or PST. I can not tell yet.

I hope this helps

BikeMan

5:11 am on May 8, 2003 (gmt 0)

10+ Year Member



Links for my sites from Google Directory and certain other sites have disappeared while they appear for competing sites.

These links have been in place for a couple of months. Looks like it's not factoring in recent links.

Chris_D

6:27 am on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Back here, [webmasterworld.com...] in Message 160, Swerve noted:

"From this exchange, it seems safe to make the following assumptions (IMO):

1. -sj does not contain the complete results from the deep crawl.
2. Pages from the deep crawl will be added to the index in coming days.
3. As pages are added, backlinks will be added.
4. (At least some) SPAM filters have not been applied to the index yet. Thus -sj is currently more spammy than the final index.
5. Given the preceding 4 assumptions, the SERPs of the final index could be drastically different from what we currently see on -sj (IMO)."

Swerve's post was based upon GoogleGuy's responses in this thread
[webmasterworld.com...]

An excellent summary of all the clues Swerve!

I've been doing a fair amount of research for 2 days - and the 'process' for update Dominic is totally different to what we have seen previously. The whole update process is changing.

FORGET the '-sj SERPS' per se - look at the algorithm that has derived the results. READ Swerve's summary of GoogleGuy's posts.

GoogleGuy has already given us a heads up - reread the links originally provided by Swerve.

What you can do right now is see how the -sj results are derived - and learn some very valuable stuff. Ignore the spam - assume it will get killed by one or more spam filters, which are yet to be applied (see 4 above).

Just forget that your backlinks are wrong - see 3.
Just forget that your deepcrawled April pages aren't in the -sj index - see 1 & 2.

And I'll add to what Bikeman said - yes - the -sj index is based on an old (previous) ODP dump - whereas the 'current' www is from a more recent ODP dump. It will get changed later - add that as 6. above.

So you can't analyse 'Dominic' the way everything else has been previously analysed at this point in an update cycle - THERE IS NO FINAL INDEX/SERPS TO VIEW AS YET

THIS one is being built differently - and for the first time - we are getting an insight into how it is being built. Normally - we see it after it's built - we see it getting replicated. This time - we are seeing an index actually being built - ingredient by ingredient. Don't waste the opportunity!

Let's face it - how do you keep scaling/ sorting/ analysing/ ranking a database on some 16,000 servers, in 7 datacentres, with over 3 billion webpages? I don't know - but Google is doing it as we speak.....

My advice right now - look for the patterns on the 'good' sites, ranking highly in the -sj index. Ignore the spam - it gets culled later. Are the good high ranking sites the same as the 'good' sites in the current www index? Ignore the spam - look at the good sites......

Chris_D

[edited by: Chris_D at 6:46 am (utc) on May 8, 2003]

whats up skip

6:45 am on May 8, 2003 (gmt 0)

10+ Year Member



I still think you cannot analyse the new algo until after the update has finished.

If you think the update has finished - think again.

Results on www3 are still very different to www.google.com or www.google.co.jp. There are even listings from DMOZ that were removed months ago still showing up.

shaadi

7:05 am on May 8, 2003 (gmt 0)

10+ Year Member



>If you think the update has finished - think again.

Methinks there is no movement on www, just some fresh bot listings - that's it. For me the update has not even started on www.

Correct me if I am wrong.

Catnip

7:11 am on May 8, 2003 (gmt 0)

10+ Year Member



Chris_D, nice post. Thanks for taking the time to collect all that information out of the last 500 posts.

Catnip
:)

rfgdxm1

7:13 am on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



GG himself said the dance had begun. However, the dance is a process. And, this dance is by no means over. This dance is so unusual because it is taking so long.

jojojo

7:45 am on May 8, 2003 (gmt 0)

10+ Year Member



Chris_D's post says it all - I think all the paranoia comes from this apparent 'change' we are going to see... it just has never happened before.

Chris_D

7:54 am on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



'Whats up skip' said: "If you think the update has finished - think again"

I'm puzzled by that statement.

Who thinks it's over? The time it's taking indicates that something very different is happening with the 'Dominic index update'. The whole process has changed.

The google dance tool shows:

7 datacentres with 681,000 links to Yahoo.com - and only ONE datacentre has 384,000 links to yahoo - and that's www-sj

Hence - everyone is focussing on -sj

It also shows that www2 and www3 are showing 384,000 yahoo links - and www is showing 681,000 links.

If you do some analysis on where the data is coming from - which rfgdxm1 alluded to earlier - it gets a little easier to see why...

www-sj 216.239.47.166
www2 216.239.47.166
www3 216.239.47.166

whereas:
www 216.239.48.242
www-ex 216.239.47.2
etc.

Does it seem a little unusual that www2, www3 and www-sj are all returning the same IP address? It's the same index version. The process has changed. It's no longer www2 and www3 being different, reflecting a completely different index that is then integrated - they're building this index right in front of us now....
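For anyone who wants to repeat this kind of check, here is a rough Python sketch that groups hostnames by the address they resolve to. The function name `group_hosts_by_ip` and the `REPORTED` table are made up for illustration; the addresses are simply the ones quoted in this post, not live data, and a real run would use the default `socket.gethostbyname` resolver instead of the lookup table.

```python
import socket
from collections import defaultdict


def group_hosts_by_ip(hosts, resolve=socket.gethostbyname):
    """Map each resolved IP address to the list of hostnames that returned it."""
    groups = defaultdict(list)
    for host in hosts:
        try:
            ip = resolve(host)
        except OSError:
            ip = None  # DNS lookup failed; collect these under None
        groups[ip].append(host)
    return dict(groups)


# Addresses as reported in the post (a snapshot, hypothetical as live data).
REPORTED = {
    "www-sj": "216.239.47.166",
    "www2": "216.239.47.166",
    "www3": "216.239.47.166",
    "www": "216.239.48.242",
    "www-ex": "216.239.47.2",
}

# Use the reported table as a stand-in resolver so no network access is needed.
groups = group_hosts_by_ip(REPORTED, resolve=REPORTED.__getitem__)

# Hostnames that share an address are presumably serving the same index.
shared = {ip: names for ip, names in groups.items() if len(names) > 1}
print(shared)
```

Bear in mind that a shared address only suggests the same index is being served; DNS answers can vary by location and over time.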

So www.google.com and 7 out of 8 datacentres are still showing a 'current' index - with little freshbot bits in it - but www2 and www3 are showing some 'testing', based on some 'older' data, older DMOZ, spam filters turned off etc.

Chris_D

shaadi

8:09 am on May 8, 2003 (gmt 0)

10+ Year Member



Chris_D, you got it right! I was trying to say the same thing. Still curious why that statement: "If you think the update has finished - think again"

j_h_maccann

9:37 am on May 8, 2003 (gmt 0)

10+ Year Member



>I still think you cannot analyse the new algo until after the update has finished.

But what if "the update" never finishes again?

A very important goal in dealing with huge volumes of data is to find a way to avoid "batch" processes that require processing all the data to reach a new version. It's been common opinion that Google was aiming to abolish the monthly update in favor of a constantly-rolling update--this may in fact be required in order to grow further.

When you have figured out how to do it, how would you make that transition? You would keep a baseline index, and then you would start keeping an "update buffer" with the results of all crawls, directories, etc. (and some minimum buffer size may be needed for practical updating). Then you would start applying the contents from the head of the "update buffer" to your baseline index, adding new sites, deleting others, and running the processes to calculate and propagate changes (links, PR, ...) to all sites (while you continue to crawl, refilling the update buffer). Once you get this continuous update started, you never stop again.
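To make the idea concrete, here is a toy Python sketch of a baseline index plus an "update buffer". It is only an illustration of the hypothesis in this post, not a claim about how Google works; `RollingIndex`, `crawl_result` and `apply_batch` are invented names, and the link/PR recalculation is left to the caller.

```python
from collections import deque


class RollingIndex:
    """Toy model: a baseline index that is updated continuously from a buffer."""

    def __init__(self, baseline):
        self.index = dict(baseline)  # url -> page data (the baseline index)
        self.buffer = deque()        # pending crawl results, oldest first

    def crawl_result(self, url, doc):
        """Queue a crawl result; doc=None means the page should be deleted."""
        self.buffer.append((url, doc))

    def apply_batch(self, batch_size):
        """Apply up to batch_size changes from the head of the buffer.

        Returns the URLs touched, in order, so the caller can recompute
        whatever depends on them (links, PR, ...).
        """
        changed = []
        for _ in range(min(batch_size, len(self.buffer))):
            url, doc = self.buffer.popleft()
            if doc is None:
                self.index.pop(url, None)  # page removed from the index
            else:
                self.index[url] = doc      # page added or refreshed
            changed.append(url)
        return changed


idx = RollingIndex({"a.example": "old page"})
idx.crawl_result("a.example", "new page")
idx.crawl_result("b.example", "fresh site")
idx.crawl_result("a.example", None)

first = idx.apply_batch(2)   # applies the two oldest changes
second = idx.apply_batch(5)  # drains whatever is left
```

The crawler can keep calling `crawl_result` while `apply_batch` runs on a schedule, which is the "never stop again" part: there is no point at which the whole index is rebuilt in one go.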

The process by which a new state of the index is copied to tens of thousands of machines in multiple datacentres is a separate question, and rather less interesting ("just operations"). Much of the speculation about different centres has failed to take into account both how flexible DNS resolution can be and how creatively "load balancing" can be used among a great number of servers at each centre, so the reports here are often faulty. The only interest, anyway, was to get an early sighting of "next month's" index. In a world of continuous updates, that interest will have gone away.

(Just a hypothesis, but it does explain why people observe that "this update is different in so many fundamental ways from any we have seen before!" It's not an update: it's the end of discrete updates.)

UK_Web_Guy

9:54 am on May 8, 2003 (gmt 0)

10+ Year Member



Is anyone seeing any PR changes on sj or fi?

Or is this one of the things Google has not yet re-calculated?

minivip

10:37 am on May 8, 2003 (gmt 0)

10+ Year Member



great suggestion j_h_maccann. it seems as if google is on the way to implementing incremental updates. thus it makes sense to use a modified algorithm. it seems as if content and backlinks/PR are handled more separately now. at least this could explain the conspicuous drops and jumps in the sj rankings.

wackmaster

1:21 pm on May 8, 2003 (gmt 0)



< THIS one is being built differently ... Normally - we see it after it's built - we see it getting replicated. This time - we are seeing an index actually being built - ingredient by ingredient. Don't waste the opportunity! ... Ignore the spam - look at the good sites... >

Chris_D - Hats off to you, nicely done.

We're looking through the peephole of the Google construction site!

Kirby

1:48 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Chris_D and J_H have nailed it. Excellent synopsis and conclusions. GG has implied as much since the beginning of the changes noted on -sj.

annej

3:07 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have an interesting situation to watch since I have two sites on the same topic. One is optimized for widgeting history and the other widget history. When I use the single search word 'widgeting' on www the homepage of the 'widgeting' site is 11th while the 'widget' site's homepage is 22nd. But when I look at www-sj the 'widgeting' site drops way down to 30 but the 'widget' site is still at 22. My style isn't that different one site to the other so hopefully I can figure out what made the 'widgeting' site plunge.
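A comparison like this is easy to keep track of with a few lines of Python. This is just a sketch; `rank_shifts` is a made-up helper, and the positions are the ones quoted above for the single-word search.

```python
def rank_shifts(old_ranks, new_ranks):
    """Return {page: new_rank - old_rank} for pages present in both indexes.

    A positive value means the page moved down (a larger position number).
    """
    return {
        page: new_ranks[page] - old_ranks[page]
        for page in old_ranks
        if page in new_ranks
    }


# Positions for the search 'widgeting', as reported in this post.
www = {"widgeting site": 11, "widget site": 22}
www_sj = {"widgeting site": 30, "widget site": 22}

shifts = rank_shifts(www, www_sj)
print(shifts)
```

Recording the same numbers each day would show whether the drop persists as more pieces of the index are added.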

Since the update isn't complete hopefully the 'widgeting' homepage will come back up but who knows. The 'widgeting page' has a PR6 while the 'widget' one has a PR5 but we've seen that doesn't seem to mean much anymore.

Even a site titled "submit a site to widgeting" is now ahead of both sites. (groan)

I did notice that sj is now showing up-to-date titles and cache. That's probably already been mentioned here.

needinfo

4:15 pm on May 8, 2003 (gmt 0)

10+ Year Member



Is it just me or has anybody else noticed a lack of things happening over the past 24 hours or so?

A lack of obvious things happening, I should say.

parabola

4:19 pm on May 8, 2003 (gmt 0)

10+ Year Member



Maybe they are having second thoughts about releasing the sj index before it is updated, because normally (although nothing is normal here) it would be time to get the index out by now.

sullen

4:21 pm on May 8, 2003 (gmt 0)

10+ Year Member



annej - I've noticed a very similar thing happening - the home page on one of my sites is now showing in the results where previously the spot was held by another page (v. glad about this as the page in question is a quasi-doorway page that I want to get rid of anyway)

As to the actual position - it's been leaping up and down.

The point is that I think Google has become more intelligent when it comes to related words.

Perhaps your site is now at 22 because of a surge in new widgeting sites? It's a very popular hobby...

Tropical Island

4:27 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is probably not what everyone wants to hear, but there was a thread a while ago about the advantages of running AdWords in headings where you already rank well in the regular listings. There were pros and cons. However, in the event of catastrophic changes that may (or may not) be in the offing, we feel very relaxed that our main income generating site is covered with an AdWord in every important heading, even though in some we have been #1 or #2. Well worth the cost to not feel threatened.

biggles

6:14 am on May 9, 2003 (gmt 0)

10+ Year Member



Anyone else notice that having KW in title tags seems to now be playing a much lesser role in top ranking Google SERPs?

Bio4ce

6:24 am on May 9, 2003 (gmt 0)

10+ Year Member



I have, biggles. I've also noticed that the meta description seems to have higher weight. But just like anything with google, just when you see a pattern, a site pops up and throws your whole theory off.

There seems to be a bit of randomness to the serps. Maybe I should go study chaos theory or maybe even random-walk theory....Nah.

biggles

6:35 am on May 9, 2003 (gmt 0)

10+ Year Member



Bio4ce I hadn't noticed Meta Descriptions playing a bigger role and I hope that's not the case because they can be so easily abused.

Got to say that from my limited observations if the SERPs we're seeing on www2 etc are the new algo then Google's taken a big backward step in the quality of its search results. (As a Google fan I want to be wrong here)

biggles

6:46 am on May 9, 2003 (gmt 0)

10+ Year Member



Been meaning to ask, what's with this naming of Google updates - Boston; Cassandra; Dominic;...

Is this just a Webmaster World thing or an official Google naming convention (I see GG has referred to Dominic)?

rfgdxm1

6:49 am on May 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Is this just a Webmaster World thing or an official Google naming convention (I see GG has referred to Dominic).

It's a WebmasterWorld thing only. Since the updates of late haven't been a predictable one per month, this can make it easier to refer to a specific update many months later.

This 263 message thread spans 9 pages.