I also notice that Google seems to have 2 banks of datacenters. Only one of the 2 does the partial update; for the next partial update, the other bank of datacenters is used. Looks to me like -ex, -in, and -zu are involved this time, and possibly -va. However, it could be that all of these are being rerouted to one physical datacenter. Looking at traceroutes, this may be the case.
Strangest thing about the SERPS is that pages are #1 on some datacenters and #100-500 on others, so you wind up getting traffic on and off for the day. CW even has some pages that were buried showing up out of nowhere.
1. My index page disappeared for three days,
2. I could find it by searching for other keywords, although it appeared indented under the contact page.
3. On each of those three days it had fresh tags.
4. This morning there is no fresh tag, and it's back where it belongs.
Also, Marval, I think you pointed out that the missing index page can sometimes be found with other keywords. I noticed that too.
My title is five words. It was top 10 for "word1 word2" or "word3 word4". While it was missing, I discovered I could find it with "word1 word3 word5", although again it was indented under a contact page or something.
I also discovered that using the dance tool with 25 items was hiding a lot of the activity. I now use it with 10 and am catching changes.
1. Backlinks from guestbooks are not filtered
2. ODP listings weigh heavily on ranking, even to the point that sites that have changed theme, or retired domains, still rank on their old theme, sometimes at #1.
3. From time to time, no clustering of domains in the results. Meaning the SERP might show domain.com/page1.html at #3 and, further below, domain.com/page2.html at #12.
4. Google superbot doesn't go deep on sites with PR less than 5
Feel free to add your observations.
Maybe you're right, maybe there are exceptions :)
g1smd,
Just to humor you, I've done exactly what you said, and the result is...there are now 5 URLs representing 1 domain spread throughout the results. Tried it in -ex and -fi; same thing.
It used to be that when there were more pages from a given domain for a specific query, at most you would get 2 URLs and a link for 'More results from this site'.
Thought the index had settled... the index page was back and doing well for important keywords for 2-3 days.
Disappeared again 12 hours ago.
When my index page was showing about right for the most important keyword searches over the last couple of days, there were no fresh tags, although other sites unaffected during this unstable time were showing fresh tags.
At the same time, finding the index page through lesser searches, ones that stay stable and don't drop in and out every couple of days (as they do for the most important searches), did show fresh tags. So for the stable searches, fresh tags are shown on the index page; but when the index page comes back in for the important searches, it shows no fresh tags. Same index page, different search keywords. Consistently, if no fresh tags are showing, the index page for those searches will come and go.
This is the only consistent thing I have been able to see.
Thoughts?
Who knows, really? I think the dance seems to be over, but with MAJOR instability remaining for reasons unknown.
Pre update - index page #6
During update - index gone
6-26 - index reappeared at #21 (w/Ftag) but was stable all day
6-27 and 6-28 - index dropped further to #47 (w/Ftag) again stable both days at all DCs
Today - index completely gone again, replaced by my other pages ranking #70 and #71
I made NO changes during this time, so I see NO pattern. Last night I removed the H1 tags to see if it would help. Shortly thereafter, Freshbot crawled deep, so we shall see tomorrow if it does any good.
Since then, all of my sites have been solid; but a few others I have looked at have moved around a little, or been slightly different in the different datacentres.
Some people are reporting roller-coaster rides this last week, but I am just not seeing that here.
It's been almost two months now. Longest cold in history.
That is the best analogy I've heard so far.
They almost had it right, then boom - they revert back to some crazy idea they had that we call Dominic.
I strongly believe they are having SERIOUS issues with the data. There is no other reason for pages popping up at #1 on and off for days, then disappearing entirely and reverting to dominitis.
Wonder if Yahoo! and AOL appreciate showing their users new SERPS every 20 minutes? :)
Tonight I've just noticed significant ranking changes on older sites which have been very stable for more than six months.
I guess this means that new ingredients are indeed being added, and/or rankings are still being recomputed.
Still missing/upside-down topical/index results in places, but several positive moves from sites whose value Google has a way to judge, at the expense of the fresh piffle that has been tossed up in the past month.
Maybe I'm being too optimistic, but this is the first glimmer of positive development, although without the righting of the topical/index pages it is still scary.
Liquid PR? Big changes afoot? - Brett_Tabke, 10:44 am on April 12, 2003 (cst -5)
The Google Update describes the reindexing behavior of Google. Google has two modes of refreshing and adding sites to its massive 3 billion page index.
1) Full Update
Based on a full crawl of the web to acquire all the pages that it can. All pages are refreshed and the search results adjusted.
2) Continuous Daily Indexing
Google now has the ability to update the index based on continuous crawling. The crawling is done via a spider we have nicknamed FreshBot. The name comes from the fact that Google adds Fresh! next to pages that have been updated within the last 72 hours.
Full Update:
Google's full update has occurred approximately once a month for the last three years. We maintain a Google update history page.
When the full update occurs, results have historically floated back and forth between the new and the old results for a few days. This behavior seems to perplex many site owners and operators.
Search engine history buffs recognize the flip-flopping results behavior. It is the process that drove SEOs to the brink of insanity when Inktomi serviced Yahoo back in 1999. With data centers in California and in Virginia, Inktomi results would appear to switch at random as queries were routed between the two centers, which held different indexes.
Google is composed of more than 50,000 PCs sitting in six active data centers around the planet. There are two additional centers (simple offices?) that are unused and whose purpose is unknown. Google is not specific about how many computers it has. They say they do not know themselves; who can count that high?
The heart of the Google system is a heavily modified flavor of the Linux operating system running on standard-architecture PCs. Each PC reportedly has a single 80 gig IDE drive.
There are two key components that drive the business end of the Google operating system. The first is a custom file system that supports large files. When we say large, we mean HUGE, as in they span the entire 80 gig drive. The system then uses a proprietary random file access method across the entire index file. Computer buffs will recognize the technique as the same one employed by Commodore random-access file types, such as those supported by the classic Commodore 1541 floppy drive.
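To make the idea concrete, here is a minimal sketch of fixed-offset random file access in Python. The 64-byte record layout, the file name, and the read_record helper are all hypothetical illustrations; nothing is publicly known about Google's actual on-disk format.

```python
import struct

RECORD_SIZE = 64  # hypothetical fixed record width, in bytes

def read_record(index_file, record_number: int):
    """Jump straight to one record in a large index file.

    Random file access: instead of scanning the file, seek to byte
    offset record_number * RECORD_SIZE and read exactly one record.
    """
    index_file.seek(record_number * RECORD_SIZE)
    raw = index_file.read(RECORD_SIZE)
    # Hypothetical layout: 8-byte doc id, then 56 bytes of packed data.
    doc_id = struct.unpack("<Q", raw[:8])[0]
    return doc_id, raw[8:]

# Demo: write two fixed-size records, then fetch the second directly.
with open("index.dat", "wb") as f:
    for doc_id in (101, 202):
        f.write(struct.pack("<Q", doc_id).ljust(RECORD_SIZE, b"\x00"))

with open("index.dat", "rb") as f:
    print(read_record(f, 1))  # -> (202, 56 bytes of padding)
```

The point of the technique is that lookup cost is constant no matter how big the file grows, which is why a single index file spanning an entire drive stays workable.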
The second component of the Google system is a custom web server. Little is known about its origin. We have heard that it started out as an earlier version of Apache and was modified and optimized for pure speed.
There are currently in excess of three billion pages indexed by Google. If you take a web page, strip it of the HTML code, and then compress it with something such as Zip, the size of the file falls dramatically. The web page you are viewing right now could fit in under 1k of data. This is how Google is able to squeeze a three-billion-page index onto one single drive.
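You can check that arithmetic yourself: strip the markup, then compress what is left. This is only a rough sketch; the crude regex tag-stripper and the sample page are mine, not anything Google uses.

```python
import re
import zlib

def compressed_text_size(html: str) -> tuple[int, int]:
    """Return (raw HTML size, compressed size of the text alone)."""
    text = re.sub(r"<[^>]+>", " ", html)      # crude tag stripper
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return len(html.encode()), len(zlib.compress(text.encode(), 9))

# A made-up page: repetitive markup compresses away almost entirely.
html = "<html><body>" + "<p>Google update chatter.</p>" * 200 + "</body></html>"
raw, packed = compressed_text_size(html)
print(f"raw html: {raw} bytes, stripped+compressed: {packed} bytes")
```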
In addition to the large main search indexes sitting on the PCs, there are other computers that serve cached pages and services such as Image Search, Google News, Froogle, and the Google edition of the Open Directory Project.
When a user visits Google and performs a search, the search will be processed by one PC. The query is most often routed to the data center physically nearest to the user. What results the user gets back for any query will depend on which index that particular PC has stored on its local drive.
When Google updates its index, that index must be distributed to all those computers. The file is probably (Google has never stated) transferred down to a single computer at the data center, which in turn distributes it to a cluster, and finally down into each PC. The amount of data that must be transferred and retransmitted to update each PC runs into terabytes - possibly petabytes.
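That hierarchical fan-out might look something like the sketch below. The node names, data structures, and the send stub are assumptions for illustration only; Google has never described the actual mechanism.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    head: str
    pcs: list[str] = field(default_factory=list)

@dataclass
class DataCenter:
    head: str
    clusters: list[Cluster] = field(default_factory=list)

def send(node: str, blob: bytes) -> None:
    # Stand-in for whatever transfer Google actually uses.
    print(f"push {len(blob)} bytes -> {node}")

def distribute(index_blob: bytes, data_centers: list[DataCenter]) -> None:
    # One WAN transfer per data center, then fan-out inside each center.
    for dc in data_centers:
        send(dc.head, index_blob)
        for cluster in dc.clusters:
            send(cluster.head, index_blob)
            for pc in cluster.pcs:
                send(pc, index_blob)  # every PC ends up with a full copy

dc = DataCenter("dc-ex", [Cluster("rack-1", ["pc-1", "pc-2"])])
distribute(b"new-index", [dc])
```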
Remember that you don't know which data center you are going to connect with, or which index that PC will have on it. It could be the new index, or it could be the old index. This will make results appear to fluctuate to the user during updates.
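The resulting flip-flop is easy to simulate. In this toy model (the query, domains, and rankings are invented), each search lands on an arbitrary data center, some of which still hold the old index:

```python
import random

# Hypothetical ranked results held by different data centers mid-update.
OLD_INDEX = {"widgets": ["example.com/", "rival.com/", "other.com/"]}
NEW_INDEX = {"widgets": ["rival.com/", "other.com/", "example.com/"]}

# During an update, some centers serve the old index and some the new.
DATA_CENTERS = [OLD_INDEX, OLD_INDEX, NEW_INDEX]

def search(query: str) -> list[str]:
    """Each query lands on an arbitrary data center, so the same
    query can return either index's rankings from minute to minute."""
    return random.choice(DATA_CENTERS)[query]

for _ in range(3):
    print(search("widgets"))
```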
Last year, Google introduced a new method of continuously updating its index. We first noticed it when they started to add FRESH! and the date it was last indexed next to each search listing. Somehow, Google had built a system that could almost continuously update itself with fresh listings.
As you are aware, at the heart of Google's technology is a rankings algorithm based upon web citations (backlinks) called PageRank (PR). The calculation of PR takes a great deal of time, and this has led to the speculation that it is the reason Google only refreshes the full index once a month.
Theoretically, Google could calculate the PR value for any page entirely on the fly. The system could download a page, extract the links on that page, update the database entries for all those target pages, and finally recalculate the PR of the current page before moving on to the next page. Thus, PR would move from being a static monthly variable to an ever-changing Liquid variable that is continuously updated. We feel this is exactly what Google has been moving towards with its FreshBot indexing.
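A toy sketch of that crawl-and-recalculate loop follows. It assumes a simplified one-pass PageRank update; real PageRank iterates to convergence over the whole link graph, and Google's internals are unknown.

```python
DAMPING = 0.85

# Hypothetical in-memory "database": score, inbound links, outdegree.
ranks: dict[str, float] = {}
inlinks: dict[str, set[str]] = {}
outdeg: dict[str, int] = {}

def recalc(url: str) -> None:
    # Standard PageRank formula applied to one page at a time;
    # pages not yet crawled default to a rank of 1.0.
    incoming = inlinks.get(url, set())
    ranks[url] = (1 - DAMPING) + DAMPING * sum(
        ranks.get(src, 1.0) / max(outdeg.get(src, 1), 1) for src in incoming
    )

def crawl_page(url: str, outbound_links: list[str]) -> None:
    """One step of 'liquid' PR as speculated above: record this page's
    links, then recompute affected scores immediately instead of
    waiting for a monthly batch job."""
    outdeg[url] = len(outbound_links)
    for target in outbound_links:
        inlinks.setdefault(target, set()).add(url)
        recalc(target)  # targets gain a citation right away
    recalc(url)

crawl_page("a.com/", ["b.com/", "c.com/"])
crawl_page("b.com/", ["c.com/"])
print(ranks)  # c.com/ rises as each new citation is discovered
```

Under this scheme a page's PR would drift continuously as FreshBot works, which would explain rankings that shift between crawls rather than only at monthly boundaries.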
The phenom is simple: fresh pages rank higher than other pages. Although there appears to be a Fresh listing cheat factor that is placed on every listing, we don't think that is entirely what is occurring. Based on subtle clues, including the behavior of the recent index, we feel Google has moved towards Liquid PageRank that is continuously updated. It may mark the complete end of the monthly updates.
Brett also did an interesting post in rfgdxm1's thread too.
[webmasterworld.com...]
Looks like an endorsement from here. :)
Fresh, everflux, and superflux very likely are the path of the future, but it would be very wrong to draw conclusions about that based on the pattern of what has been occurring. What has been occurring was stated to be more in line with the past than with the future.
This is a very important consideration for webmasters sitting around considering how to make their sites more Google-friendly. You have to think about the future, not the past, even the recent past.
I tend to agree. GG likely can't come out and directly spill the beans about what is going on inside Google. However, he certainly isn't casting any doubt on this speculation about a continuous, rolling update; in fact, he seems to be hinting that this guess is right. Plus, I can see no explanation for what is going on at Google other than this. I am seeing all kinds of periodic shifts in SERPs that can't be explained by traditional everflux and freshbot action.