|Google June 2003 : Update Esmeralda Part 3|
Continued from: [webmasterworld.com...]
Has anyone here ever heard of a Kalman filter? It's a mathematical way of building a model of the world. The math is pretty complex, but basically you try to build a model of the thing you're trying to represent. When you get a new data point, you update your model's estimate about the state of things.
Why am I talking about this? Well, Kalman filters have a knob that blends between how much you believe your model vs. how much you believe each new data point. If you tweak the knob all the way in one direction, you always trust the model and any new input just gets ignored. On the other extreme, you can ignore your current estimates about the state of the world, and only trust each new data point as it comes in. If you set the knob too far in that direction, the object you're trying to model jumps all over the place each time you see even a hint of new info.
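For anyone who wants to see the knob in action, here is a minimal sketch of a one-dimensional Kalman filter. It assumes the thing being tracked (say, a ranking position) is roughly constant, and the names `process_var` and `measurement_var` are illustrative choices, not anything from the post above. In this form the "knob" is the gain, which comes out of the ratio between how noisy you think the model is and how noisy you think each new data point is.

```python
def kalman_step(estimate, variance, measurement, process_var, measurement_var):
    """One predict/update step of a scalar Kalman filter (state assumed roughly constant)."""
    # Predict: the state is assumed to stay put, but our uncertainty about it grows.
    variance = variance + process_var
    # The gain is the knob: near 0 trusts the model, near 1 trusts the new data point.
    gain = variance / (variance + measurement_var)
    # Update: move the estimate toward the measurement by a fraction given by the gain.
    estimate = estimate + gain * (measurement - estimate)
    variance = (1.0 - gain) * variance
    return estimate, variance, gain


def track(measurements, process_var, measurement_var, estimate=0.0, variance=1.0):
    """Run the filter over a sequence of measurements and return the estimates."""
    history = []
    for z in measurements:
        estimate, variance, _ = kalman_step(
            estimate, variance, z, process_var, measurement_var
        )
        history.append(estimate)
    return history


# Hypothetical ranking positions observed on successive checks: one wild reading.
positions = [1, 1, 60, 1, 1]

# Knob turned toward the model: the stray reading is almost ignored.
calm = track(positions, process_var=0.0, measurement_var=1e6, estimate=1.0)

# Knob turned toward the data: the estimate jumps with every new reading.
jumpy = track(positions, process_var=1.0, measurement_var=1e-6, estimate=1.0)

print(calm)
print(jumpy)
```

With the first setting the estimate barely moves when the stray reading arrives; with the second it follows every reading almost exactly.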
Lots of people here are getting more stressed than they need to be--their knobs are turned a little too far toward worrying about the very last thing that happened: "Now my subpage is coming up higher than it should! Okay, now my index page is back and the SERPs look good. Gaaack! Now I'm showing well at DC but the subpage still shows up higher at FI! Too much pressure--I'm going to drink now, and start spamming every FFA I see tomorrow!" :)
If you look around, you'll notice not too many senior members posting here. They chime in every so often, but their knobs are twisted further in the other direction. They know that the index switchover takes a little time to settle, and they have the perspective not to get too worried about things right now, and in general.
I haven't posted much of my take lately, but if I could give advice, it would probably be: don't panic. Here's what I would expect. Probably about one data center per day will get switched to the Esmeralda index. You may see some improvements during the course of the switchover as ingredients get blended in as they're ready. I would expect another round of ingredient-adding after the index is switched over.
So: if you're really into Google-watching as a sport, I would check in once a day to see what data centers have been switched, and maybe to run 2-3 searches. Browse a little while, and then come back the next day. Find something fun to do at night besides poring over every last thing that GoogleGuy (or whoever) posts on WebmasterWorld. You'll feel better, I promise.
This is just my take. You're welcome to ignore it. But I mention it because during this index, I heard about a lot of good and bad searches from webmasters, and the more I dig, the more confident I am that things will turn out well.
"In the same way that its people who have based their whole business on their ability to come near the top in Google are suddenly saying things are a "disaster". People who have looked at the web as more than Google have not regarded all the changes as a disaster."
If I have the best marketing strategy in the world with tons of traffic and sales from tons of places, and also get a huge amount of my traffic and sales from Google, and Google drops me, that can be devastating... I'm used to those sales from Google and live according to the income I generate from ALL.
There have been so many posts on Esmeralda that it's hard to keep up with the general sentiment, let alone specific items. But...
Has anyone thanked GoogleGuy for giving up an entire Sunday, unpaid, to nurse this forum through a bout of insecurity?
If not, then I'd like to.
Before Esmeralda, I read all the posts, did all the optimisations and good practice that GoogleGuy hinted at, and I'm nothing less than delighted with what's rippling down now.
On the grounds that every time someone goes up in the SERPS, someone goes down (and *more* than one goes down if someone goes up several places), I think GG has provided a magnificent service these past few days, and his advice and cryptic commentary have been really most useful.
I worry when a lot of the traffic in this forum is devoted to new "theories" about Google and so little is devoted to listening to the hints that are there for the taking.
Well, Googleguy - thanks. For your time and for your hints. They've been invaluable and they've worked.
Pour yourself a beer, or better, pour yourself an English beer, and know that your words don't fall on deaf ears.
The (over)reliance on free Google traffic has caused a lot of sleepless nights for webmasters. Diversification and Adwords are the only good solutions I have seen discussed.
[edited by: div01 at 10:05 pm (utc) on June 18, 2003]
New guy needs help with a question posted earlier. Would appreciate a kind soul's assistance, or maybe a pointer in the right direction for interesting reading on this subject. Thanks.
Can someone please explain to me: once these 9 datacenters are finished updating, how long does it take to get into the main Google index? It's wonderful that this update is happening, but until I get into the main index I am not seeing any real traffic difference. Would someone be so kind as to forward this info to me? Thanks!
Of course in April things got wacky.. But used to be everything worked on ABOUT a 30 day cycle. In that cycle you had a dance and a deep crawl. At random times (almost daily or bi-daily) you would get a freshbot to a few of your pages. Hope this answers your questions.
|Has anyone thanked GoogleGuy for giving up an entire Sunday, unpaid, to nurse this forum through a bout of insecurity? |
I suggested he get a trophy or medal or something, but apparently it's a no-go :-) It might be a conflict-of-interest issue.
This is how I think it works. I guess if nobody corrects me you can assume I'm right.
The data is stored at the data centres and when somebody searches on www the results come from one of the centres. So the more data centres that have the new index the greater the chance of seeing the new index in your results.
I'm not sure whether there is an equal chance of getting data from any given data centre or whether there is some other criteria such as location which comes into the equation, but if all data centres contain the new index then you will definitely see the new results in a search.
Sometimes during an update refreshing the results will show new and old results intermittently.
GG said earlier that he expects the data to be added to a new data centre at the rate of 1 per day. Somebody else said that 3 data centres currently have the new index.
Hope that helps.
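To put rough numbers on that, here is a small sketch. It assumes each search is routed to one of the 9 data centres uniformly at random and independently of the last search, which is only an assumption made for the arithmetic and may not match how routing really works. The function names are made up for illustration.

```python
def chance_of_new_index(switched, total=9):
    """Chance a single search lands on a switched data centre (uniform routing assumed)."""
    return switched / total


def chance_seen_in_refreshes(switched, refreshes, total=9):
    """Chance that at least one of several independent refreshes shows the new index."""
    chance_old_every_time = (1.0 - switched / total) ** refreshes
    return 1.0 - chance_old_every_time


# With 3 of 9 data centres switched, as reported above:
print(f"single search: {chance_of_new_index(3):.3f}")
print(f"at least once in 5 refreshes: {chance_seen_in_refreshes(3, 5):.3f}")
print(f"at least once in 20 refreshes: {chance_seen_in_refreshes(3, 20):.4f}")
```

Under this assumption, a long unbroken run of the same index across many refreshes would be unlikely, so seeing one would hint that routing is stickier than a uniform random pick.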
<<I worry when a lot of the traffic in this forum is devoted to new "theories" about Google and so little is devoted to listening to the hints that are there for the taking. >>
Please provide any hints at all that GoogleGuy has given in this thread that pertain to getting a better ranking.
This is why theories are necessary. To gain insight into how to get our pages to rank better. This, unfortunately, is something that Googleguy cannot provide.
Thanks for that explanation but I'm still a bit confused. I'm on the east coast of US - Vermont and I consistently saw the -fi results on www this morning for about 2 hours. Refreshing didn't seem to bring back the older results. Now I see the older index again on www and I hit refresh about 20 times and it was the same... It seems to me that when it's the new index it stays that way for a certain period of time... or is it always a random hit, i.e. you will get 1 out of the 9 and currently 6 are old and 3 are new? Thanks.
|More Traffic Please|
"I'm not sure whether there is an equal chance of getting data from any given data centre or whether there is some other criteria such as location which comes into the equation, but if all data centres contain the new index then you will definately see the new results in a search."
Geographical proximity does play a role as to what data center you will be routed to as well as the amount of load that center is currently under. You may not always get routed to the same center.
What you said does seem to be the case. As I say I'm not quite sure how the routing works and haven't seen any theories that would explain this. Cookies perhaps?
Ultimately it doesn't really matter. The longer the dance goes on for the more often you see the new index.
I wonder if something is automated because I was up at 5 am and saw the new index consistently for a few hours. Are we pretty sure that this new index will migrate to all 9 and it's not a "test"? I like it and hope it will be on www... Or are there theories that the new update will be a randomization of all 9 centers?
Just as the update was starting www reverted back to mid-April data (title, description and cache) for one site that had been showing mid-May data for many weeks. The site had been #1 for all that time, but dropped to #60 on www at the start of the update. A lot of other sites were fresh tagged at that time, but not the site that dropped.
Looking at -fi it was already back at #1, and that result has now spread to 4 other datacentres. The others are showing really old data from April, not the stuff they were showing up until a few days ago.
Additionally, a site that has not appeared in the index since first publishing on 2003-05-04, showed up as #1 at the same time (under different search terms to the other site above) on -fi, and that result, too, has now spread to 4 other servers.
I still don't understand why the index has reverted to using mid-April data as the update began and still is on the servers not yet updated with the June information.
<edit>Correction of factual mistake</edit>
[edited by: g1smd at 12:54 am (utc) on June 19, 2003]
No the index has not spread to all dc's yet. Also, -fi does not match the other dc's that have the new index...tho I'm not sure if the other -dc's are more reflective of the final result, or -fi is still the lead horse..
|Lots of people here are getting more stressed than they need to be--their knobs are turned a little too far toward worrying about the very last thing that happened: "Now my subpage is coming up higher than it should! |
Admittedly - Dominic update was a real treat, watching the ebb and flow of the update.
Admittedly - I didn't get anything done either.
Truthfully - It has been extremely difficult to "break the habit" of watching the new "Google BLOCKBUSTER RELEASE" but the length of the flick is just way too long to be productive, and if I'm not productive, I will lose far more than just a few SERP positions.
Thanks GG > I pop in here occasionally to capture a glimpse of your recent posts. You always seem to reiterate my suspicions, which means the "pre-announcements" to edgy clients aren't that far off the mark.
Music to my ears, DerekH--many thanks. :) Even more music to my ears is seeing folks chip in answers to JoeHouse. That's the essence of WebmasterWorld in my opinion--people trying to help each other out--and I'm glad to see that essence emerging from under the conspiracy theory du jour posts. It's a major part of what makes WebmasterWorld a really nice place to hang out and chat with people. :)
Some observations :-)
The haves and have-nots (as regards rankings in the new update) seem to be longer-serving forum members and newer members respectively. Maybe that is webmastering experience, maybe it is to do with the age of the sites (I really believe new sites have been caned somewhat recently).
Second. Watching the results come and go, sites come in high and then get toasted, leads me to believe you can almost watch the aspects or assets of the new algos kicking in as they happen. Despite the fraught nature of watching the dance / flux, if you seize the opportunity, you can say (in some cases) "I am sure this site got ranked high because...., and then got toasted because....".
I am in the fortunate position of having done all the major (real user) improvements / enhancements to my sites I am going to be doing for a long while. I have made them real content sites and am extremely proud of them. But now, I have some spare time and will try my hand at applying what I believe I have learned to some "No worse, but then maybe no better than other sites", and see if I am right.
Anyway, I just wanted to say to new site owners: do not worry about the results right now, or about any desperation you feel when the more established site owners say "What problem? I see no problem". And seize the chance to maybe get much more of an insider's view on what Google's algos like, and what they do not.
Worried about the Google directory?
markus007 wrote "Anyone else noticing the Google directory acting up? One minute I have the category icon active on the toolbar and the next it's gone. I am no longer showing in the Google directory though."
I know from a reliable source that ODP (Google directory obtains data from ODP) started a major upgrade on April 24th. Part of the upgrade has to do with replacement of old hardware with new Linux and Solaris boxes.
Also, I know that on 6/20/2003 the main ODP machine will be upgraded.
You shouldn't be concerned because ALL hardware has to be maintained once in a while. When was the last time that you defragmented your PC? If you haven't, it will be a good idea to do it now. Your PC will be very happy with you!
I guess that as soon as the major ODP upgrade is completed by next week, things will be more stable in the Google directory.
[edited by: zafile at 2:26 am (utc) on June 19, 2003]
GrinninGordon, I agree with your analysis: after having lurked these boards since Dominic started, and joined up after Esmeralda, the more senior members are not complaining as much, probably because they have sites that were all well planted in Google, whereas the newer sites, represented by the newer members, are having trouble, especially if old indices are being used.
I run some sites that have been in Google for years and others that are new as of a month or two ago. I have seen the newer sites take a beating lately, while the veterans seem to be weathering this Perfect Storm. Maybe it's kind of like hazing, the new guys have to prove themselves before they earn Google's respect. Hopefully DeepFresh will fix things.
DerekH is right, along with all the other folk who mention it: GoogleGuy deserves our thanks. He has done an impeccable job answering the questions as well as he can from the inside, while still keeping the party line in pretty sharp focus (believe me--I used to work at Infoseek in the heyday--ahh...the good old days, but I digress). And thanks to Brett for this community--I know you guys probably don't hear it enough. Guess I should donate some $$ and up my status around here.
(Although I wonder if Google threw off their results just to get more subscriptions to WebmasterWorld....) ;)
Some of us here might not be christians with bibles close to hand...
The Directory works on a entirely different time-frame than the update does. The PR values in the directory right now mean nothing. Don't even bother checking the directory for a while until everything is settled.
I would like to thank everyone (especially GoogleGuy) for helping the new guy on the block. Very much appreciated :)
To think just a few months ago I knew very little about google and how intense this can all be. Now I feel like an old pro.
Getting back to the datacenters. As I understand it, when Google is dancing the 9 datacenters are gathering new information to be indexed. Once Google's datacenters are completely updated and settled, approx 4 to 5 days later they will permanently be placed on Google's main index www.google.com until the next update comes within a 28 day cycle.
Now what I don't get is, if this theory is correct then that would mean that the only purpose for the datacenters is not for geographical searches but rather to help gather information quickly to be placed on the Google main index.
That would explain why my traffic is still very light because my website is new and the main index does not have all the current results.
Am I close with this theory?
zafile, I'm not really worried. The thing is, my site dropped from the Google directory, but it's still in Dmoz.org, and the same thing is happening to WebmasterWorld. The only reason I commented on it was because it was odd that existing sites were dropping; normally it's new sites getting added. Whatever is going on, you can be sure it's part of the big update.
Joe, when someone goes to Google and searches, one of the 9 datacenters is randomly picked to handle the query. The new index is now being spread over all 9 datacenters. Eventually all datacenters will have the new index. Then when you do a search on Google you will only see results from the new index.
|Now what I don't get is, if this theory is correct then that would mean that the only purpose for the datacenters is not for geographical searches but rather to help gather information quickly to be placed on the Google main index. |
|Am I close with this theory? |
I believe not.
Google uses their datacenters the way many other large high-traffic sites use multiple datacenters. Each request is dynamically sent to the closest datacenter which is best equipped to handle the request. Most of the time you may get the center closest to you based on geotargeting in DNS. If the closest datacenter is too busy, then you may be redirected to another center with less load on it.
Many other sites, such as cnn.com, do this as well. It is a common way to increase performance of a site.
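As a rough illustration of "closest, unless it's too busy", here is a minimal sketch. Everything in it is hypothetical: the `DataCenter` fields, the `max_load` threshold and the fallback rule are assumptions for the example, not a description of how Google or any real site actually routes requests.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str           # hypothetical label
    distance_km: float  # hypothetical distance from the user
    load: float         # hypothetical load, 0.0 (idle) to 1.0 (saturated)


def pick_data_center(centers, max_load=0.8):
    """Pick the nearest center under the load threshold, else the least loaded one."""
    available = [c for c in centers if c.load < max_load]
    if available:
        # Normal case: the nearest center that still has headroom.
        return min(available, key=lambda c: c.distance_km)
    # Every center is over the threshold: fall back to the least busy one.
    return min(centers, key=lambda c: c.load)


centers = [
    DataCenter("near", distance_km=100.0, load=0.95),
    DataCenter("far", distance_km=3000.0, load=0.20),
]
print(pick_data_center(centers).name)
```

In this example the nearby center is over the threshold, so the request goes to the farther one, which is one way the same user could see different data centers at different times.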
I believe Google's crawls (the data gathering part) all come out of one or two IP subnets, meaning they are (probably) in the same datacenter. The deep crawl used to be from one, and the fresh crawl used to be from another.
Updates like this always start on one datacenter. Google then copies the new index to the other datacenters, one at a time, over the next few days until they all have the new index. For the most part, the index all originates in one location.
Exception: At the moment they have two datacenters which seem not to be used in production, and have different results from the rest.
Thanks for your input. If that's the case then why do I keep hearing and reading that Google deepbot comes around a few days after the dance has settled, with an IP address starting with 216, picks up your new info, and that's when your new information gets indexed? Why a few days after the dance, and why another crawl to index it to the main index?
Which theory is correct?
Could you address the issue of old sites vs. new sites (and by new sites, I don't mean brand-spanking new) but sites that got put online between Cassandra and Dominic, and therefore have never really been added to the permanent index. Is the theory correct, that because Dominic reverted to old data, the one and only deepbot that newer sites got was trashed, and therefore we are caught up in a kind of never-ending freshbot tornado? Touching down, lifting up, touching down, etc... If true, then I imagine we now have to wait another month before the new freshdeepbot adds us in the next dance? I am not complaining - just verifying so that I can quit waiting for this dance, and just get used to the idea that my site has missed out on two dances, and now has to wait for a third. I'll live with that if true; just wanting to know so I can put this baby to rest. Thanks GG!
|Getting back to the datacenters. As I understand it, when Google is dancing the 9 datacenters are gathering new information to be indexed. Once Google's datacenters are completely updated and settled, approx 4 to 5 days later they will permanently be placed on google's main index www.google.com until the next update comes within a 28 day cycle. |
Not quite what I understand, which is as follows. Data is collected before the dance starts by a process of crawling by bots. You can pretty much regard the crawl as being an ongoing process now.
The dance is when the new data is transferred to first one data center, and then migrates to another, and another, until all the data centers show the new index. Effectively, there is no "main index" separate from these data centers -- the results you get on a given search come from one of the data centers, so during a dance you get different results depending on which data center your query is routed to. So the results seem to dance up and down, and in and out...
Even once the 9 data centers are updated with the new index there can be changes for various reasons -- like ongoing PR calculations, spam filters being applied or tweaked, etc.
There are some simplifications here, but I think it's more or less correct.
Also, on 6/20 DMOZ will test the conversion of ALL of its data to UTF-8. This will help the data communication between DMOZ and Google.
What I told you is (for the most part) correct. There are no "theories" to figure out in your questions, as it is a fairly well figured out process.
Why it (historically) waits a few days before it starts the crawl again? I don't know, don't really care, and frankly it doesn't really matter. It does behave this way, as that is well documented in previous discussions and updates.
This is not a part of the update that really needs to be figured out from scratch. Updates that behave like this one (for the most part) have happened before. Look into previous Google Update threads for more information.
Now if you can tell us how Google ranks tags in pages, and factors links into results and pageranks? Especially how things have changed. Everyone has theories on that. That, "grasshopper", is the puzzle...not the update process itself.
<<If that's the case then why do I keep hearing and reading that Google deepbot comes around a few days after the dance has settled with IP address starting with 216 picks up your new info and thats when your new information gets indexed. Why a few days after the dance, and why another crawl to index it to the main index? >>
Traditionally, the deepbot crawls pages soon after the update. The IP range was 216...The data from this crawl was used as the basis for the new index/next update.
The process has changed as the update you are now seeing is generated from data gathered by what was the freshbot, but is now the freshdeep (as described by GoogleGuy)- the ip range is 64....
No one here knows, except GoogleGuy, what to expect now. We have heard hints that they are moving towards a more continual update process. They have been moving in this direction for a while with the addition of freshbot.
The best advice I can give is to ALWAYS add new content and pages/sites and acquire links. This way, it does not really matter to you how or when your pages are crawled.
I recently rose back to #1 for some KW phrases after disappearing for one month (due I think to www.mydomain.com and mydomain.com issue, with mydomain.com showing up in SERPS).
Odd observation: For KW phrases where I was also rock steady at #1 before Dominic but have not recouped, I see the same mydomain.com showing up instead. So, here's what perplexes me: Why am I seen as www.mydomain.com for some KW phrase searches, but seen as mydomain.com for other KW phrases. Seems if I'm indexed correctly, that should be true for any and all KW phrases. In other words, www.mydomain.com should come up for all searches where my homepage comes up at all instead of mydomain.com coming up.
It's almost like they tweak per category instead of the entire index. That may not make sense, but it looks like that to me. But I don't have a handle on anything...I'm just trying to make sense out of something I am sure makes sense, but doesn't look like it does.
Any thoughts that may help us understand Google's behavior in this respect?
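One thing that can be checked from the outside: www.mydomain.com and mydomain.com are different hostnames, so anything keyed on the exact URL can end up tracking them as two separate entries. Here is a small sketch for spotting that in your own list of ranking URLs. It makes no claim about what Google does internally; the helper names and the "strip a leading www." rule are assumptions for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def canonical_host(url):
    """Lower-cased hostname with a leading 'www.' removed (illustrative rule only)."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def group_by_site(urls):
    """Map each canonical host to the set of hostname variants seen for it."""
    groups = defaultdict(set)
    for url in urls:
        groups[canonical_host(url)].add(urlsplit(url).hostname)
    return dict(groups)


# Hypothetical URLs copied from results pages for different keyword phrases.
seen = [
    "http://www.mydomain.com/",
    "http://mydomain.com/",
    "http://www.mydomain.com/widgets.html",
]
for site, variants in group_by_site(seen).items():
    if len(variants) > 1:
        print(site, "appears under more than one hostname:", sorted(variants))
```

If both variants show up, that at least confirms the two hostnames are being listed separately for different searches.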