Why am I talking about this? Well, Kalman filters have a knob that blends between how much you believe your model vs. how much you believe each new data point. If you tweak the knob all the way in one direction, you always trust the model and any new input just gets ignored. On the other extreme, you can ignore your current estimates about the state of the world, and only trust each new data point as it comes in. If you set the knob too far in that direction, the object you're trying to model jumps all over the place each time you see even a hint of new info.
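(If it helps to see the knob in code: here's a minimal sketch of that blend. The function name, the fixed gain values, and the sample numbers are all invented for illustration - a real Kalman filter computes the gain from the model and measurement uncertainties rather than pinning it down like this.)

```python
def blend(estimate: float, measurement: float, gain: float) -> float:
    """Update an estimate with a new measurement using a fixed gain."""
    return estimate + gain * (measurement - estimate)

estimate = 10.0
for measurement in [12.0, 9.5, 11.0, 30.0]:  # the last point is an outlier
    calm = blend(estimate, measurement, gain=0.1)    # barely reacts to new data
    jumpy = blend(estimate, measurement, gain=0.95)  # chases every new data point
    print(f"measurement={measurement:5.1f}  calm={calm:5.2f}  jumpy={jumpy:5.2f}")
    estimate = calm
```

With the knob near 0.1 the estimate barely moves when the outlier shows up; near 0.95 it jumps all over the place - which is the point of the analogy.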
Lots of people here are getting more stressed than they need to be--their knobs are turned a little too far toward worrying about the very last thing that happened: "Now my subpage is coming up higher than it should! Okay, now my index page is back and the SERPs look good. Gaaack! Now I'm showing well at DC but the subpage still shows up higher at FI! Too much pressure--I'm going to drink now, and start spamming every FFA I see tomorrow!" :)
If you look around, you'll notice not too many senior members posting here. They chime in every so often, but their knobs are twisted further in the other direction. They know that the index switchover takes a little time to settle, and they have the perspective not to get too worried about things right now, and in general.
I haven't posted much of my take lately, but if I could give advice, it would probably be: don't panic. Here's what I would expect. Probably about one data center per day will get switched to the Esmeralda index. You may see some improvements during the course of the switchover as ingredients get blended in as they're ready. I would expect another round of ingredient-adding after the index is switched over.
So: if you're really into Google-watching as a sport, I would check in once a day to see what data centers have been switched, and maybe to run 2-3 searches. Browse a little while, and then come back the next day. Find something fun to do at night besides poring over every last thing that GoogleGuy (or whoever) posts on WebmasterWorld. You'll feel better, I promise.
This is just my take. You're welcome to ignore it. But I mention it because during this index, I heard about a lot of good and bad searches from webmasters, and the more I dig, the more confident I am that things will turn out well.
If I have the best marketing strategy in the world, with tons of traffic and sales from tons of places, but also get a huge amount of my traffic and sales from google, and google drops me, that can be devastating... I'm used to those sales from google and live according to the income I generate from ALL of it.
Before Esmeralda, I read all the posts, did all the optimisations and good practice that GoogleGuy hinted at, and I'm nothing less than delighted with what's rippling down now.
On the grounds that every time someone goes up in the SERPs, someone goes down (and *more* than one goes down if someone goes up several places), I think GG has provided a magnificent service these past few days, and his advice and cryptic commentary have been really most useful.
I worry when a lot of the traffic in this forum is devoted to new "theories" about Google and so little is devoted to listening to the hints that are there for the taking.
Well, Googleguy - thanks. For your time and for your hints. They've been invaluable and they've worked.
Pour yourself a beer, or better, pour yourself an English beer, and know that your words don't fall on deaf ears.
DerekH
New Guy needs help with a question posted earlier. Would appreciate a kind soul's assistance, or maybe a pointer in the right direction for interesting reading on this subject. Thanks.
Can someone please explain to me: once these 9 datacenters are finished updating, how long does it take to get into the main Google index? It's wonderful that this update is happening, but until I get into the main index I am not seeing any real traffic difference. Would someone be so kind as to forward this info to me? Thanks!
This is how I think it works. I guess if nobody corrects me you can assume I'm right.
The data is stored at the data centres and when somebody searches on www the results come from one of the centres. So the more data centres that have the new index the greater the chance of seeing the new index in your results.
I'm not sure whether there is an equal chance of getting data from any given data centre, or whether some other criterion such as location comes into the equation, but if all data centres contain the new index then you will definitely see the new results in a search.
Sometimes during an update, refreshing the results will show new and old results intermittently.
GG said earlier that he expects the data to be added to a new data centre at the rate of 1 per day. Somebody else said that 3 data centres currently have the new index.
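To put rough numbers on the "more centres switched = better odds" idea, here is a toy sketch that assumes each query lands on one of the 9 data centres uniformly at random (which, as noted above, may not be exactly how the routing works):

```python
import random

TOTAL_CENTERS = 9

def chance_of_new_index(switched: int, trials: int = 100_000) -> float:
    """Estimate the odds that a random query hits a centre with the new index."""
    hits = sum(random.randrange(TOTAL_CENTERS) < switched for _ in range(trials))
    return hits / trials

for switched in (1, 3, 6, 9):
    print(f"{switched}/9 centres switched -> roughly "
          f"{chance_of_new_index(switched):.0%} of searches see the new index")
```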
Hope that helps.
Please provide any hints at all that GoogleGuy has given in this thread that pertain to getting a better ranking.
This is why theories are necessary. To gain insight into how to get our pages to rank better. This, unfortunately, is something that Googleguy cannot provide.
Thanks for that explanation, but I'm still a bit confused. I'm on the east coast of the US - Vermont - and I consistently saw the -fi results on www this morning for about 2 hours. Refreshing didn't seem to bring back the older results. Now I see the older index again on www, and I hit refresh about 20 times and it was the same... It seems to me that when it's the new index, it stays that way for a certain period of time... or is it always a random hit, i.e. you get 1 out of the 9, and currently 6 are old and 3 are new? Thanks.
Geographical proximity does play a role in which data center you will be routed to, as does the amount of load that center is currently under. You may not always get routed to the same center.
I wonder if something is automated, because I was up at 5 am and saw the new index consistently for a few hours. Are we pretty sure that this new index will migrate to all 9 and it's not a "test"? I like it and hope it will be on www... Or are there theories that the new update will be a randomization of all 9 centers?
Looking at -fi it was already back at #1, and that result has now spread to 4 other datacentres. The others are showing really old data from April, not the stuff they were showing up until a few days ago.
Additionally, a site that has not appeared in the index since first publishing on 2003-05-04, showed up as #1 at the same time (under different search terms to the other site above) on -fi, and that result, too, has now spread to 4 other servers.
I still don't understand why the index has reverted to using mid-April data as the update began and still is on the servers not yet updated with the June information.
Lots of people here are getting more stressed than they need to be--their knobs are turned a little too far toward worrying about the very last thing that happened: "Now my subpage is coming up higher than it should!"
Admittedly - the Dominic update was a real treat, watching the ebb and flow of the update.
Admittedly - I didn't get anything done either.
Truthfully - it has been extremely difficult to "break the habit" of watching the new "Google BLOCKBUSTER RELEASE", but the length of the flick is just way too long to be productive, and if I'm not productive, I will lose far more than just a few SERP positions.
Thanks GG > I pop in here occasionally to capture a glimpse of your recent posts. You always seem to reiterate my suspicions, which means the "pre-announcements" to edgy clients aren't that far off the mark.
The haves and have-nots (as regards rankings in the new updates) seem to be longer-serving forum members and newer members respectively. Maybe that is webmastering experience, maybe it is to do with the age of the sites (I really believe new sites have been caned somewhat recently).
Second: watching the results come and go - sites coming in high and then getting toasted - leads me to believe you can almost watch the aspects or assets of the new algos kicking in as they happen. Despite the fraught nature of watching the dance / flux, if you seize the opportunity, you can say (in some cases) "I am sure this site got ranked high because..., and then got toasted because...".
I am in the fortunate position of having done all the major (real user) improvements / enhancements to my sites that I am going to be doing for a long while. I have made them real content sites and am extremely proud of them. But now I have some spare time, so I will try my hand at applying what I believe I have learned to some "no worse, but then maybe no better than other" sites, and see if I am right.
Anyway, just wanted to say to new site owners: do not worry about the results right now, or about any desperation you feel when the more established site owners say "What problem? I see no problem". And seize the chance to maybe get much more of an insider's view on what Google's algos like, and what they do not.
markus007 wrote "Anyone else noticing the Google directory acting up? One minute i have the category icon active on the toolbar and the next its gone. I am no longer showing in the google directory though."
I know from a reliable source that ODP (the Google directory obtains its data from ODP) started a major upgrade on April 24th. Part of the upgrade has to do with replacing old hardware with new Linux and Solaris boxes.
Also, I know that on 6/20/2003 the main ODP machine will be upgraded.
You shouldn't be concerned, because ALL hardware has to be maintained once in a while. When was the last time you defragmented your PC's hard drive? If you haven't, it would be a good idea to do it now. Your PC will be very happy with you!
I guess that as soon as the major ODP upgrade is completed by next week, things will be more stable in the Google directory.
Cheers!
I run some sites that have been in Google for years and others that are new as of a month or two ago. I have seen the newer sites take a beating lately, while the veterans seem to be weathering this Perfect Storm. Maybe it's kind of like hazing, the new guys have to prove themselves before they earn Google's respect. Hopefully DeepFresh will fix things.
DerekH is right, along with all the other folk who mention it: GoogleGuy deserves our thanks. He has done an impeccable job answering the questions as well as he can from the inside, while still keeping the party line in pretty sharp focus (believe me--I used to work at Infoseek in the heyday--ahh...the good old days, but I digress). And thanks to Brett for this community--I know you guys probably don't hear it enough. Guess I should donate some $$ and up my status around here.
(Although I wonder if Google threw off their results just to get more subscriptions to WebmasterWorld....) ;)
To think just a few months ago I knew very little about google and how intense this can all be. Now I feel like an old pro.
Getting back to the datacenters. As I understand it, when Google is dancing the 9 datacenters are gathering new information to be indexed. Once Google's datacenters are completely updated and settled, approx 4 to 5 days later the results will permanently be placed on Google's main index, www.google.com, until the next update comes within a 28-day cycle.
Now what I don't get is: if this theory is correct, then that would mean the only purpose of the datacenters is not geographical searches, but rather to help gather information quickly to be placed on the Google main index.
That would explain why my traffic is still very light because my website is new and the main index does not have all the current results.
Am I close with this theory?
Joe, when someone goes to Google and searches, one of the 9 datacenters is randomly picked to handle the query. The new index is now being spread over all 9 datacenters. Eventually all the datacenters will have the new index. Then, when you do a search on Google, you will only see results from the new index.
Now what I don't get is: if this theory is correct, then that would mean the only purpose of the datacenters is not geographical searches, but rather to help gather information quickly to be placed on the Google main index.
Am I close with this theory?
Google uses their datacenters like many other large, high-traffic sites use multiple datacenters. Each request is dynamically sent to the closest datacenter that is best equipped to handle it. Most of the time you will get the center closest to you, based on geotargeting in DNS. If the closest datacenter is too busy, then you may be redirected to another center with less load on it.
Many other sites, such as cnn.com, do this as well. It is a common way to increase performance of a site.
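Here's a rough sketch of that idea - the centre names, distances, load figures, and threshold are all invented for illustration, and this is emphatically not Google's actual routing logic:

```python
from dataclasses import dataclass

@dataclass
class DataCenter:
    name: str
    distance_ms: int  # rough network distance from the user
    load: float       # 0.0 = idle, 1.0 = saturated

def route(centers: list[DataCenter], max_load: float = 0.8) -> DataCenter:
    """Pick the nearest centre that isn't overloaded; otherwise the least loaded."""
    for center in sorted(centers, key=lambda c: c.distance_ms):
        if center.load < max_load:
            return center
    return min(centers, key=lambda c: c.load)

centers = [
    DataCenter("-fi", 20, 0.92),  # closest, but currently swamped
    DataCenter("-sj", 45, 0.40),
    DataCenter("-dc", 90, 0.10),
]
print(route(centers).name)  # -> "-sj": the nearest centre under the load threshold
```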
I believe Google's crawls (the data gathering part) all come out of one or two IP subnets, meaning they are (probably) in the same datacenter. The deep crawl used to be from one, and the fresh crawl used to be from another.
Updates like this always start on one datacenter. Google then copies the new index to the other datacenters, one at a time, over the next few days until they all have the new index. For the most part, the index all originates in one location.
Exception: At the moment they have two datacenters which seem not to be used in production, and have different results from the rest.
-Pete
Thanks for your input. If that's the case, then why do I keep hearing and reading that Google's deepbot comes around a few days after the dance has settled - with IP addresses starting with 216 - to pick up your new info, and that's when your new information gets indexed? Why a few days after the dance, and why another crawl to get it into the main index?
Which theory is correct?
Could you address the issue of old sites vs. new sites? By new sites, I don't mean brand-spanking new, but sites that got put online between Cassandra and Dominic, and therefore have never really been added to the permanent index. Is the theory correct that, because Dominic reverted to old data, the one and only deepbot crawl that newer sites got was trashed, and therefore we are caught up in a kind of never-ending freshbot tornado? Touching down, lifting up, touching down, etc... If true, then I imagine we now have to wait another month before the new freshdeepbot adds us in the next dance? I am not complaining - just verifying, so that I can quit waiting for this dance and just get used to the idea that my site has missed out on two dances and now has to wait for a third. I'll live with that if true; just wanting to know so I can put this baby to rest. Thanks GG!
Getting back to the datacenters. As I understand it, when Google is dancing the 9 datacenters are gathering new information to be indexed. Once Google's datacenters are completely updated and settled, approx 4 to 5 days later the results will permanently be placed on Google's main index, www.google.com, until the next update comes within a 28-day cycle.
Hi JoeHouse.
Not quite how I understand it; my understanding is as follows. Data is collected before the dance starts by a process of crawling by bots. You can pretty much regard the crawl as an ongoing process now.
The dance is when the new data is transferred to first one data center, and then migrates to another, and another, until all the data centers show the new index. Effectively, there is no "main index" separate from these data centers -- the results you get on a given search come from one of the data centers, so during a dance you get different results depending on which data center your query is routed to. So the results seem to dance up and down, and in and out...
Even once the 9 data centers are updated with the new index there can be changes for various reasons -- like ongoing PR calculations, spam filters being applied or tweaked, etc.
There are some simplifications here, but I think it's more or less correct.
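If it helps to visualise the migration, here's a toy model using the roughly one-centre-per-day pace quoted earlier in the thread (the centre names and the even-routing assumption are purely illustrative):

```python
centers = {f"dc-{i}": "old" for i in range(1, 10)}  # nine centres, all on the old index

for day, name in enumerate(centers, start=1):
    centers[name] = "new"  # one more centre picks up the new index
    updated = sum(version == "new" for version in centers.values())
    print(f"day {day}: {updated}/9 centres on the new index "
          f"(~{updated / 9:.0%} of queries see it, assuming even routing)")
```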
Which theory is correct?
Why does it (historically) wait a few days before it starts the crawl again? I don't know, don't really care, and frankly it doesn't really matter. It does behave this way, as is well documented in previous discussions and updates.
This is not a part of the update that really needs to be figured out from scratch. Updates that behave (for the most part) like this one have happened before. Look into previous Google Update threads for more information.
Now, if you can tell us how Google ranks tags in pages, and how it factors links into results and PageRank - especially how things have changed - that would be something. Everyone has theories on that. That, "grasshopper", is the puzzle... not the update process itself.
-Pete
Traditionally, the deepbot crawls pages soon after the update. The IP range was 216... The data from this crawl was used as the basis for the new index/next update.
The process has changed, as the update you are now seeing is generated from data gathered by what was the freshbot but is now the freshdeep (as described by GoogleGuy) - the IP range is 64...
No one here knows, except GoogleGuy, what to expect now. We have heard hints that they are moving towards a more continual update process. They have been moving in this direction for a while with the addition of freshbot.
The best advice I can give is to ALWAYS add new content and pages/sites and acquire links. This way, it does not really matter to you how or when your pages are crawled.