This 75 message thread spans 3 pages.
Is The "Sandbox" Ending?
Is this Doomsday, or is Something Wonderful about to happen?
The sandbox is coming to an end soon. It will be rolled out gradually, perhaps as fast as Google used to be able to incorporate new documents before they started having problems late last winter. Even though they could probably incorporate, all at once, every document above the 2^32 (2 to the 32nd power, or about 4.29 billion) document ceiling established last winter, and begin their cycles of algorithmic calculations, I think they are going to introduce documents into the expanded matrix at a gradual rate. That is, if they introduced all the documents into the new, expanded matrix at once, they could run the full set of algorithmic calculations, and we would see upheaval in the SERPs on a much larger scale than ever seen before, because their new matrix is simply capable of that, and to demonstrate such power would be an admission that there was a capacity problem. Rather, I think they will roll all the sandboxed documents into the expanded index over the next 6 to 8 weeks, and that the process has already started. It will look pretty much like the old rolling updates Google used to run, except it will be strong and nearly continuous, punctuated by periods of stability through the majority of the matrix at any one time.
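The 2^32 ceiling mentioned above is just integer arithmetic; as a quick illustration (the premise that Google used unsigned 32-bit document IDs is this thread's speculation, not a confirmed detail of their architecture):

```python
# Speculative illustration: if document IDs were stored as unsigned 32-bit
# integers, the index could address at most 2**32 distinct documents.
max_docs_32bit = 2 ** 32
print(max_docs_32bit)  # 4294967296, i.e. about 4.29 billion

# A move to 64-bit IDs would raise that ceiling by a factor of 2**32 again,
# which is why an "expanded matrix" could absorb far more than 8 billion pages.
max_docs_64bit = 2 ** 64
print(max_docs_64bit // max_docs_32bit)  # 4294967296
```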
To admit there was a capacity problem after all this time might be taken by some to be an admission of culpable negligence in their failure to advise potential investors regarding serious technical issues during their IPO period. I don't think there will be any culpable negligence issues because Google will not fail. However, if it did fail, I think the fact that they have withheld such information would make them subject to suit, perhaps even criminally if some of the ones who profited on the IPO were the ones who concealed the capacity problem. They would only be guilty of negligent deception IF THEY FAILED.
It's kind of like waking up in the middle of the night: there is someone in your bedroom, they make a frightening sound, it's dark, you see something flash towards you, and you are so scared you shoot into the darkness. Later you find it's a serial killer wanted in a nationwide manhunt, and you are a freaking hero on talk shows everywhere. Or, for a change of scene, it's the neighbor's senile grandfather, and you are doing 10 to 20 in max lockdown with Bubba Joe, who likes to scratch his ass and sniff his fingers when he's not telling you how pretty your eyes are.
I don't have anything that would serve as proof of what I'm saying, but it pretty much stands to reason that if Google has been perfectly mum about the sandbox to this point, that they are not going to so quickly incorporate new and faster expanded technology at such a rate that it requires public statement.
If the sandbox phenomenon is over and/or in the process of ending, what would it likely look like? Would it be rolled out all at once? by topological area? by chronological time in the sandbox? alphabetically? by pages or by domains?
What will the results look like to us as they change? There must be tens of thousands of sites released since last winter that are sandboxed. As they take their place in the higher SERPs, will it be a mad assault, or more like a gradual infiltration? Should we expect to see gradual changes in every area over time, steady like an hourglass? Will we see a week of dramatic change, followed every couple of weeks by more dramatic change for a couple of months? Or will we just wake up one morning to find that hurricane Google has rewritten the face of the internet, with major devastation in its wake and young hopeful sites seeing sunlight for the first time?
With MSN's new engine expected to go online perhaps as early as February, and google's known fondness for upstaging MS, how much later can they wait before they release the sandbox? The SERPs are apparently beginning to change. I've already heard of several people who've claimed their many-month-long-sandboxed site is out of the sandbox. Could it be that this is really the beginning of the end?
A little late nite tea leaf reading never hurt anyone.
Here is my interpretation: it is Google's engine to do with as they wish. In 2003, affiliates had Google by the tail and directed it any way they wanted, at least until Florida.
To suggest that Google facing a tech hurdle and not disclosing it to investors amounts to negligence is like saying NASA should have warned the public that space travel could be hazardous. Of course Google is facing tech challenges every day; all tech-based websites are.
I'll ignore the whole issue of whether or not Google would be guilty of culpable negligence or not, but the idea of "letting in" a huge amount of sites into the index all at one time is interesting. Just for argument's sake, IF renee's theory of extra indexes is true, and IF Google finally had the ability to fold these extra indexes into the main index, WOULD they do so all at one time, or would they do it over an extended period? Blasphemer raises a good thought in that Google does like to upstage its competitors just before they make a big move, so doing *something* right before MSN released its new search engine would be typical of Google. And if that *something* was the merging of different indexes, then they would either have to do it semi-gradually over the next month or so, or all at once, in order to beat the MSN announcement. Even the semi-gradual approach would seem to wreak a fair amount of havoc over the next few weeks, and certainly the all-at-once approach would probably cause many webmasters to look for the highest window to jump from. Either approach would make the first quarter of 2005 one of the most interesting SEO-wise in a long time (and we've had lots of interesting moments in the last year). Interesting subject, blasphemer - even if it is just late-night tea leaf reading. :)
Best educated guess I've seen here in a long time, blasphemer. That's exactly what I would expect Google to do, and I've been wondering when this process would start. Obviously they can't keep up their current mess much longer; they need their new systems in place well before MSN beta goes gold. Failure to do so would result in a potentially irreversible slide in public perception and habits.
My guess is that your guess is exactly 100% correct. The only thing that might make it take longer than you project is difficulties implementing this on both the algo and the data center side.
I've been expecting something very much along the lines of what you are suggesting for quite a while, it's obviously been extremely difficult to implement the new matrix or it would have been in place months ago.
Google's total and utter silence on this issue, aside from the odd spin they throw out to keep a semblance of denial about the problem and the upcoming solution, is, and has been, extremely eloquent. A silence that would be motivated by exactly the issues you lay out. Nice post, somebody's got their thinking cap on today...
Wishful thinking... I'll have a pint of whatever blasphemer is on! Unless something happens in the next three months, we are one year into the sandbox. I guess we all want to believe that they cannot possibly keep it up that long. Ah well, there's always the hope that MSN will shake it up. By the way, I have not seen one single report of sandboxed sites appearing in the SERPs. Did I miss something while cooking the turkey?
<< By the way, I have not seen one single report of sandboxed sites appearing in the SERPs. Did I miss something while cooking the turkey? >>
that means you haven't been paying much attention.
I can say this much: about a month or so ago, the three sets of kw phrases I search for on a daily basis returned more than double the number of results. Mind you, these kw phrases are totally unrelated, as they are for three unrelated sites. Still, none of my positions in the SERPs changed beyond the normal fluctuation of a position or two. So is everyone here suggesting that while the result counts have increased, the sandbox is still in effect? Or is it that not everyone is seeing this rise in result counts across the board?
indexed pages went from 4 to 8 billion.
There is no sandbox.
Nice post blasphemer.
I would say Welcome to Webmasterworld [webmasterworld.com] but I get the sneaking suspicion you've read it before.
Another question. Given that Google's propensity for trying to upstage MS is no secret, will MS try and surprise Google with an earlier than planned launch?
Congrats on first post, blasphemer - as there's no need to say welcome to WebmasterWorld :)
>> Is The "Sandbox" Ending?
That depends a great deal on what, exactly, you personally interpret as being "the sandbox" (*), as well as whether or not this phenomenon is intentional, desirable, or at least harmless, as seen from Google's perspective.
Anyway, it's most probable that it's ending, sooner or later. If not for anything else, then only because most things are ending. Also, most things are replaced by, or transformed into, other things in the process.
If I were to follow the tech problems argument for a moment, I wouldn't consider an architecture upgrade a problem outside the realm of "hardware+software logistics" (which would offer some serious challenges after all, as we're speaking large volumes here, and it's basically the bread and butter of said firm). It would, however, fall into the "desired" box, although perhaps as "with some undesired side effects/bugs". Such a thing would be very gradual, and accompanied by a lot of testing. We wouldn't really notice it for a long time, and perhaps not at all, I'd say. At least the engineers should work very hard on making it non-noticeable (but on this particular forum, oddities do rise to the surface quite often). FWIW, that switch might already have happened, I'd say.
OTOH, the slow accumulation of "a set of webmaster problems that seems related" (or perhaps just increased awareness) during the past year seems to support the argument that some kind of process has been going on. An architecture upgrade is not the only process i can imagine yielding such effects, though. So, to end this speculation from my side; Even if we assume that we are in the middle of an architecture upgrade/switch, the interesting question is not if "the sandbox" will end gradually or instantly.
Instead, you should ask whether this "feature/phenomenon" is an inherent part of the way the system is supposed to work, or not.
(a) If no, then this "thing" is a sideeffect, and my best bet on timing is "gradual", or perhaps in a few "jumps".
(b) If yes, then this "thing" is found in both architectures, and it will not "end" after the supposed switch.
For both scenarios, my best bet is that some pages/sites will "escape the sandbox" sooner than others, and that some will be "escaping gradually" every day for the foreseeable future.
(*) I'm not trying to troll here, only emphasizing that there are numerous different things attributed to the term "sandbox" in different threads, by different members. We've all got "a general idea" but the specifics, as well as the theories differ a lot.
>> about a month or so ago the three sets of kw phrases I search for on a daily basis
>> returned more than double the number of SERPs
November 10, as noted by neuron in msg #131 here [webmasterworld.com].
yes I realize the number of pages went from 4 to 8 billion, that was tough to miss. Still, from some of the messages it seemed like there might be at least some areas where people weren't seeing much difference. I was just wondering if that was indeed the case. It seems not.
There were huge "of about X results" changes in many SERPs I watch. I'd say most everyone noticed the same.
It was nicer to tell a semi-sandboxed client they were #20 out of 115,000,000 as opposed to 50million. ;-)
G needs a big change...something to keep it fresh, updated and the leader of SEs.
The surprise from MS to G, I think, could be a desktop MSN search field in the new MS Windows products... that's what really scares G.
But I am sure G will answer back...
MS got into its 'house', G will get into MS's 'house' too...
I expect more G desktop products.
And the 'game' or 'war' is getting harder and harder...
>> To admit there was a capacity problem after all this time might be taken by some to be an admission of culpable negligence in their failure to advise potential investors regarding serious technical issues during their IPO period.
This makes it sound like Google is responsible for indexing the world. Capacity is always a problem and never a problem. Certainly there is no guilt on anyone's part, to suggest so is absurd.
I don't think the sandbox will ever end; it's part of their migration toward the ideas of Hilltop. It is the introduction of Hilltop ideas that makes it more difficult for new websites to rank quickly. PageRank, LocalScore, SEO: the rules have changed slightly.
I also once posted my theory about two indexes. The SERPs I see and the experience I have with a new website can all be explained by Hilltop. It may be two physical indexes or it may be virtual, but when Google does not find Hilltop results, it returns "old" results, and that is why some new sites rank on uncommon queries, not queries that have less than 500,000 results as some suggest.
Small Website Guy:
I don't buy into the theory that the Sandbox is a capacity problem.
My sandboxed pages show up fine in Google if you search for something super specific that exists on no other page. They are in the index, they just don't get displayed in the SERPs.
If Google really wanted to keep the index smaller, they'd kick out the PR0 pages.
The capacity theory takes that into account. The PageRank calculation matrix can handle 4 billion URLs; that is the real index. Another index of 4 billion URLs is there for backfill. Those URLs are in the index, just not in the right one for ranking against others that are in the right index. That is the simplified version of the theory.
>> My sandboxed pages show up fine in Google if you search for something super specific that exists on no other page.
>> new website can all be explained by Hilltop
Does Hilltop explain how the same content on a new domain, retaining an identical set of backlinks, can have dramatically different results?
The sandbox has nothing to do with capacity. Look at how it works.
Since you state that as fact, please tell us how it works.
>> The sandbox has nothing to do with capacity.
The sandbox doesn't prevent new pages from being added to the index. It's a ranking problem, not an indexing one.
>> Does Hilltop explain how the same content on a new domain, retaining an identical set of backlinks, can have dramatically different results?
Yes it does; this has to do with the content of the site itself and recognizing it as an authority site. New sites are having trouble being recognized as an "authority" for the topic on which they hope to rank. That is where we are seeing a delay; it is very likely that Hilltop, as deployed by Google, will not allow sites to immediately be recognized as an authority. Quite frankly, this makes a lot of sense 90% of the time.
When authority cannot be established for a query, the Hilltop-based algo cannot be used. This is why new sites rank for "oddball" queries where a corpus of information does not exist. Hilltop is bypassed in these situations. This gives the appearance of an index with two sets of results.
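For readers unfamiliar with Hilltop, the published algorithm works in roughly two phases: find "expert" pages relevant to the query, then score targets by how many independent experts link to them. Here is a heavily simplified toy sketch; all names, the toy relevance test, and the fallback behavior are hypothetical illustrations of the theory above, not documented Google internals:

```python
# Toy sketch of a Hilltop-style ranker: find "expert" pages for the query,
# then score target pages by how many independent experts link to them.
# If too few experts exist, fall back to an older ranking scheme.
MIN_EXPERTS = 2  # Hilltop-style schemes require a minimum expert count

def hilltop_rank(query, experts, fallback):
    # Phase 1: experts relevant to the query (toy relevance = topic match).
    relevant = [e for e in experts if query in e["topics"]]
    if len(relevant) < MIN_EXPERTS:
        # No authority consensus for this query: use the old results.
        return fallback(query)
    # Phase 2: targets endorsed by multiple independent experts score highest.
    scores = {}
    for e in relevant:
        for target in e["links"]:
            scores[target] = scores.get(target, 0) + 1
    return sorted(scores, key=lambda t: -scores[t])

experts = [
    {"topics": {"widgets"}, "links": ["oldsite.com", "newsite.com"]},
    {"topics": {"widgets"}, "links": ["oldsite.com"]},
]
print(hilltop_rank("widgets", experts, lambda q: ["old-results"]))
# -> ['oldsite.com', 'newsite.com']  (two experts endorse oldsite.com)
print(hilltop_rank("oddball query", experts, lambda q: ["old-results"]))
# -> ['old-results']  (no experts: Hilltop bypassed, "old" index answers)
```

The second call shows exactly the two-sets-of-results appearance described above: when no expert corpus exists, the fallback ranker answers instead.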
>> I don't buy into the theory that the Sandbox is a capacity problem.
In today's computer environment, there is no such problem as capacity. That is what I meant earlier. It is only a problem if you make it a problem. I've helped build very large databases, and they all have problems; everything is a compromise between speed and complexity of queries. You can optimize a database for anything you like, it is just that you cannot optimize for everything at the same time. Again, to say that Google is negligent is nonsense.
sandbox = problem with calculating page rank
sandbox is intentional
still not entirely 100% sure
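If the "problem with calculating PageRank" theory is right, the pain point is that PageRank is computed iteratively over the entire link graph at once, so a document has to be in the matrix before it can earn a score. A minimal power-iteration sketch over a toy graph (standard textbook formulation, nothing Google-specific):

```python
# Minimal PageRank power iteration over a toy link graph. The point: rank is
# a property of the whole matrix, so a page absent from the matrix (e.g. a
# "sandboxed" document, under this theory) cannot receive a proper score.
def pagerank(links, d=0.85, iters=50):
    n = len(links)
    ranks = {p: 1.0 / n for p in links}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in links}
        for p, outs in links.items():
            if outs:
                share = d * ranks[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: redistribute its rank evenly.
                for q in new:
                    new[q] += d * ranks[p] / n
        ranks = new
    return ranks

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
pr = pagerank(graph)
print(round(sum(pr.values()), 6))  # 1.0 -- ranks form a distribution
print(pr["c"] > pr["b"])           # True -- "c" collects the most link weight
```

Running this over 4 billion (or 8 billion) URLs instead of three is where the capacity and PageRank-calculation theories in this thread overlap.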
I've noticed something interesting re Google vs MSN results--if you do a link popularity check using the market leap link report (which brings up the number of links found in 7 major search engines - minus own domain) the MSN BETA engine always produces multiple times more links than the other engines.
Granted the Google link report is broken but it's interesting to compare the other engines (which are pretty consistent) to MSN BETA which produces 7-12 times more links than what the others list.
Others would say "The sandbox doesn't prevent new pages from being added to an index."
>> The sandbox doesn't prevent new pages from being added to the index.
I have studied the sandbox a lot, and the capacity theory seems to fit the pattern better than any other. The arguments appear stronger, and those arguing it appear more knowledgeable. But none of us/them are Google engineers.
>> New sites are having trouble being recognized as an "authority" for the topic on which they are hoping to rank
I agree with your logic, but I'm just not seeing that in the current SERPs.
Is it possible that Google is not recognizing any authority sites?
It might be that they are only listing "expert" sites.
[edited by: GerBot at 4:33 pm (utc) on Dec. 29, 2004]
I have four sandboxed sites. All are doing great in MSN Beta so I'm certain that my SEO strategy works just fine.
I am therefore convinced that the Google sandbox is a deliberate ploy on their part to increase revenues. Yahoo bought Overture and their revenues skyrocketed. Investors were thrilled. They didn't care about Yahoo's "ink contaminated" results, just the top line.
Google had to respond and get their revenues up, and what better way than to force new players to buy AdWords? Don't for one minute think that the people at Google think strategically. That is reserved for privately owned companies. As a publicly traded company, the top line is as paramount as the bottom line. Their long-term thinking does not extend beyond the next shareholders' meeting.
In the short term, why should they free up sandboxed sites that are buying AdWords? That would be tantamount to shooting themselves in the foot.
As a competitor, if G had only Yahoo to contend with, it would have been a walk in the park. Yahoo is the search equivalent of K-Mart, which no competitor loses any sleep over.
However, MSN search is another kettle of fish. Firstly this is hardly core business for them and they have the luxury of focusing on "owning" the market in the medium to long term.
These are not the jokers they have at Yahoo. Their aim will be to become the Wal-Mart of search, and they will brook no competition. When Wal-Mart, for instance, got heavily into toys, specialist toy outlets like Toys 'R' Us operating close to them were forced to close.
These are strategic people with deep pockets who do not need to focus on short term profits. They will roll out good, fresh results and aim to do what they do best, "dominate" the market.
MSN is a tsunami waiting to happen. If I were Google, I would be very nervous indeed. However, I'm sure they are not; they are resting on their laurels, hoping that MSN is nothing more than a bad dream.
In the short term, G in my view won't give two hoots about their stale results. That is, until searchers begin abandoning them in droves. It is then, and only then, that they will get into crisis mode, open the floodgates, and pray that it is not too late. In the meantime, they will be guided by short-term profits.
Don't expect the sandbox to end anytime soon.
Just my 0.02.
Oh, no! Not one of these sandbox threads again! Everything I read in this thread has been said already in other threads about this subject.
Here are the top ten WebmasterWorld responses to the sandbox theory:
10) The Sandbox? LOL ... doesn't exist!
9) The Sandbox? You mean the LAG?
8) The Sandbox? You mean Hilltop?
7) The Sandbox? You mean Florida?
6) The Sandbox? What a clever invention!
5) The Sandbox? Mwuahahahaha ... it exists but I'm out of it!
4) The Sandbox? Mwuahahahaha ... it exists but I never got into it!
3) The Sandbox? How can Google be so stupid as to do such thing in the first place?
2) The Sandbox? Wow, yeah, <whisper>Google must have technical problems.<whisper>
1) The Sandbox? Huh?
Part of the problem with calculating expert/authority sites is if the data set is filled with sites that are there for the sole purpose of manipulating the data set itself. Is my widget site really that much better than yours? Maybe I can convince you it is, with enough critical mass. By throwing a date factor in, they can combat hit-and-run SEO and grow the web up a little.
>> "the index" vs. "an index"
The number of indexes has never really supported the lack of capacity argument. Say that Google had all the capacity in the world - nothing would stop them from having two indexes, ten indexes, or even a hundred if that would serve their purpose better.
The more capacity they get, the more indexes they can possibly make. Sheer storage capacity would do it, regardless of 32 or 64 bit technology.
>> FWIW, that switch might already have happened, i'd say
(my post above, about 32-to-64)
I'm having second thoughts on this. Of course it might still have happened already, but i'm not really sure that Google would find such a shift worthwhile, at least for the time being. There's a couple of things to it:
- The often mentioned Google Linux and Google filesystem. While it's probably still true that they prefer to buy standard components, their complete system is nothing like the collection of boxes some of us might have; it's more like the scale of these guys/girls [top500.org]. Google is simply not a garage startup; it's a big business with big business risks, big business infrastructure, and big business costs that all have to be managed.
- So, who produces these 64-bit thingies, anyway? Is this technology mature, or is it still relatively new in terms of high-scale production? Is production quality reliable, or does it still fluctuate, i.e. how many fabs can deliver to spec, and what are the tolerances?
- Who sells this stuff? Are they really standard components, available in large quantities, like a hundred thousand or so? From a vendor where a sale of this magnitude wouldn't seem unnatural (i.e. spread rumors)? Worldwide shipping? Anytime?
- What are the specs like? Do they consume more power, and if so, how much more? What's the error level of these thingies, how stable are they, i.e. how many should be replaced each day if you run a hundred K of them?
- What do you physically do when you upgrade? Do you insert a new motherboard and connect some wires, and that's it? Or do you need new power supplies or something else as well? Whole boxes? How long does this process take per machine?
- How about existing hardware? Scrap or re-use? Giveaway/donate? Publicity? Costs?
- How much software should be rewritten, and how much of a rewrite is required? How many man-years are we talking about here?
- Precisely which benefits will this shift give? Does it add something that can't be obtained in another way?
- What's the exit strategy like, if this for some reason turns out to be all wrong?
- Is Google in any way forced to make that shift? Will they be forced to for some reason at some point? If so, exactly when?
- Will it decrease costs, or increase revenue?
- In terms of end-user satisfaction, how much quicker than 0.31 seconds for a search of "The world" would a search become?
Not that I think anyone can answer this set of questions just like that; these are just the things that popped up instantly when I gave it a moment of thought, so no doubt the people at Google have given these matters some consideration. I'm not sure the benefits outweigh the negatives, but then again, I don't run systems of that size, and I haven't seen the numbers (if such numbers exist), so I might be entirely wrong.
Added: I forgot one:
- What's the extra up-front cost for a 64-bit thingie as compared to a 32-bit one? That's probably not the most important point anyway, although these extra costs will occur every day.
[edited by: claus at 5:58 pm (utc) on Dec. 29, 2004]
>> I have four sandboxed sites. All are doing great in MSN Beta so I'm certain that my SEO strategy works just fine.
You're using an engine that is currently in beta to validate your SEO techniques. Talk about wishful thinking.
>> Their long-term thinking does not extend beyond the next shareholders' meeting.
Hmmm...ever hear of Gmail, Google labs, blogger....
>> They will roll out good, fresh results and aim to do what they do best, "dominate" the market.
There are plenty of examples of companies that are still around despite MS attempting to dominate the market. Ever hear of Intuit or Adobe?
It's always amusing to see someone's interpretation of things which are (in reality) only theories. MSN may come on strong if and when MS is ever able to release Longhorn with embedded search. Hell, if you want to talk about companies resting on their backsides, take a look at the dismal state of Internet Explorer. Losing 15-20 million users to Firefox may be a drop in the overall bucket, but there's no denying it's made a dent in the browser wars.
MS is not, nor has it ever been, an innovator. As long as Google (and others) can continue to innovate, MS will be at a loss, as it is now an old, slowly turning juggernaut that has never had a single original idea of its own.
If you're staking the future of your business on MS, you've got a long wait.