|Why does the 'Google Lag' exist?|
Trying to understand its purpose.
I had some in-depth discussion this weekend with some friends about the sandbox. Every theory on how to beat it kept coming back to one central problem - no one is sure why it exists.
I feel very strongly that until we have a good grasp on why it exists, it will be very hard to beat.
I don't buy the explanation that it's intended to be a method of stopping spam. Why? One, it's doing too much collateral damage. Two, if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming), and you realize that there are multiple ways of beating the sandbox that all of those spammers are already aware of, it doesn't make sense anymore.
So, why does the sandbox exist?
The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?
mfishy please don't delete yourself...I like your posts.
|do you also believe the supplemental index does not exist? |
To be honest, I don't pay a lot of attention. Staring at our screens for hours while trying to discern the meaning of things hasn't helped us much in the past. When we stare at the screens, we pay more attention to the "what's" than the "why's." I always have theories about the "why's" because the theories help generate hypotheses about what works and what doesn't, but ultimately what's important is *what* works.
As dear old cavedad used to say: "I'm not exactly sure why it rains, but I know enough to hunt when it's sunny, and get into the cave when the skies go dark." Cavedad was a good provider. ;-)
P.S. Don't get me wrong...I agree with Jake's premise that learning the why's here might really help. But I can't contribute much there. All I can do is offer the encouragement that this thing is beatable, and perhaps offer some things to think about. Which probably means I should get out of this thread!
I've gotten a new domain past it, but not for competitive keywords. In fact, there was no lag at all for a smallish non-competitive-keyword site I put up recently.
I assume you are able to somehow get a new domain past it with competitive keywords or you wouldn't be saying this stuff; that's interesting, nice to see where the bar stands. And hopefully this isn't the pre-linking strategy, where you point many links at the domain-to-be long before it goes up - also not a secret.
<<<<All I can do is offer the encouragement that this thing is beatable, and perhaps offer some things to think about>>>>
which seems like, at least theoretically, the only reason to spend any time on this forum, no?
[edited by: isitreal at 6:10 pm (utc) on Oct. 5, 2004]
<<a new domain name can get by it>>
This sandbox is a side effect of Florida. Obviously, SEOs did not (and COULD not) recognize its existence until several months after Florida (January or February perhaps?). So trying to isolate it and trying to identify a "cause and effect" is short-sighted and overly simplistic. Sandbox is simply one factor of Google's new algo.
Google used to use incoming links and an ODP listing as their primary measure of a site's quality. Obviously they have revised that criteria with Florida and our task is to identify the new criteria and take advantage of them.
I stand by that statement. The entire Florida update was to thwart SEOs. This thread and many others bear witness to how effectively Google HAS thwarted SEOs. After all, it's been quite a while since someone has posted, "Google is easy to optimize for, just get lotsa links!" ;)
|"Why does the 'Google Lag' exist?" The answer to that question is a simple one. It exists to thwart SEOs and their manipulation of Google's index. |
[edited by: DaveAtIFG at 6:21 pm (utc) on Oct. 5, 2004]
>>Of course I can. I rank sites as fast as ever,
WOW! This is the most significant piece of information. Since you do not want to divulge (rightfully so) your secrets, why don't we set up an experiment: let's set up a new domain (I'll pay for the registration if you want), decide on the keywords, and you prove that you can get around the sandbox. This would be most helpful to everybody, even not knowing how it's done. I'll even volunteer to host it on one of my servers, but I'll understand if this may compromise your secret. How much time to get results? 2 weeks, 1 month?
<<<<This sandbox is a side effect of Florida.
That explanation doesn't explain enough for my taste, although it is more or less what I would expect to read if I were reading a google prospectus.
What I'm seeing is Yahoo/MSN traffic rising slowly as people get sick of their search engine not delivering new stuff anymore - throwing out the baby with the bathwater. I just can't see why any search engine would deliberately do this. The SEO angle just isn't as convincing to me as it appears to be for some people; I see it as much more of a side effect than the primary cause.
<<After all, it's been quite a while since someone has posted, "Google is easy to optimize for, just get lotsa links!">>
True, except there isn't much wailing and whining about sites losing positions - the same SEOs are still ranking up and down the SERPs... Also, it is sorta hard to believe there was a Google engineer who came in and said "I've got it! The cure for preventing all future spam! We will simply not score any new sites!" :)
|how much time to get results? 2 weeks, 1 month? |
48-72 hours. And mfishy's right, it's possible to rank sites faster than ever.
But, that's not the point. We're trying to establish why the sandbox exists.
|It exists to thwart SEOs and their manipulation of Google's index |
If that's the reason, then Google is being extremely stupid, because as mfishy says (and I said in my first post), the same spammers are still ranking. The only thing the sandbox is doing is hurting innocent people.
Google isn't that stupid. Look at the SERPs. The sandbox has not helped with the proliferation of spam. If you think it has, you're not looking at competitive SERPs.
No, I still can't buy the spam explanation, not even when it comes from my favorite board member. ;-)
>This sandbox is a side effect of Florida.
Understanding this IMHO is the most important thing. Or, if not understanding it, at least accepting it as a premise. Either way it leads in the right direction WRT site dev.
Not enough people are paying attention to:
--the fact that aff and small innocent sites got murdered in Florida;
--the fact that it happened again, to a lesser extent, in subsequent updates;
--founders' and management comments on info versus commercial sites;
--the fact that the 'lag time' for new sites to appear in the SERP's was extended in May;
--the fact that large auto-gen sites and feed sites recently fell victim to 'tweaks';
--G's competitive differences versus Y! and MSN (and how G's self-perception might cause them to take certain directions with their core business).
They didn't. It's a side effect, not an intended goal. Whether it's a desirable side effect is for Google to decide. In other words, does it solve a bigger problem they were addressing?
|I just can't see why any search engine would deliberately do this |
Is it a deliberate side effect? I have mixed feelings so far and have not yet formed an opinion.
>The sandbox has not helped with the proliferation of spam.
I beg to differ. I've seen only a very few *new* spam sites, and we work in a lot of competitive categories.
Does anyone here think that G was doing a good job of controlling spam before? Not me. This knocked out the single largest short term threat to G's future quality...not a small thing with an IPO and attendant scrutiny on the horizon.
And why assume that G is done? IMHO we're simply witnessing an evolution:
Acts Two, Three and Four:
--Florida Tweaks and Testing (Austin, Esmeralda, etc.)
--Feb 04: Florida Tweaks (Sandbox 1.0)
--May 04: Florida Tweaks (Sandbox 2.0 - standards toughen and/or the lag gets longer)
--Sept 04: Florida Tweaks (goodbye feed sites)
--New algo that hammers loads more existing/old aff sites.
Florida happened on the back end of the algorithms, after Google knew the searcher's search terms. This much was clear, even if not much else was.
The Supplemental Index, the URL-only listings, and the sandbox are happening on the front end, someplace after the crawl but before the old-style, normal indexing.
I can believe that Florida was intended to fight spam. I also believe that Google sold more ads during Florida, and liked what they saw. But it was starting to attract adverse publicity at a certain point, so they turned back the knob on Florida.
This latest thing looks like a different problem. I don't think there's a knob this time. Remember how quickly the knob was turned back last December? It only took about a week or two.
This time it looks like Google has lost the knob option. It's hard to believe that Google would do something this dangerous to their reputation right now. Sure, after the lockups expire and everyone is rich, anything goes. But this is a very critical time for the stock price. Next month, and the four months following, lots of lockups expire. If the stock price drops even 50 percent, back to where it started on day one, then this 50 percent represents a lot of money for a lot of Googlers.
There might be something major in the works, but even rolling out that will only make half of all webmasters happy, and the other half furious. Very risky. It's also risky to just let things deteriorate for a few more months, but it may be that Google sees this is the best, or even the only, alternative.
<<New algo that hammers loads more existing/old aff sites.>>
I do not and have not seen aff sites affected any more than any other type of site. Nearly all content sites are supported by ads, so it really would not help them anyway. Also, they currently run the biggest aff network in the world... AdSense...
Maybe we give the boys at G a bit too much credit for what they can and cannot do?
“Google's mission is to organize the world's information and make it universally accessible and useful”
Right up until March 2004, that is. As of today, the last 8 months' worth of new information, which is the most important kind of information, is most definitely not universally accessible in this index.
I don’t know, I just find it hard to accept this is all part of a master plan; just leaving out the last 8 months.
>I do not and have not seen aff sites effected any more than any other type of site.
Let's be clear. If you're referring to 'sandboxed sites' then yes agreed. This thing keeps most new sites suppressed, aff and otherwise.
But, remember that in the months prior to Feb 04, the majority of new sites that quickly performed well in the SERP's were SEO'd commercial sites. Your basic new amateur hobby site did not jump to the top of the SERP's in its second week of existence the way new aff and other commercial sites so often did.
Plus, CNN ain't havin' trouble getting their new pages indexed. So what is the big deal (from G's POV)? The only people screaming about this are in here. G has gone on record time and again, in a variety of not very subtle ways, with comments that confirm their bias is towards info sites and not commercial sites. Commercial sites are for Adwords.
This is part of a long term, systematic assault on commercial sites, and aff sites tend to be lightning rods in this environment.
Whether or not this is a byproduct, or direct result, or algo, or filter, or front end, or backend, or all or none of the above, I'm not clever enough to know. What I do know is generally what sort of site G still seems to favor, and having found that out, it becomes a bit clearer what is going on. G is searching for ways, both blunt and elegant, to remove more and more commercial sites from the SERP's...in a lot of areas of the Web.
PS, IF part of what's going on is capacity related, that does not negate the possibility that G's attitude about commercial and spam sites is involved here. G's philosophies and objectives can, and probably often do, inform their choices in some technical areas...especially if allocation of resources is involved.
|This sandbox is a side effect of Florida. Obviously, SEOs did not (and COULD not) recognize its existence until several months after Florida (January or February perhaps?). So trying to isolate it and trying to identify a "cause and effect" is short-sighted and overly simplistic. Sandbox is simply one factor of Google's new algo. |
|Google used to use incoming links and an ODP listing as their primary measure of a site's quality. Obviously they have revised that criteria with Florida and our task is to identify the new criteria and take advantage of them. |
Right on, AFAIC, with the linking quality standards, et al, modified from what they were before.
That is exactly why sites hit during Florida exhibit the *identical* symptoms as so-called "sandboxed" sites, except that they've been around for long enough to show toolbar PR. Sites that got hit lacked the very things it more than likely takes to get around and avoid the "Google Lag" at this point in time.
And no, sorry dear - no one will be foolish enough to show anything to meet a "challenge" in order to disprove a fallacious theory of some sort.
>>And no, sorry dear - no one will be foolish enough show anything to meet a "challenge" in order to disprove a fallacious theory of some sort.
the challenge was for him to demonstrate his statement "easy to get around the sandbox". if you think this is related with my theory, then you're thinking with your behind. but then, of course, us girls have a right to think with our behind.
<<<< G is searching for ways, both blunt and elegant, to remove more and more commercial sites from the SERP's
Interesting, I recently noticed a very weird sequence of results that seem to go with this suggestion;
While doing some real hardware geek type stuff I needed to get the jumper settings for hard drives.
A search for manufacturer model number jumper settings
did not give me the manufacturer's site, but a bunch of sites that mentioned this without having the answer. I finally gave up googling it and just went to the manufacturer sites to get the info, or I used Yahoo, can't remember which. But this was a significant, total failure to retrieve information that exists essentially only on the manufacturer's website. This is what I would call tightening it too far. It's irrelevant why this happens; this is something that users are going to start noticing - they are noticing it. I'm seeing small changes on average: more Yahoo searches coming in than ever before. And remember, 30% or so of Google's stats belong to AOL, if I have those numbers right - maybe 25%.
"I've seen only a very few *new* spam sites, and we work in a lot of competitive categories."
In my areas, almost all spam sites are new. There are very few of the "old spam".
If I was to judge just from the hyper-money areas I deal with: lag time only affects things in the top 80% of pages if they were ranked purely via legitimate/clean/quality criteria. Sites that are primarily one single doorway page are the spam du jour, if they were created recently. Older pages like this are dead. New sites built on low quality algo ingredients do just fine, while old ones don't. New sites built on the quality algo components blow chunks, while old ones do fine.
Lag time is here not to frustrate seos, since it is easy to get around if you want to build sites that are intended to have a six month half-life.
But what else has Google done recently? The idiot backlink data, and no toolbar PR update. What do these have in common? They are two things that don't confuse more experienced SEOs at all, but befuddle the less experienced types. I don't know why Google keeps making such $$$$$$$$$ presents to SEOs who have a solid grasp of things (and I don't mean "great" here, just "solid"), but that seems to be the reality.
If we don't have a PR update by the morning of the 11th - meaning they could still conceivably be seen as adopting a quarterly schedule - then I don't see how any reasons really matter here. Google has either had a massive failure in its data and/or will be unveiling a search database and algorithm completely different from what we see today.
I wish someone would point out one of these lag breaking sites, because I've never seen one.
Please feel free to sticky me a URL!
|the challenge was for him to demonstrate his statement "easy to get around the sandbox". |
|I wish someone would point out one of these lag breaking sites |
Why in heavens name would someone give up the biggest advantage they have over everyone else at this point?
They have not beat the sandbox!
<<<< This thread and many others bear witness to how effectively Google HAS thwarted SEOs.
This one slipped by me. They haven't thwarted SEOs at all; all they've done is slow down the process by which they admit most sites - apparently not all, if the posters are telling the truth. Most is good enough. However, this isn't just thwarting the SEOs, it's thwarting trivial things like up-to-date results - getting the latest new sites freshly served to users' screens. This isn't a good thing. There is absolutely no way anyone can get me to believe that this lag is a deliberately planned method to cut down on SEO manipulation; that makes no sense at all. It's like saying a newspaper will only print new news if it's about older news stories it's been following; all other news items you'll have to wait 6-8 months to read. The web isn't static, it moves - that's what it is. Freezing sites for 6 months, even if you can avoid the freeze if you know how, only helps the SEOs who know how to avoid it, assuming they do.
This is not a business plan I'd invest in, and if this is google's current plan, then I'd start selling my stock as fast as possible. MSN must be amused. Their spiders have no trouble indexing large sites quickly, neither does slurp, but googlebot just limps along, as if it had to wait for something before adding more pages.
And this is not the business plan that made google a success, it reminds me much more of what made altavista fail.
If my tired old memory serves, new sites and pages are routinely reported to enjoy a week or three of prominent listings before slipping into sandbox oblivion... "News" still appears in a timely fashion.
That fact does suggest that the sandbox is deliberate.
|They have not beat the sandbox! |
"I've never seen it, so it must not be happening", right?
|but googlebot just limps along, as if it had to wait for something before adding more pages. |
No.... for the 67,000th time, this is a ranking issue, not an indexing issue. Please re-read the thread and, ideally, test the damned thing yourself. It's apparent that there are many people commenting in this thread who don't even have sites in the sandbox, or are being generally ignorant because it's trendy.
|This knocked out the single largest short term threat to G's future quality...not a small thing with an IPO and attendant scrutiny on the horizon. |
3 months ago I'd believe you. Now, the IPO seems like a cozy wave-away excuse much like the "it reduces spam" line.
I hate to say it, but it's possible. I'd believe "capacity issues" before "spam fighting".
But, if capacity issues are the real reason, I seriously doubt G would take 8 months to fix it. Capacity issues would be, I'd think, a major "drop everything now" type thing. Also, they would see something like that coming - the growth of the web is fairly linear.
|founders' and management comments on info versus commercial sites; |
Naw, caveman. The Google "nice guy" line worked two years ago. I don't believe it anymore. There are a lot of good people working at Vendor G now, but let's face it; the minute they went public, their management ceased to be a bunch of guys concerned with changing the world. The "new" management is the American economy, and the American economy demands profits.
|"News" still appears in a timely fashion. |
"News" typically doesn't appear on a new domain.
Google's trying to do something with new domains that want to rank in competitive queries. The question is what in the hell takes 6-8 months to do?
<<< this is a ranking issue, not an indexing issue
If you insist on trying to maintain that possibly related phenomena are unrelated by definition, you'll in all likelihood never get your question answered, since you just might be excluding the answer in the process. But at least finally there is the barest consideration that there just might be a capacity problem - took a year, but better late than never I guess.
We are not looking at something that is performing the way it was designed to perform. It is not indexing large sites quickly, or, often, completely; it is not listing fresh new content in a timely manner when that content is on new sites. This is the foundation on which Google built itself; it is the direct cause of their success. If Google were your operating system, you'd be switching to Linux or Macs right around now.
If SEO types managed to force them off this central goal, then there was something intrinsically flawed with google's methodology, and I would guess that in fact there is. If spam sites flooded the index, this added far more pages than they would have anticipated. So projections may have been unable to anticipate this type of growth. Remember, it was in 2000 that they were at 1 billion pages indexed, more or less, and that was the whole web. Why should they project a 4x growth in 3 years?
SEO spammers are very much like virus writers, taking advantage of weaknesses and loopholes in existing programming to do what they want. Some like to blame the hackers, but the hackers point to the weaknesses and say that if they weren't there, there would be nothing to exploit. I tend to agree with this latter view.
Google is making more money now than it was however, so if they are smart they can deal with this by throwing more resources at it.
[edited by: isitreal at 12:08 am (utc) on Oct. 6, 2004]
|If you insist on trying to maintain that possibly related phenomena are unrelated by definition |
Sites in the "sandbox" are included in the index. Googlebot comes by, indexes the page, and then they appear when Google is queried for the page name.
Are you seeing something else?
|It is not indexing large sites quickly |
New site in sandbox - 60K pages in the Google index within 1 week.
Another new site in sandbox - 120K pages - this one took somewhere between 6-10 days, don't have an exact day since my reporting was down for a couple of days.
Are you seeing something else?
<<< Are you seeing something else?
Yes. I am seeing extremely erratic behavior. I'm seeing very very slow crawls through large sites. I'm seeing several weeks to index large amounts of new content on old sites. I'm seeing a sandboxed site return those supplemental type results for site: type searches. I'm seeing both the msn bot and slurp eat up sites at maybe 3-5 times the speed of googlebot. I'm seeing a site that has been gone for months still return pages that have been gone for 10 months when you do a site: search for it. What I am not seeing is the google I used to see.
These have been constant topics here on the Google news forums; I'm not the only person to have seen this. Sites partially indexed, blah blah - for the last 8 months there have been almost nothing but threads like that here.
bakedjake, try looking at google exactly the way you look at windows, I suspect that perspective change might do the trick.
|I'm seeing very very slow crawls through large sites. I'm seeing several weeks to index large amounts of new content on old sites. |
Google has changed in that you can't get 50K pages indexed on one PR8 link anymore. But that's not "sandbox".
|I'm seeing a sandboxed site return those supplemental type results for site: type searches. |
More related to the dupe filter than not, I'd bet. I see the same thing with sites that are caught by the dupe filter, then shuffled off to supplemental.
This can also happen for the same reasons above - when you have a 100K page site with like 1 incoming link or something.
Google is smart enough to know that you don't need 100K pages in main if you've only got one incoming link. Or even 100 incoming links.
|I'm seeing a site that has been gone for months still return pages that have been gone for 10 months when you do a site: search for it. |
Okay, that's weird. :) But I doubt it's related to the "sandbox".
We have a lingo problem, let's get on the same page:
Define: sandbox - The "sandbox", as I am referring to, is the phenomenon of the lag that a new domain experiences when ranking for money terms, while full indexing still takes place.
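To make that definition concrete, here's a toy sketch in Python - pure speculation on my part, not Google's actual algorithm; the dampening multiplier, the lag length, and the scoring function are all invented for illustration. The only point is the distinction being argued in this thread: a new domain can be fully indexed (findable by name) while a ranking-side dampener keeps it out of competitive SERPs.

```python
from datetime import date

# Toy model of the "sandbox" as a RANKING dampener, not an indexing delay.
# All numbers here (the 0.01 multiplier, the 240-day lag) are made up.
def toy_score(base_relevance, domain_registered, query_is_competitive,
              today=date(2004, 10, 5), lag_days=240):
    """Return a ranking score; new domains are dampened on money terms only."""
    age_days = (today - domain_registered).days
    if query_is_competitive and age_days < lag_days:
        return base_relevance * 0.01  # indexed, but effectively unrankable
    return base_relevance

new_site = date(2004, 9, 1)  # a month-old domain
print(toy_score(100.0, new_site, query_is_competitive=True))   # 1.0 - sandboxed
print(toy_score(100.0, new_site, query_is_competitive=False))  # 100.0 - ranks fine
```

In this sketch the page is always "in the index"; only its score on competitive queries changes, which is consistent with the reports above of new sites getting 60K-120K pages indexed within days while still ranking for nothing.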
|try looking at google exactly the way you look at windows |
How will looking at an operating system change my perspective on a search engine?
<<< How will looking at an operating system change my perspective on a search engine?
I assume if you are the mod for the linux forums you have a certain amount of scepticism about anything MS says or has said about its Windows products? That's exactly how I would look at anything google says or has said about what it does or how it does it, or why.
Re the sandbox, lag, penalty etc., yes, that's what we're both talking about here. Why it exists, and all that. It's very odd behavior. Good also to see the term more precisely defined - it's not just commercial terms though, it's a much wider range than that from what I see. Unless commercial just means x number of results returned? Hard to say. Is it a capacity problem? Is it a ranking problem? Is ranking being used to deal with a capacity problem, or is a capacity problem causing a big glitch in ranking which is being called a 'lag'? Hard to say. But not hard to call it a problem.
Imagine this: MS releases their new Longhorn, but you can't install any new software on it until the software is 6 months old. That's to thwart potential security holes, or whatever. Paint this picture for any tech company other than Google and you can see how absurd the business model is. Google still is getting a free pass though.
Nice to see that at least a few here have been able to hack this latest version, though I'm not positive that all they did was prelink the domain or something.