Why does the 'Google Lag' exist?

Trying to understand its purpose.

         

bakedjake

1:43 am on Sep 29, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I had some in-depth discussion this weekend with some friends about the sandbox. Every theory on how to beat it kept coming back to one central problem - no one is sure why it exists.

I feel very strongly that until we have a good grasp on why it exists, it will be very hard to beat.

I don't buy the explanation that it's intended to be a method of stopping spam. Why? One, it is doing too much collateral damage. Two, if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming), and you realize that there are already multiple ways of beating the sandbox that all of those spammers are aware of, it doesn't make sense anymore.

So, why does the sandbox exist?

The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?

caveman

4:45 pm on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They have not beat the sandbox!

We have a phrase for comments like this one...we call 'em "the world is flat comments". :-)

This knocked out the single largest short term threat to G's future quality...not a small thing with an IPO and attendant scrutiny on the horizon.

3 months ago I'd believe you. Now, the IPO seems like a cozy wave-away excuse much like the "it reduces spam" line. ... I'd believe "capacity issues" before "spam fighting". ... But, if capacity issues are the real reason, I seriously doubt G would take 8 months to fix it. Capacity issues would be, I would consider, a major "drop everything now" type thing. Also, they would see something like that coming - the growth of the web is fairly linear.

The thing is, not all major decisions or events at companies are linear, or even planned. What if the capacity issue intersected with resource calls they had to make? Being mindful of the spam issue, they found that:
--the lag had the interesting side effect of discouraging newbies from blasting out spam sites and embarrassing G
--the public didn't see or care about any differences they were seeing in the SERP's.
--their SOM stayed constant.

Suddenly, and especially with the IPO coming, any incentive to move quickly to upgrade systems was largely nullified. And BTW, none of this precludes them from still working on algo changes to continue to fight spam algorithmically.

Plus, if they are planning an entirely new approach to managing their SERP's, based on some of the new areas they've delved into, then this gives them needed time to get it all right before launch.

founders' and management comments on info versus commercial sites;

Naw, caveman. The Google "nice guy" line worked two years ago. I don't believe it anymore. There are a lot of good people working at Vendor G now, but let's face it; the minute they went public, their management ceased to be a bunch of guys concerned with changing the world. The "new" management is the American economy, and the American economy demands profits.

Actually, I never believed that the founders' bias towards info sites over commercial sites was a 'nice guy' thing. I thought it lacked an understanding of the real world, in which people actually do search for information on commercial goods and services.

isitreal

4:58 pm on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<<< founders' bias towards info sites

caveman, that last post is about as sensible a thing as I have read here in the last year.

I know from doing info sites that the reason it's so darned easy to get stuff in the serps is that the content is written chock full of info. As soon as you start writing 'about' something, you have to use different words, phrases, etc; the more there are, the more there is to find, and what the google algo excels in is extracting, you guessed it, relevant 2-3 word info phrases. So it's not so much that an info site is favored, it's that it has more real information on it, which is as you note what most searchers are looking for.

Sort of going along with that 'what' thing, google is good at pulling out 'what', not 'why'. Once I learned this, it became very easy to write FOR google. Now when I do a posting on WebmasterWorld, especially html and css forums, I often consciously decide whether I will include good serp, what filled content, or go very vague, zero serp result content... very sad comment on how information is gotten now, but it's the way it is.

[edited by: isitreal at 5:06 pm (utc) on Oct. 6, 2004]

caveman

5:07 pm on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Gee thanks isitreal. :-)

FWIW, WRT links, I don't know what role "newness of links" plays, but I can tell you that we have several new sites launched in March that are doing OK. New domains, new links. New links may play a role in this, but they do not necessarily *hurt* a site.

It's also worth remembering that while PR is all about links, G's algo is all about patterns. And then there are those pesky filters. And perhaps most important, ultimately, these things all exist to help determine measures of *quality* as seen through the G lens. ;-)

IMHO, there have been at least 20 or so posts in here about what it takes to get past the sandbox. I think people intuitively know, but just aren't getting it done. This work is not getting any easier. Certainly, the sandbox has contributed to that, which can't be a bad thing from G's POV. As has already been noted, there aren't many webmasters anymore boasting about how easy it is to game G. :-/

BeeDeeDubbleU

5:14 pm on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They have not beat the sandbox!

We have a phrase for comments like this one...we call 'em "the world is flat comments". :-)

I have a phrase for people who say things like, "I beat the sandbox". Prove it or don't slag me off.


the lag had the interesting side effect of discouraging newbies from blasting out spam sites and embarrassing G
--the public didn't see or care about any differences they were seeing in the SERP's.

After eight months (and counting) of virtually no new content on Google's version of the Internet it's only a matter of time until the public (and hence the press) catch on.

Actually, I never believed that the founders' bias towards info sites over commercial sites was a 'nice guy' thing. I thought it lacked an understanding of the real world, in which people actually do search for information on commercial goods and services.

This was not a 'nice guy' thing. Remember that commercialism was a side effect of the Internet, which was not designed to be a commercial entity. The Internet, you may remember, used to be referred to as (still is?) 'The Information Superhighway'. Not the 'Yellow Pages Super Highway'. I am involved in a business, I even have an Adsense account, but I still believe that information sites should be preferred to commercial.

caveman

5:25 pm on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Prove it or don't slag me off.

And as Jake suggested, just because you can't see it doesn't mean it's not happening around you. :-)

Yours was the kind of comment that has a tendency to put off those who, while not willing to share trade secrets, might be willing to at least offer some information, or point people in the right direction.

When people try to help, if you can't be nice, perhaps you shouldn't say anything at all. :o Or, at least try to be contributory.

My partner was right...should just keep my mouth shut...

mfishy

5:37 pm on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



these things all exist to help determine measures of *quality* as seen through the G lens.

Although I believe this is part of what they are trying to do, my research indicates that this is not part of the sandbox phenomenon.

we have several new sites launched in March that are doing OK

Who woulda thunk that a 7 month old site's ranking in google would become newsworthy?! :)

isitreal

5:38 pm on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<<<< just because you can't see it doesn't mean it's not happening around you.

Yep, seeing the same thing on other topics, claims that something can't be done, then we do it, and it works fine. Just to be clear however, you are talking about multiple thousand [or more] result type keyword searches, correct?

Earlier 'lag' threads said ALL sites, this was easily disproven by putting up a site with niche type keywords and not having it sandboxed. Which meant that 'lag' was not a generic event, applied to all new domains, but the result of a filtering process of some type to determine which sites get placed in that. And a filter has holes, that's why I don't tend to disbelieve the claim that it can be gotten around. Hackers always laugh when someone says: my system cannot be hacked.

caveman

6:30 pm on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Earlier 'lag' threads said ALL sites, this was easily disproven by putting up a site with niche type keywords and not having it sandboxed. Which meant that 'lag' was not a generic event, applied to all new domains, but the result of a filtering process of some type to determine which sites get placed in that.

Yeah. I don't quite get why that has not been noted more often. Clearly the sandbox is not universal; that is easily seen. So it should not be a hard leap that it is either algorithmic or filter based or both. So it should not be a hard leap that not all new sites (or searches related to those sites' pages) are sandboxed. But that last leap seems to be hard for some. :-/

I've said before, we find it useful to think of this as a tough algo with tightened filters, for which certain hurdles need to be met or exceeded. Also, less than a third of our new sites passed muster, so far, and we can't exactly say why, though we have theories. We can only say that some have, and some have not. I think mfishy said something about it being perplexing.

isitreal

7:38 pm on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



And this: on a sandboxed rebranded 301'ed site, the original material from the original domain name is sandboxed heavily, by url I am assuming, but new material is ranking fine.

I experimented with this deliberately by adding content pages that were so specific that they would show in serps after the rebranding if new pages were escaping the sandbox, and they do. They started showing after about 1 month. So it's not an absolute site / domain type thing either in all cases.

However, a rebranded site is a different thing than a brand new site, but I think it does demonstrate some of the processes going on behind the scenes, perplexing would be the keyword here :-¦

My suspicion is similar to something said earlier, that what I'm seeing is a duplicate content type thing happening, except the original source is gone, doesn't exist anymore except as a 301 directive, but the system is so slow to update itself fully that things like this are falling between the cracks?

This ties in to what I said earlier about 3 or 4 pages, several of which have not existed for about 1 year now, showing up in a site:originaldomain query. There does not appear to be a single index working, and if there is more than one, the integration between them seems to be flawed.

Could it be that there are just too many hacks being applied?

<added>
<<<<<But I still don't believe the capacity issues, guys. I just think Google is smart enough to see something like that coming.

BakedJake, MS has been working on their new file system since NT 4. They have a lot of smart people. It was supposed to be up by NT 5. Then it was supposed to be available on Longhorn. That's a very long time. And they still can't get it working. With a 5 billion or so a year research budget. It's not a matter of being smart enough, it's a matter of the problem being very hard to solve I think, and events moving faster than they thought they would. I'm running Yoper with Reiser4, certain things have more freedom to move fast than other things, depends on how stable you need the processes to be, google can't have a system wide failure, it's out of the question.

BeeDeeDubbleU

8:08 pm on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My partner was right...should just keep my mouth shut...

Yes - perhaps she recognised that you started the slagging :)

Look, after about 350 posts I don't think we are really any closer to determining why the 'Google Lag' exists. It has been said before and I will say it again. It is highly unlikely that this is intentional. All the indicators are that this is a defect in Google.

The Internet in its present form is only what - perhaps six or seven years old? Why would any right-minded SE think that denying their clients access to up to 10% of available sites was a valid action, especially when these sites are the newest and most current?

This 354 message thread spans 36 pages: 354