Why does the 'Google Lag' exist?

Trying to understand its purpose.

         

bakedjake

1:43 am on Sep 29, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I had some in-depth discussion this weekend with some friends about the sandbox. Every theory on how to beat it kept coming back to one central problem - no one is sure why it exists.

I feel very strongly that until we have a good grasp on why it exists, it will be very hard to beat.

I don't buy the explanation that it's intended to be a method of stopping spam. Why? One, it is doing too much collateral damage. Two, if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming), and you realize that there are already multiple ways of beating the sandbox that all of those spammers are aware of, it doesn't make sense anymore.

So, why does the sandbox exist?

The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?

graywolf

9:21 am on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>Also I have not seen any site come out of it without a dmoz link.

Untrue, I had one come out in May without a DMOZ link.

rfgdxm1

10:02 am on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Suppose you had a 1200 page website, but for some technological reason only 1000 pages could appear on the Internet at one time. What would you do? Not publish the newest 200 pages, or not publish what you consider the 200 worst/weakest pages? If Google has a capacity problem, why not just remove all PR0 pages from the index?

Because if it were a capacity issue, Google should want to omit from the index the pages most likely to be problematic. If they did what you suggest above, new pages on established news sites wouldn't be indexed, because they start out at PR0 when they are found. This would make the index look really stale. A spammer can easily get his site above PR0 with a few decent PR links from other sites he controls. The spammer modus operandi is to keep tossing up new domains, and as the older ones are zapped, the new ones take their place. Thus, to deal with these spammers, Google decided that the worst/weakest sites were likely the newest ones.

Vork

10:34 am on Oct 6, 2004 (gmt 0)

10+ Year Member



Just had a site (not listed in dmoz yet) that popped out of the blue for some really serious design searches - not regional. The site is pr6 and the thing that seemed to play the trick was tweaking really old (pre-sandbox) links together with onpage content optimization.

The same thing seems to work, but a lot more slowly, with a pr5 site. Also, it looks to me as if links from May to early June are just beginning to be factored in... with lower pr5 and pr4 sites it's all quiet so far...

Which leads to thinking - could it be that the length of the lag is inversely proportional to your absolute PR in some way? Also, it does look like pre-sandbox links, or links that stayed unchanged at the same urls for over a certain time, are treated a lot better (this was mentioned quite a few times here before), and their degree of 'reputability' and of being factored in is proportional in a variable way to the PR of the pages they lead to. From what I saw, this also might depend on multiple other factors, like the pr of the page linking to you, their domain's 'reputability', surrounding text relevance etc etc etc.

Has anyone had experience with older established sites creeping out of the sandbox by similar means?

V.

rfgdxm1

10:52 am on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Just had a site (not listed in dmoz yet) that popped out of the blue for some really serious design searches - not regional. The site is pr6 and the thing that seemed to play the trick was tweaking really old (pre-sandbox) links together with onpage content optimization.

There has been a lot of speculation, and some evidence, that it isn't new sites that are sandboxed, but in fact new inbound links. If you think about it, genuinely new domains will only have newer links, as people don't link to domains that aren't even registered yet. (And presumably Google would just ignore links to nonexistent domains.) Thus, if it is new links that are sandboxed, to a casual observer it might appear to be based on the newness of the domain.

Your experience would be consistent with new inbound links being what is sandboxed. Because you did have some really old links, your site never was fully sandboxed. By tweaking those links plus on page content you were able to get your site to rank well. PR6 is pretty solid. With old links good enough to get this site to PR6, tweaking both the links and the on page content could be enough to get decent rankings.

Whatever the sandbox is, the theory of starting out a new domain with minimal content and some solid inbound links has appeal. Don't do this, and when real content is added to the site, it won't go anywhere because of the sandbox. However, just let the site age until it is "ripe", and then slap up the content and tweak the links, it will soon rank well.

Vork

11:16 am on Oct 6, 2004 (gmt 0)

10+ Year Member



Thanks rfgdxm1 - my thoughts entirely regarding letting the new site 'ripen up' before slapping the content up on the pages.

BeeDeeDubbleU

11:17 am on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>However, just let the site age until it is "ripe", and then slap up the content and tweak the links, it will soon rank well.

It would depend on your definition of "soon". One site that I created for a small consultancy has lots of original and interesting content. It is as good a resource as any on the subject matter but it is nowhere to be seen after six months.

Oliver Henniges

1:06 pm on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Do others agree that there was generally only one period in which sites were allowed in from the sandbox? If so, were all sites allowed in or just a few? Have there been other times when sites were allowed in?

I recall FTPing some newly created single pages, not whole sites, and indeed that must have been mid-June or so. They did get PR, while all pages created since then did not (the index.html page has been in Google's index for four years now). I can't tell about Feb-Apr because I was away then, but if you say the sandbox phenomenon dates back to March then, yes, there must have been a short exceptional period.

Oliver

mfishy

1:40 pm on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



<<Do others agree that there was generally only one period in which sites were allowed in from the sandbox?>>

Yah, sometime in May, sites that were being boxed got loose which led many of us to believe there was simply a 2-3 month holding period. Since then it is as if time has stood still!

One quick point. Many here keep stating that G is fighting commercial spam here. Besides the obvious fact that this measure has ZERO impact on the existing serps and the really professional SEO/spammers have found ways around it anyway, countless non-profit sites are in the box as well...

neuron

1:46 pm on Oct 6, 2004 (gmt 0)

10+ Year Member



Is there any reason to believe the supplemental index is a 32-bit process based index, or is it possible the supplemental index is a 64-bit based index? Because if the supplemental index is 32-bit, then it too would be full at 4,285,199,774 web pages (or 99.8% of maximum capacity), and it would be full in less than 2 years*. So, wouldn't it make sense to go to a 64-bit system, even if they stay with the 5-byte index for the sake of reducing the size of the iterated matrix?

* Google had 1B pages indexed in 2000 and 4B last year; that's a growth rate of about 2^(1/2) per year. The number of indexed pages would double every two years, perhaps the Moore's law of indexable information.
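A quick way to sanity-check the arithmetic in this post is the sketch below. It assumes a 32-bit index means at most 2^32 distinct document IDs and a 5-byte index means 2^40; the function names are made up for illustration. It also shows how sensitive the growth figure is to the time span: a rate of 2^(1/2) per year corresponds to 1B to 4B over four years, while counting three years (2000 to 2003) gives roughly 1.59 per year and a doubling time of 1.5 years.

```python
import math

MAX_32BIT = 2 ** 32   # 4,294,967,296 distinct IDs with a 32-bit identifier
MAX_40BIT = 2 ** 40   # distinct IDs with a 5-byte (40-bit) identifier

reported_pages = 4_285_199_774          # page count quoted in the post
fill = reported_pages / MAX_32BIT       # fraction of the 32-bit ID space in use


def annual_growth(start_pages, end_pages, years):
    """Constant yearly growth factor that turns start_pages into end_pages."""
    return (end_pages / start_pages) ** (1 / years)


def doubling_time(rate):
    """Years needed to double at a constant yearly growth factor."""
    return math.log(2) / math.log(rate)


rate_4y = annual_growth(1e9, 4e9, 4)    # the post's figure: 2^(1/2) per year
rate_3y = annual_growth(1e9, 4e9, 3)    # if "last year" means 2003

print(f"32-bit fill: {fill:.1%}")
print(f"4-year span: x{rate_4y:.3f}/yr, doubles every {doubling_time(rate_4y):.2f} yr")
print(f"3-year span: x{rate_3y:.3f}/yr, doubles every {doubling_time(rate_3y):.2f} yr")
print(f"40-bit headroom over 32-bit: x{MAX_40BIT // MAX_32BIT}")
```

Either way, a 32-bit ID space that is already about 99.8% used has essentially no room left, which is the point the post is making.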

gomer

2:37 pm on Oct 6, 2004 (gmt 0)

10+ Year Member



When I mentioned DMOZ in my last post, I did not mean to imply that DMOZ carried any extra weight or was a way out of the sandbox. I was instead trying to say that the site had some decent incoming links, one of them being DMOZ.

From the latest posts I have read, there generally seems to be a consensus that the sandbox started sometime early this year, maybe about February or March. There was also only one period, about May (or June), where all sites were allowed in. If others do not see it this way, please let us know what you think on this and why.

If all sites were allowed in, this would appear to have little to do with spam fighting.

Perhaps what we are seeing is something like this. Google has an index that got full. They have a second index that newly crawled sites go into since the first index is full. Their search calculations cannot work across both indexes. At certain intervals, they allow sites from the second index "into" the first index thereby making those sites available to appear (in earnest) in their search results.

Obviously this is just a guess. I don't have reasons to give as to why their search calculations cannot work across both indexes and how they let sites from the second index into the first. I know it has been proposed that sites in the first index could be removed if pages were no longer available or if they were found to be of low quality but I am not sure if I buy into that theory.
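The two-index guess above can be sketched as a toy model. Everything here is hypothetical: the class name, the capacity number, and the `release` step are assumptions made only to illustrate the idea, not a description of how Google actually works.

```python
from dataclasses import dataclass, field


@dataclass
class TwoIndexModel:
    """Toy model: a full main index plus a holding index for newer sites."""
    main_capacity: int                            # hypothetical size limit
    main: set = field(default_factory=set)        # sites that are able to rank
    holding: list = field(default_factory=list)   # "sandboxed" sites, oldest first

    def crawl(self, site):
        # New sites go into the main index until it is full, then into holding.
        if len(self.main) < self.main_capacity:
            self.main.add(site)
        else:
            self.holding.append(site)

    def can_rank(self, site):
        # Assumption: ranking calculations only see the main index.
        return site in self.main

    def release(self, freed_slots):
        # Assumption: at some interval, room appears and the oldest held sites move in.
        self.main_capacity += freed_slots
        while self.holding and len(self.main) < self.main_capacity:
            self.main.add(self.holding.pop(0))


model = TwoIndexModel(main_capacity=2)
for site in ["old-a.example", "old-b.example", "new-c.example"]:
    model.crawl(site)

print(model.can_rank("new-c.example"))   # held back while the main index is full
model.release(freed_slots=1)
print(model.can_rank("new-c.example"))   # able to rank after the release
```

In this sketch a single `release` call plays the role of the May/June period when held sites were let in; the model says nothing about why the two indexes could not be searched together.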
