Why does the 'Google Lag' exist?




Trying to understand its purpose.

bakedjake

1:43 am on Sep 29, 2004 (gmt 0)

I had some in-depth discussion this weekend with some friends about the sandbox. Every theory on how to beat it kept coming back to one central problem - no one is sure why it exists.

I feel very strongly that until we have a good grasp on why it exists, it will be very hard to beat.

I don't buy the explanation that it's intended to be a method of stopping spam. Why? One, it is doing too much collateral damage. Two, if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming), and you realize that there are already multiple ways of beating the sandbox that all of those spammers are aware of, the spam explanation doesn't make sense anymore.

So, why does the sandbox exist?

The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?

BillyS

8:47 pm on Oct 4, 2004 (gmt 0)

>>Nice theory BillyS, but why update the smaller index less often than the big one?

It's updated less frequently to stabilize the results and conserve resources. Common queries have already been answered many times over, so there's no need to rush in with new answers.

It also makes for a better user experience. Users type in a common two-word query and get virtually the same results a month later. The end user appreciates this because it allows them to find things again. This raises confidence in the results, thereby creating loyalty to Google.

graywolf

8:50 pm on Oct 4, 2004 (gmt 0)

OK, for all of the people who say Google has reached capacity: how is it that you can add a new page to an "old" website and rank right away?

webdude

8:55 pm on Oct 4, 2004 (gmt 0)

Here's more for you 32- versus 64-bit folks:
>>Google does this because they have a large investment in 32-bit machines and they want to use those computers. The secondary database is a 64-bit design using recently purchased machines that are more expensive and computationally more powerful. However, they do not have enough of these machines to support the sheer number of "common" queries they receive.

So how much do these machines cost? The 2 guys just got 64 million for the IPO. It would seem they could spring for the hardware.

renee

9:45 pm on Oct 4, 2004 (gmt 0)

>>Ok for all of the people who say Google has reached capacity, how is it that you can add a new page to an "old" website and rank right away?

Let me hazard a guess. At the time G ran out of capacity, its solution was to create the supplemental index. Eventually so many pages were being added, particularly by new sites, that it became unreasonable to keep shoving pages into the supplemental index on the claim that they only qualify for "weird" queries, as GG claimed. So Google had to create another solution: a new index where it can quarantine new sites/pages.

Since old sites remain in the main index, new pages added to those sites also go into the main index, and therefore participate in the PageRank algorithm and are able to rank. However, note that pages of old sites continue to disappear to make room for these new pages from old sites. That's the reason why Google has not updated the "©2004 Google - Searching 4,285,199,774 web pages" figure, which obviously applies to the main index. So the main index continues to be out of capacity.

I have a fairly large group of sites and I've been adding a significant number of pages. However, I've noticed that my total number of pages in the main index (excluding supplementals) is not increasing at the same rate as new pages are being added. I don't believe Google limits the number of pages by domain; it's just that my group of sites is exhibiting the law of averages.

BeeDeeDubbleU

9:45 pm on Oct 4, 2004 (gmt 0)

The facts that we do know for sure:

Fact 1. New sites get indexed within a day or two.

Fact 2. New pages on existing sites get indexed the same way (and get found.)

Think about it. There is no real evidence to suggest that this is a capacity problem. This is surely not why it exists.

Now is it a Google defect? That's another story ...

renee

10:01 pm on Oct 4, 2004 (gmt 0)

>>Fact 1. New sites get indexed within a day or two.

YES. They go to the sandbox index (or database).

>>Fact 2. New pages on existing sites get indexed the same way (and get found.)

YES. They go to the main index (or database). That's why they participate in the PageRank calculation and are able to rank in the SERPs!

This is a solution to the capacity problem in the same way that the supplemental index was created as a solution to the same problem. See my post above.

BillyS

10:05 pm on Oct 4, 2004 (gmt 0)

>>So how much are these machines? The 2 guys just got 64 million for the IPO. It would seem they could spring for the hardware.

The point is not how much the new machines will cost. The point is that they do not have sufficient reason to abandon the "old" 32-bit machines.

cabbie

10:45 pm on Oct 4, 2004 (gmt 0)

Really nice theories, BillyS and Renee.
I have no clue whether they are right or not but you have baffled me with enough science to make it sound plausible.

leveldisc

10:56 pm on Oct 4, 2004 (gmt 0)

OK Renee.

In your model:

1. How come my new site A ranks above old site B for some searches, but it's the other way round for other searches?

2. Why do new sites appear at the top of the SERPs for the allin: commands?

3. Why did my PR get updated in April for a sandboxed site?

4. Why do sites in the sandbox index appear in link:www.oldsite.com results from the main index?

and so on.

Marcia

11:09 pm on Oct 4, 2004 (gmt 0)

>>4. Why do sites in the sandbox index appear in the link:www.oldsite.com from the main index

Exactly. A link from a domain not even registered until July 1, 2004 shows up for link:www.mysite.com

5. Why are sites registered long ago, indexed and ranking for well over a year, now exhibiting some of the identical symptoms as the sandboxed sites, except that their PR shows because they were in the index prior to the TBPR lag?

What is the common denominator (or denominators) between the sandbox and Florida?

This 354-message thread spans 36 pages.