I feel very strongly that until we have a good grasp of why the sandbox exists, it will be very hard to beat.
I don't buy the explanation that it's intended to be a method of stopping spam. Why? One, it's doing too much collateral damage. Two, if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming), and you realize that there are already multiple ways of beating the sandbox that all of those spammers are aware of, the spam explanation doesn't make sense anymore.
So, why does the sandbox exist?
The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?
Nice theory, Billys, but why update the smaller index less often than the big one?
Updated less frequently to stabilize the results and conserve resources. Common queries should have been answered many times over, no need to rush in with new answers.
It also makes for a better user experience. They type in a common two-word query and get virtually the same results a month later. The end user appreciates this because it allows them to find things again. This raises confidence in the results, thereby creating loyalty to Google.
Here's more for you 32 versus 64 bit folks:
Google does this because they have a large investment in 32-bit machines and they want to keep using those computers. The secondary database is a 64-bit design using recently purchased machines that are more expensive and computationally more powerful. However, they do not have enough of these machines to support the sheer number of "common" queries they receive.
So how much are these machines? The 2 guys just got 64 million for the IPO. It would seem they could spring for the hardware.
let me hazard a guess. at the time g ran out of capacity, its solution was to create the supplemental index. it came to the point that too many pages were being added, particularly by new sites, and it became unreasonable to just shove pages into the supplemental index and claim they qualify for "weird" queries, as GG claimed. so google had to create another solution - a new index where it can quarantine new sites/pages.
since old sites remain in the main index, all new pages added to them go into the main index too, and therefore participate in the pagerank algorithm and are able to rank. however, note that pages of old sites continue to disappear to make room for these new pages from old sites. that's the reason why google has not updated the "©2004 Google - Searching 4,285,199,774 web pages" line, which obviously applies to the main index. so the main index continues to be out of capacity.
i have a fairly large group of sites and i've been adding a significant number of pages. however, i've noticed that my total number of pages in the main index (excluding supplementals) is not increasing at the same rate as new pages are being added. i don't believe google limits the number of pages by domain. it's just that my group of sites is exhibiting the law of averages.
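The capacity argument above leans on the page count quoted from the Google footer. As a rough sketch of the arithmetic behind the 32 bit theory: if (and this is only the theory's assumption, not anything Google has confirmed) each page in the main index were identified by an unsigned 32-bit number, the ceiling would be 2^32 IDs. The names `FOOTER_COUNT`, `MAX_32BIT_IDS`, `headroom` and `fill_ratio` are made up for illustration.

```python
# Page count quoted in the post above (from the Google footer).
FOOTER_COUNT = 4_285_199_774

# Assumption made by the capacity theory, not a confirmed fact:
# one unsigned 32-bit ID per page in the main index.
MAX_32BIT_IDS = 2 ** 32


def headroom(count: int, limit: int = MAX_32BIT_IDS) -> int:
    """Number of IDs still unused under the assumed limit."""
    return limit - count


def fill_ratio(count: int, limit: int = MAX_32BIT_IDS) -> float:
    """Fraction of the assumed ID space already used."""
    return count / limit


if __name__ == "__main__":
    print(f"assumed limit : {MAX_32BIT_IDS:,}")
    print(f"footer count  : {FOOTER_COUNT:,}")
    print(f"headroom      : {headroom(FOOTER_COUNT):,}")
    print(f"fill ratio    : {fill_ratio(FOOTER_COUNT):.4f}")
```

Under that assumption the footer figure sits within about ten million pages of the ceiling, which is why the number gets cited as evidence. It does not prove the theory; a count that stopped being updated would look the same.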
Fact 1. New sites get indexed within a day or two.
Fact 2. New pages on existing sites get indexed the same way (and get found.)
Think about it. There is no real evidence to suggest that this is a capacity problem. This is surely not why it exists.
Now is it a Google defect? That's another story ...
>>Fact 1. New sites get indexed within a day or two.
YES. they go to the sandbox index (or database).
>>Fact 2. New pages on existing sites get indexed the same way (and get found.)
YES. they go to the main index (or database) that's why they participate in the pagerank calculation and are able to rank in the serps!
this is a solution to the capacity problem in the same way that the supplemental index was created as a solution to the same problem. see my post above.
In your model
1. How come I can get new site A to rank above old site B for some searches, but it's the other way round for other searches?
2. Why do new sites appear at the top of serps for the allin commands?
3. Why did my PR get updated in April for a sandboxed site?
4. Why do sites in the sandbox index appear in the link:www.oldsite.com results from the main index?
and so on.
Exactly. A link from a domain not even registered until July 1, 2004 shows up for link:www.mysite.com
5. Why are sites registered long ago, indexed and ranking for well over a year, now exhibiting some of the identical symptoms as the sandboxed sites, except that their PR shows because they were in the index prior to the TBPR lag?
What is the common denominator (or denominators) between the sandbox and Florida?