I feel very strongly that until we have a good grasp of why the sandbox exists, it will be very hard to beat.
I don't buy the explanation that it's intended as a method of stopping spam. Why? One, it is doing too much collateral damage. Two, if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming), and you realize that there are already multiple ways of beating the sandbox that all of those spammers are aware of, the explanation doesn't make sense anymore.
So, why does the sandbox exist?
The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?
I started a couple of commercial (ecommerce) sites for people back in the spring, and they hardly rank at all on Google. I also bought a new domain last year and installed osCommerce, but never developed it beyond an index page. It was spidered and got a PageRank of 2. Three months ago I added content, but Google still will not cache or index any pages beyond the index page.
It's almost as if it has been put in the sandbox even though it is an 'established' site!
I'm sure there are plenty of 'what if' scenarios, and I've seen sites that were launched in very similar ways where one got in and one didn't.
It seems there is no rhyme or reason, but something like a sandbox is a good idea in theory, especially when Google considers spam one of its main obstacles, serious enough to mention multiple times in its S-1 filing and spam taxonomy paper.
Google's main index cannot take in any more domains. So Google's solution is to create a new, separate index (similar to the supplemental index) for new domains/sites. It sure behaves like the supplemental index: it yields SERPs for site: queries as well as for queries with a small result set from the main index.
The only way Google migrates domains/sites to the main index is if it removes complete, old sites (my site is missing!) from the main index. The question then is what algorithm Google uses to remove old sites from the main index and select new sites from the "sandbox" index.
From what I see, the algorithm appears to be nothing but pure random chance.
By the way, have you noticed that the total number of pages in the main index has not changed in more than a year?
It's a controversial theory, but it sure explains all the symptoms we see.
1) Let's say Google knew they were going to IPO this year, so they needed a way to keep things stable and not mess that up. So they instituted 'the sandbox' earlier this year to keep things stable during the IPO. Now that the IPO is over and 'googlebot is running in panic mode' [webmasterworld.com], they are rebuilding the index and possibly addressing some problems like page jacking [webmasterworld.com].
2) The sponsored text link business is hard to combat without collateral damage. So they now force links to go through a probationary period first. It affects new sites the hardest, since ALL of their links are going through it at the same time.
The only thing I can say with certainty is that sitewide links will push you into the sandbox if you're on the edge. I had one fairly new site that was ranking incredibly poorly but still ranking. A week or so after it got a sitewide link (100+ pages), it was banished. If I test using the "allin" commands right now, I'm #2-4.
The problem is we've only seen stuff move out once (early May). With only one instance to study, it's fairly difficult to determine a pattern of behavior.
Here is a thought! What if Google stopped PR updates, and did some of the other strange things, to see what the SEO community would do, specifically to target the spamming/linking/hijacking issues? You know as well as I do that those who are out looking for PageRank in the most devious ways are scrambling right now. So they then send out new bots, on the new IP address range that they registered in March, to the top (N) problem sites, somewhat under the radar, to see what changes have been made to those sites. They then compare the old and the new index. That would give them a pretty good idea of the scope of the antics used. Then, one by one, they start dropping them.