I feel very strongly that until we have a good grasp on why it exists, it will be very hard to beat.
I don't buy the explanation that it's intended as a method of stopping spam. Why? One, it is doing too much collateral damage. Two, if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming), and you realize that there are already multiple ways of beating the sandbox that all of those spammers are aware of, the explanation doesn't make sense anymore.
So, why does the sandbox exist?
The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?
Because if it were a capacity issue, Google should want to omit the pages most likely to be problematic from the index. If they did what you suggest above, new pages on established news sites wouldn't be indexed, because they start out at PR0 when they are found. This would make the index look really stale. A spammer can easily get his site above PR0 with a few decent PR links from other sites he controls. The spammer modus operandi is to keep tossing up new domains, and as the older ones are zapped, the new ones take their place. Thus, to deal with these spammers, Google decided that the worst/weakest sites were likely the newest ones.
The same thing seems to work, but a lot slower, with a PR5 site. Also, it looks to me like links from May to the start of June are just beginning to be factored in... With lower PR5 and PR4 sites it's all quiet so far...
Which leads me to wonder: could the lag time be inversely proportional to your absolute PR in some way? It also looks like pre-sandbox links, or links that have stayed unchanged at the same URLs for more than a certain time, are treated a lot better (this was mentioned quite a few times here before), and the degree to which they are considered 'reputable' and factored in varies with the PR of the pages they lead to. From what I saw, this might also depend on multiple other factors, like the PR of the page linking to you, that domain's 'reputability', the relevance of the surrounding text, etc.
Has anyone had experience with older, established sites creeping out of the sandbox by similar means?
V.
There has been a lot of speculation, and some evidence, that it isn't new sites that are sandboxed, but in fact new inbound links. If you think about it, genuinely new domains will only have newer links, as people don't link to domains that aren't even registered yet. (And presumably Google would just ignore links to non-existent domains.) Thus, if it is new links that are sandboxed, to a casual observer it might appear to be based on the newness of the domain.
Your experience would be consistent with new inbound links being what is sandboxed. Because you did have some really old links, your site was never fully sandboxed. By tweaking those links plus on-page content, you were able to get your site to rank well. PR6 is pretty solid. With old links good enough to get this site to PR6, tweaking both the links and the on-page content could be enough to get decent rankings.
Whatever the sandbox is, the theory of starting out a new domain with minimal content and some solid inbound links has appeal. Don't do this, and when real content is added to the site, it won't go anywhere because of the sandbox. However, just let the site age until it is "ripe", and then slap up the content and tweak the links, it will soon rank well.
However, just let the site age until it is "ripe", and then slap up the content and tweak the links, it will soon rank well.
It would depend on your definition of "soon". One site that I created for a small consultancy has lots of original and interesting content. It is as good a resource as any on the subject matter but it is nowhere to be seen after six months.
I recall FTPing some newly created single pages, not whole sites, and that must have been mid-June or so. They did get PR, while all pages created since then did not (the index.html page has been in Google's index for four years now). I can't tell about Feb-Apr because I was away then, but if you say the sandbox phenomenon dates back to March, then yes, there must have been a short exceptional period.
Oliver
Yah, sometime in May, sites that were being boxed got loose which led many of us to believe there was simply a 2-3 month holding period. Since then it is as if time has stood still!
One quick point. Many here keep stating that G is fighting commercial spam here. Besides the obvious fact that this measure has ZERO impact on the existing serps and the really professional SEO/spammers have found ways around it anyway, countless non-profit sites are in the box as well...
* Google had 1B pages indexed in 2000 and 4B last year. Taken as a fourfold increase over four years, that's a growth rate of about 2^(1/2) (roughly 1.41x) per year. At that rate the number of indexed pages would double every two years, perhaps the Moore's law of indexable information.
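As a rough check of that arithmetic, here is a small sketch. The four-year span is an assumption (the post only says "last year"), and the function names are made up for illustration, so a three-year span is shown alongside for comparison.

```python
import math

def annual_growth_factor(start_pages, end_pages, years):
    """Constant yearly multiplier that takes start_pages to end_pages in `years`."""
    return (end_pages / start_pages) ** (1.0 / years)

def doubling_time_years(growth_factor):
    """Years needed to double at a constant yearly growth factor."""
    return math.log(2) / math.log(growth_factor)

# Figures from the post: 1B pages in 2000, 4B "last year".
# ASSUMPTION: the span is four years; a three-year span is computed as well.
g4 = annual_growth_factor(1e9, 4e9, 4)
g3 = annual_growth_factor(1e9, 4e9, 3)

print(round(g4, 3), round(doubling_time_years(g4), 2))
print(round(g3, 3), round(doubling_time_years(g3), 2))
```

Over four years this works out to about 1.41x per year and a doubling time of two years, which matches the figures quoted. If the span is really three years, the rate is closer to 1.59x per year and the index doubles every 1.5 years.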
From the latest posts I have read, there generally seems to be a consensus that the sandbox started sometime early this year, maybe about February or March. There was also only one period, about May (or June), when all sites were allowed in. If others do not see it this way, please let us know what you think on this and why.
If all sites were allowed in, this would appear to have little to do with spam fighting.
Perhaps what we are seeing is something like this. Google has an index that got full. They have a second index that newly crawled sites go into since the first index is full. Their search calculations cannot work across both indexes. At certain intervals, they allow sites from the second index "into" the first index thereby making those sites available to appear (in earnest) in their search results.
Obviously this is just a guess. I don't have reasons to give as to why their search calculations cannot work across both indexes and how they let sites from the second index into the first. I know it has been proposed that sites in the first index could be removed if pages were no longer available or if they were found to be of low quality but I am not sure if I buy into that theory.
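To make the two-index guess concrete, here is a toy model of it. Everything in it is hypothetical: the class name, the `release_every` interval, and the example domains are invented for illustration, and nothing here reflects how Google actually works.

```python
class TwoIndexModel:
    """Toy model of the guess: a full main index plus a holding index."""

    def __init__(self, release_every):
        self.main = set()                   # sites that can rank in results
        self.holding = set()                # newly crawled sites, not yet ranked
        self.release_every = release_every  # hypothetical interval, in ticks
        self.tick = 0

    def crawl(self, site):
        # New sites land in the holding index unless already in the main one.
        if site not in self.main:
            self.holding.add(site)

    def advance(self):
        # One time step; at each interval the holding index is merged into main.
        self.tick += 1
        if self.tick % self.release_every == 0:
            self.main |= self.holding
            self.holding.clear()

    def can_rank(self, site):
        return site in self.main


model = TwoIndexModel(release_every=3)
model.crawl("old-site.example")
for _ in range(3):
    model.advance()             # tick 3: old-site.example is released to main
model.crawl("new-site.example")
model.advance()                 # tick 4: no release yet, new site still held
```

In this sketch a site crawled just after a release waits almost a full interval before it can rank, no matter how good it is. That would match the pattern of one release around May and nothing since.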