I feel very strongly that until we have a good grasp on why it exists, it will be very hard to beat.
I don't buy the explanation that it's intended to stop spam. Why? One, it's doing too much collateral damage. Two, if you accept the 80/20 principle (20% of spammers do 80% of the spamming), and you realize there are already multiple ways to beat the sandbox that all of those spammers know about, the explanation no longer makes sense.
So, why does the sandbox exist?
The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?
This one slipped by me. They haven't thwarted SEOs at all. All they've done is slow down the process by which they admit most sites (apparently not all, if the posters are telling the truth). Most is good enough. However, this isn't just thwarting the SEOs, it's thwarting basic things like up-to-date results: getting the latest new sites served fresh to users' screens. That isn't a good thing. There is absolutely no way anyone can get me to believe that this lag is a deliberately planned method to cut down on SEO manipulation. That makes no sense at all. It's like a newspaper saying it will only print news if it's about older stories it has been following, and every other news item you'll have to wait 6-8 months to read. The web isn't static; it moves, that's what it is. Freezing sites for 6 months, even if you can avoid the freeze when you know how, only helps the SEOs who know how to avoid it, assuming they do.
This is not a business plan I'd invest in, and if this is google's current plan, then I'd start selling my stock as fast as possible. MSN must be amused. Their spiders have no trouble indexing large sites quickly, neither does slurp, but googlebot just limps along, as if it had to wait for something before adding more pages.
And this is not the business plan that made google a success, it reminds me much more of what made altavista fail.
They have not beat the sandbox!
"I've never seen it, so it must not be happening", right?
but googlebot just limps along, as if it had to wait for something before adding more pages.
No.... for the 67,000th time, this is a ranking issue, not an indexing issue. Please re-read the thread and, ideally, test the damned thing yourself. It's apparent that many people commenting in this thread don't even have sites in the sandbox, or are being generally ignorant because it's trendy.
This knocked out the single largest short term threat to G's future quality...not a small thing with an IPO and attendant scrutiny on the horizon.
3 months ago I'd believe you. Now, the IPO seems like a cozy wave-away excuse much like the "it reduces spam" line.
capacity issue
I hate to say it, but it's possible. I'd believe "capacity issues" before "spam fighting".
But, if capacity issues are the real reason, I seriously doubt G would take 8 months to fix it. Capacity issues would be, I would consider, a major "drop everything now" type thing. Also, they would see something like that coming - the growth of the web is fairly linear.
founders' and management comments on info versus commercial sites;
Naw, caveman. The Google "nice guy" line worked two years ago. I don't believe it anymore. There are a lot of good people working at Vendor G now, but let's face it: the minute they went public, their management ceased to be a bunch of guys concerned with changing the world. The "new" management is the American economy, and the American economy demands profits.
"News" still appears in a timely fashion.
"News" typically doesn't appear on a new domain.
Google's trying to do something with new domains that want to rank in competitive queries. The question is what in the hell takes 6-8 months to do?
If you insist on treating possibly related phenomena as unrelated by definition, in all likelihood you'll never get your question answered, since you might be excluding the answer in the process. But at least someone is finally giving the barest consideration to the possibility of a capacity problem. It took a year; better late than never, I guess.
We are not looking at something that is performing the way it was designed to perform. It is not indexing large sites quickly, or often completely, and it is not listing fresh content in a timely manner when that content is on new sites. This is the foundation on which google built itself; it is the direct cause of their success. If google were your operating system, you'd be switching to Linux or Macs right around now.
If SEO types managed to force them off this central goal, then there was something intrinsically flawed with google's methodology, and I would guess that in fact there is. If spam sites flooded the index, they added far more pages than google would have anticipated, so its capacity projections may simply not have allowed for that kind of growth. Remember, in 2000 they were at 1 billion pages indexed, more or less, and that was the whole web. Why would they have projected 4x growth in 3 years?
SEO spammers are very much like virus writers, taking advantage of weaknesses and loopholes in existing programming to do what they want. Some like to blame the hackers, but the hackers point to the weaknesses and say that if they weren't there, there would be nothing to exploit. I tend to agree with the latter view.
Google is making more money now than it was, however, so if they are smart they can deal with this by throwing more resources at it.
If you insist on trying to maintain that possibly related phenomena are unrelated by definition
Sites in the "sandbox" are included in the index. Googlebot comes by, indexes the page, and then they appear when Google is queried for the page name.
Are you seeing something else?
It is not indexing large sites quickly
New site in sandbox - 60K pages in the Google index within 1 week.
Another new site in sandbox - 120K pages - this one took somewhere between 6 and 10 days. I don't have an exact day since my reporting was down for a couple of days.
Are you seeing something else?
Yes. I am seeing extremely erratic behavior. I'm seeing very, very slow crawls through large sites. I'm seeing several weeks to index large amounts of new content on old sites. I'm seeing a sandboxed site return those supplemental-type results for site: searches. I'm seeing both msnbot and Slurp eat up sites at maybe 3-5 times the speed of googlebot. I'm seeing a site that has been gone for months still return pages that have been gone for 10 months when you do a site: search for it. What I am not seeing is the google I used to see.
These have been constant topics here on the google news forums; I'm not the only person to have seen this. Sites partially indexed, blah blah: for the last 8 months there has been almost nothing but threads like that here.
bakedjake, try looking at google exactly the way you look at windows, I suspect that perspective change might do the trick.
I'm seeing very very slow crawls through large sites. I'm seeing several weeks to index large amounts of new content on old sites.
Google has changed in that you can't get 50K pages indexed on one PR8 link anymore. But that's not "sandbox".
I'm seeing a sandboxed site return those supplemental type results for site: type searches.
More related to the dupe filter than not, I'd bet. I see the same thing with sites that are caught by the dupe filter, then shuffled off to supplemental.
This can also happen for the same reasons as above - when you have a 100K-page site with, like, 1 incoming link or something.
Google is smart enough to know that you don't need 100K pages in main if you've only got one incoming link. Or even 100 incoming links.
I'm seeing a site that has been gone for months still return pages that have been gone for 10 months when you do a site: search for it.
Okay, that's weird. :) But I doubt it's related to the "sandbox".
We have a lingo problem, let's get on the same page:
Define: sandbox - The "sandbox", as I'm using the term, is the lag a new domain experiences before it can rank for money terms, even though full indexing still takes place.
try looking at google exactly the way you look at windows
How will looking at an operating system change my perspective on a search engine?