Nailing down the "sandbox"

Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Nailing down the "sandbox"

How deep is the sand? Who has to play there?

suidas

10:51 pm on Jan 17, 2005 (gmt 0)

I've seen a lot of messages about the sandbox, but none of them are clear about how major the effect is. Recently someone responded to a why-isn't-my-site-number-one request with:

If your site is less than a year old you are likely sandboxed.

I can't believe most sites under a year's age are in some sort of penalty box. Google would be useless. So, I want to know:

1. Are all sites sandboxed, or do certain traits (like affiliate links, low content) trigger it?
2. How long does it last?
3. How variable is the duration?
4. How do you know your site is being sandboxed?
5. Does the effect taper off or is it a binary thing?
6. What gets you out of the sandbox? Is it merely time or do good links or whatever speed it up?

Thanks.

2by4

9:58 pm on Jan 26, 2005 (gmt 0)

xcomm, I shouldn't have used the word 'all', that's always a bad idea. Sorry.

New linux kernel, what is it, every 2 years? Mozilla gecko engine, maybe new one every 18 months, somewhere around there I think.

Google has to handle millions of queries a day, what is it, like 200 million?

1. real world business, requirement to assemble the largest possible war chest to fend off ms attacks. Failure to do this means failure to survive longterm. Google doesn't have a second chance at an IPO, they did what they needed to do. I don't criticize them for this, they are going to need every penny once msn takes serious aim at them.

1a. Keep share prices high as major google shareholders begin to sell off massively overvalued shares - and no, this isn't a conspiracy, they have filed to sell off major blocks of shares, and the higher share prices stay, the more cash those shareholders will collect. Forget the fantasies, put yourself in their shoes, what would you do? Sell now and make maybe five times more money, as long as you can keep the quarterly results at record levels, which they are, or let the shares deflate to a more natural level?

2. practical hardware restrictions that have only been resolved this last year.

3. 2.6 kernel only now becoming standardized, redhat has just released their enterprise server with the 2.6 kernel, google was using redhat, don't know if they still are.

4. huge waves of autogenerated spam

5. full index. Remember, it was only in november that the pages indexed count rose above 2^32, and then it rose to simply 2x2^32, which suggests very strongly that google is in fact still using the same 2^32 primary algo. And it hasn't fixed this. Although apparently they have managed to fool enough of the people enough of the time to keep this less discussed than it should be.

6. So where is the foundation for the conclusion that they have fixed it? I don't see it, all i see is a growing pile of evidence that they haven't fixed it. That's if you take the above 5 factors into account, and don't skip the ones you don't like, but still aren't able to give a satisfactory explanation for.

I have no doubt that duplicate content may very well be treated in a similar fashion that sandboxed pages are, in other words, they are dumped into the secondary index, and are not accessed during normal searches.

Elixir

11:04 pm on Jan 26, 2005 (gmt 0)

For anybody who was up late on Saturday night about three weeks ago the best evidence I have seen of a second index was when the sandboxed sites appeared in the serps. The type of rankings you would expect for a good site, good content and good links which was 9 months old. Without exception all our sandboxed sites were in the serps. I could not believe it and could not wait until Monday morning to tell my patient clients the wait was over. On Sunday afternoon gone, completely all the results back to exactly where they were and the sandboxed sites had all dissapeared again. There have been a few isolated incidents of sandboxed sites appearing and then going away but this was across the board. Any Webmaster up as late as me posted about the phenomena and on Sunday they had the same story to tell. I am not technical enough to even begin to figure out why there are two indexes but I am totally convinced there are.

JudgeJeffries

1:45 am on Jan 27, 2005 (gmt 0)

How about this scenario.
All the sites that are sandboxed need to get their visitors and resort to PPC. The sandbox is then abandoned for a while. The floodgates open and the myriad of new sites knock a lot of older established sites from the top spots and they then, having got used to the volume of traffic, find they have to go PPC to maintain their income. The sandbox is thereafter reset and the process starts again with Google having a whole host of new paying customers and lots more in the pipeline.

dazzlindonna

2:50 am on Jan 27, 2005 (gmt 0)

Judge, that's a scary thought, although I can see how it would be very profitable for Google. If they hadn't thought of it before, I hope your post didn't give them a new idea. :)

pr0purgatory

3:58 am on Jan 27, 2005 (gmt 0)

Not sure if I may post URL's here but I found this article quite informative about the Google sandbox:

[#*$!.com...]

My site's been sandboxed for months and months now...(about 9 months - blah)

To give an idea, on MSN my site ranks at #6 for a search phrase that returns 4.3 million results.

On Yahoo my site ranks at #11 for same phrase that returns 4.1 million results.

On Google I'm not even in the top 1000, but using the 13 nonsense words technique the site is ranking #23 for the same search phrase that the site ranks for in MSN and Yahoo.

I've resorted to niche phrases in the mean time, 270 of them with a page for each phrase. I'm waiting for those pages to get indexed now and I'll see if this improves things.

Powdork

4:20 am on Jan 27, 2005 (gmt 0)

I've resorted to niche phrases in the mean time, 270 of them with a page for each phrase. I'm waiting for those pages to get indexed now and I'll see if this improves things.

At least you'll rank well on Yahoo and MSN for those too.

pr0purgatory

4:43 am on Jan 27, 2005 (gmt 0)

Powdork, that would be a plus :)

My noob theory is, sandboxing starts with the search phrase - sandboxed sites are prevented from being beng visible for certain phrases, often popular ones.

That doesn't make a sandboxed site invisible for other phrases. I've proved this by finding my site in searches using "nicher" (is that a word?) phrases.

squared

6:41 am on Jan 27, 2005 (gmt 0)

Oh sh*t.

Even MSN has one now: [sandbox.msn.com...]

soapystar

8:47 am on Jan 27, 2005 (gmt 0)

for n00b theories howabout.....you make a decision that applying agrresive filters to the whole index is just leading to an unstable index...deciding that from a defined point only new sites will undergo the heaviest filters..this would make the fresh part of the index as clean as possible with automation whilst not affecting the stability of the index as a whole as seen with previuos large scale algo changes...so the index as we see it now will always form the basis of the serps while new sites being added will be as free of spam as possible....either they are currently happy with the trickle of sites making it through or those new heavy filters havent been fine tuned enough to their liking yet....

claus

10:40 am on Jan 27, 2005 (gmt 0)

Sooner or later the penny will drop. Algorithms can be used to cope with the run of the mill stuff but if quality results are required manual intervention is a necessity and that is a an undeniable fact.

Nice one BeeDeeDubbleU - i doubt we will see that soon (if ever) but i do agree :)

Google has to handle millions of queries a day, what is it, like 200 million?

(...) are not accessed during normal searches.

Nice to see that we're getting closer to the core matter, finally. I was really starting to wonder if this particular penny would ever drop.

Do read the posts by hasm (msg# 170) and AcsCh (msg #176). Specifically, i find these two bits from AcsCh interesting and highly probable:

-adsfdsa -asdfdsa tricks the filter as google does not have a subset for every of this results and has to serve "simple" dynamic results, before filtering.
(...)
what might be some fairly huge algorithm not usable for on the fly results out of a DB with 8'00..... Pages..

Keep in mind that the Google infrastructure is so much larger than you think it is, and that there's already one layer between this and the users (ie. the akamai servers). It's perfectly safe to assume that there are more than one layer of crawling/indexing/scoring/ranking as well, and indeed that some parts of some of these layers are less "on the fly" than others.

As for dual indexes, that's really irrelevant - fwiw, Google can have 2 indexes [webmasterworld.com], 10 indexes, or 100 indexes without affecting ranking, just as well as they can have it and affect ranking. One thing is how you store your data, another is what you do to it. If you're faced with constraints, you can either do the same thing differently or do something else - and of course these options apply as well, even if you're not faced with constraints. (the definition of "constraints" might very likely be another to a Google engineer than to a WebmasterWorld poster, btw)

As for calculation restraints, there's a fine post by Marcia in the supporters forum [webmasterworld.com] describing that you can just do your math on domains rather than on pages. That would be an example of "doing the same thing differently".

Btw, does anyone remember a post i made on 2003-07-08 titled "Google's two rankings and you"? We had a fine and long discussion back then, although nobody (me included) could foresee "the sandbox" - or rather, that it would become so big an issue. That would be an example of "doing something else".

Oh, and for those that follow my hint, if it has any substance things like this [webmasterworld.com], this [webmasterworld.com], this [webmasterworld.com], and this [webmasterworld.com] might even make "the sandbox" worse eventually.

nutsandbolts

11:35 am on Jan 27, 2005 (gmt 0)

I thought I would ask Google directly about my one stubborn sandbox site (it's been stuck in the glue since Jan 04) and after 3-4 automated replies I finally got a human telling me that you can see the site if you search with inurl:! Blah, really Google? ;) What normal person does that when searching? :)

I suggest all sandbox websites get together and publicise the need to add a load of -asdf to every search result to get the REALLY good sites :)

xcomm

11:37 am on Jan 27, 2005 (gmt 0)

I would like go back to =< 13x -nonsens and what it really means. As to my understanding this seem the one thing showing there is a sandbox.

When you go "widgets -asdf -asdf -asdf -asdf -asdf -asdf, etc." you are telling it that you want something with widgets in it, but you REALLY want somethig without asdf. As a result, you get something with the keyword factor "watered down."

I do not see where -asdf can take the weight from the keyword as this -asdf is NOT in anywhere (sites/link text).

So I search for one term and are getting really the same count of results from Google, despite I see new authority sites jump up near the old ones (e.g. from #177 to #4).

I really curious about what this is triggering within their SERP.

RoySpencer

11:52 am on Jan 27, 2005 (gmt 0)

When you go "widgets -asdf -asdf -asdf -asdf -asdf -asdf, etc." you are telling it that you want something with widgets in it, but you REALLY want somethig without asdf. As a result, you get something with the keyword factor "watered down."

please clarify: was the point of this statement that exclusion of the obscure word "asdf" pushed the search into the secondary (sandboxed) index? Otherwise, I don't see how the exclusion favors one site over another.

brixton

11:53 am on Jan 27, 2005 (gmt 0)

I have seen some spamers that they made it to the top within weeks with brand new domains and new sites in very very competitive areas, i'm sure you wont find them here posting, I crash my head to figure out how they did it, by the way they have a large crosslinking (spam) web factory network and all there domains are throw away key word domains.

xcomm

12:19 pm on Jan 27, 2005 (gmt 0)

RoySpencer,

that exclusion of the obscure word "asdf"

more them 13x times seems to get you rid of the SandBox filter applied to most SERPs otherwise.

larryhatch

12:23 pm on Jan 27, 2005 (gmt 0)

Suppose somebody wedged in the word -asdf, complete with the minus sign, into the text of their pages?
Is that risky? -Larry

pr0purgatory

12:29 pm on Jan 27, 2005 (gmt 0)

Larry,

How would that be risky? You'd still need people to search for your phrase words -asdf -asdf -asdf etc

Imaster

12:58 pm on Jan 27, 2005 (gmt 0)

In msg #:163, hasm wrote:

The capacity problem and dual index idea is intriguing. But there is obviously some sort of intermingling of pre and post-sandbox sites within the serps. There doesn't appear to be a clear line in the serps that suggests a dual index between old and new.
First, a new site can outrank some of the older sites (and even rank well for noncompetitive keywords), but is just significantly suppressed from its "rightful" position. Second, the use of the -fsdfdf garbage strings seems to solve the problem, and display old and new sites in their "rightful" place.
If there is a capacity problem/dual index, can anyone suggest how these two observations are consistent with it. And how the google algorithm might be choosing to intermingle the old index and the new index?

That's an interesting question, anybody has any comments on this issue?

xcomm

1:13 pm on Jan 27, 2005 (gmt 0)

Imaster,

I would assume sandboxed sites ranking at SERPs arround e.g. #170 are not in the seconday index. What do you think about it? I would say all under #1000 is in the 1x32 index?

For me it looks as SandBox is applied afterwards at search time.

Just Guessing

2:10 pm on Jan 27, 2005 (gmt 0)

hasm, AcsCh and Claus make some excellent points.

2 indexes or 2 algos?

Well, Google would be crazy not to cache common searches, and if you are caching, you can certainly make the algo much more complicated.

Does it matter?

Yes if it's 2 algos, then the Sandbox algo is the one they really want to use!

jaffstar

2:24 pm on Jan 27, 2005 (gmt 0)

I would assume sandboxed sites ranking at SERPs arround e.g. #170 are not in the seconday index. What do you think about it? I would say all under #1000 is in the 1x32 index?

Most people are around here who claim they are sanboxed/filtered cannot enter the top 1000, so sites within the top1000 should be in the 1x32 index.

I popped in at #200 today, I have been months since this happen. Maybe I am out? I am not celebrating yet, as I could vanish tomorrow.

RichTC

6:42 pm on Jan 27, 2005 (gmt 0)

Hi all,

Its just been pointed out to me the effect that this sandbox is having on our website. Our website is very rich in content and has significant anchor text links to it (over 3000) yet does not feature in the SERPS despite everything being done by the book. Loads of our pages are PR5s.

Our best position in Google is at 380ish for the search term "Insurance Jobs". When you use the 13 -adsls etc we move to position 7 which is where i would expect it to be. Other dedicated pages of our site that are outside the top 500 also come into play with the filter and rank in the top 20.

I guess what frustrates us more than anything is the fact that at position 380 the site lists after sites about being Gay, single, having life assurance, having cheap motor insurance, and various other sites pagerank 0 that have absolutely nothing to do with the search term.

Which ever way you look at it, our site is firmly in the google bin, i have no idea why or any idea how long it would be before at least one page is released from this nightmare!

RoySpencer

7:06 pm on Jan 27, 2005 (gmt 0)

Our pages are all sandboxed, but many of them rank in the top 1,000, some even in the top 100.

The important thing is that their "rightful" (old)position seems to be the current position divided by something like 20...so a page now at position 400 used to be at 20, a page now at position 40 used to be around position 2. This is just a rough example...even my mileage varies widely.
Oh, and some don't appear in the top 1,000, even though they are in the index.

steveb

7:27 pm on Jan 27, 2005 (gmt 0)

"I crash my head to figure out how they did it"

It's been posted here many times. Create 1000 pieces of crap; spam blogs on different IPs; get no high quality links; see four or five avoid the sandbox, temporarily; do it again. It's the bottom of the slag heap, but it "works" to this degree.

OptiRex

7:51 pm on Jan 27, 2005 (gmt 0)

Our best position in Google is at 380ish for the search term "Insurance Jobs"

But look how many results there are, 19 million on .com, 4.75 million on .co.uk, even if you use "" you still have 799,000.

The spam problem is all these businesses are obviously making far too much money and they want to fleece us even more! Why else would they do it?

The problem starts at the top, and bearing in mind many of these companies are owned by the banks, they are the ones who are proliferating all this garbage just so that the head honchos can earn another few million a year, and screw you Mr/Mrs Employee who we'll discard just to make the bottom line look good for our shareholders.

If this is what the sandbox is stopping...wonderful!

docbird

10:17 pm on Jan 27, 2005 (gmt 0)

The floor is all yours, Googleguy!

BennyBlanco

11:20 pm on Jan 27, 2005 (gmt 0)

Sorry Roy, but it sure doesn't sound like you are playing in the sand. If any of your site's pages are coming in within the top 100, than it would seem to me that your problem does not lie in the 'box' but rather from other things that your site may not be doing along with a G algo shift. I have to agree with the others, 'sandboxed' sites, well, they are simply not in the top 100 unless we are talking about a query that only has about 10k results overall displayed.

GoogleGuy took the floor on WebmasterRadio last night and dropped some hints for a keen listener. (Sorry Guru if you get flooded with replay requests now).

blaketar

12:30 am on Jan 28, 2005 (gmt 0)

Two Questions.

First, what is WebMasterRadio and where might one find this?

Secondly, typical SBoxed site. Why when I place all those -asdf in my keyword search do I now come up number 3? Without them I am nowhere out of 18,000,000

Thanks all for the ray-of-light!

antoine

3:52 am on Jan 28, 2005 (gmt 0)

'I have seen some spamers that they made it to the top within weeks with brand new domains and new sites in very very competitive areas, i'm sure you wont find them here posting, I crash my head to figure out how they did it, by the way they have a large crosslinking (spam) web factory network and all there domains are throw away key word domains.
'

This is simple. These are links from blog spam. Google does not penalize one way spam links.

nuevojefe

4:00 am on Jan 28, 2005 (gmt 0)

Without them I am nowhere out of 18,000,000

Probably not true, you're just not in the top 1,000.

And to answer your -asdf question, read this whole thread and you'll see there's no definitive answer yet. (surprise, surprise)

This 367 message thread spans 13 pages: 367