If your site is less than a year old you are likely sandboxed.
I can't believe that most sites under a year old are in some sort of penalty box; if they were, Google would be useless. So, I want to know:
1. Are all sites sandboxed, or do certain traits (like affiliate links, low content) trigger it?
2. How long does it last?
3. How variable is the duration?
4. How do you know your site is being sandboxed?
5. Does the effect taper off or is it a binary thing?
6. What gets you out of the sandbox? Is it merely time or do good links or whatever speed it up?
Thanks.
A new Linux kernel, what is it, every 2 years? A new Mozilla Gecko engine, maybe every 18 months, somewhere around there I think.
Google has to handle millions of queries a day, what is it, like 200 million?
1. Real-world business: the need to assemble the largest possible war chest to fend off MS attacks. Failure to do this means failure to survive long term. Google doesn't get a second chance at an IPO; they did what they needed to do. I don't criticize them for this; they are going to need every penny once MSN takes serious aim at them.
1a. Keep share prices high as major Google shareholders begin to sell off massively overvalued shares. And no, this isn't a conspiracy; they have filed to sell off major blocks of shares, and the higher share prices stay, the more cash those shareholders will collect. Forget the fantasies and put yourself in their shoes: what would you do? Sell now and make maybe five times more money, as long as you can keep the quarterly results at record levels (which they are), or let the shares deflate to a more natural level?
2. Practical hardware restrictions that have only been resolved this last year.
3. The 2.6 kernel is only now becoming standardized; Red Hat has just released their enterprise server with the 2.6 kernel. Google was using Red Hat; I don't know if they still are.
4. Huge waves of autogenerated spam.
5. Full index. Remember, it was only in November that the pages-indexed count rose above 2^32, and then it rose to simply 2x2^32, which suggests very strongly that Google is in fact still using the same 2^32 primary algo (see the quick arithmetic sketch after this list). And they haven't fixed this, although apparently they have managed to fool enough of the people enough of the time to keep it less discussed than it should be.
6. So where is the foundation for the conclusion that they have fixed it? I don't see it; all I see is a growing pile of evidence that they haven't fixed it. That is, if you take the above 5 factors into account and don't skip the ones you don't like but still can't give a satisfactory explanation for.
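For those who haven't run the numbers on point 5, here is the arithmetic as a quick Python sketch (a toy illustration; the only inputs are the publicly reported index counts, and the two-index reading is my speculation):

# A 32-bit document ID can address at most 2**32 documents.
MAX_32BIT_DOCIDS = 2 ** 32
print(MAX_32BIT_DOCIDS)        # 4294967296, about 4.3 billion pages

# The reported count hovered near 2**32, then jumped to roughly double.
# Two parallel 32-bit indexes give exactly that ceiling; a genuine
# 64-bit rewrite would give a vastly higher one.
print(2 * MAX_32BIT_DOCIDS)    # 8589934592, about 8.6 billion
print(2 ** 64)                 # 18446744073709551616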
I have no doubt that duplicate content may very well be treated in a similar fashion to sandboxed pages; in other words, such pages are dumped into the secondary index and are not accessed during normal searches.
[#*$!.com...]
My site's been sandboxed for months and months now...(about 9 months - blah)
To give an idea, on MSN my site ranks at #6 for a search phrase that returns 4.3 million results.
On Yahoo my site ranks at #11 for same phrase that returns 4.1 million results.
On Google I'm not even in the top 1000, but using the 13 nonsense words technique the site is ranking #23 for the same search phrase that the site ranks for in MSN and Yahoo.
I've resorted to niche phrases in the meantime, 270 of them with a page for each phrase. I'm waiting for those pages to get indexed now, and I'll see if this improves things.
My noob theory: sandboxing starts with the search phrase. Sandboxed sites are prevented from being visible for certain phrases, often popular ones.
That doesn't make a sandboxed site invisible for other phrases. I've proved this by finding my site in searches using "nicher" (is that a word?) phrases.
Even MSN has one now: [sandbox.msn.com...]
Sooner or later the penny will drop. Algorithms can be used to cope with the run-of-the-mill stuff, but if quality results are required, manual intervention is a necessity, and that is an undeniable fact.
Nice one BeeDeeDubbleU - I doubt we will see that soon (if ever) but I do agree :)
Google has to handle millions of queries a day, what is it, like 200 million?
(...) are not accessed during normal searches.
Nice to see that we're getting closer to the core matter, finally. I was really starting to wonder if this particular penny would ever drop.
Do read the posts by hasm (msg #170) and AcsCh (msg #176). Specifically, I find these two bits from AcsCh interesting and highly probable:
-adsfdsa -asdfdsa tricks the filter, as Google does not have a subset for every one of these results and has to serve "simple" dynamic results, before filtering.
(...)
what might be some fairly huge algorithm not usable for on-the-fly results out of a DB with 8'00..... pages..
Keep in mind that the Google infrastructure is so much larger than you think it is, and that there's already one layer between it and the users (i.e. the Akamai servers). It's perfectly safe to assume that there is more than one layer of crawling/indexing/scoring/ranking as well, and indeed that some parts of some of these layers are less "on the fly" than others.
As for dual indexes, that's really irrelevant - fwiw, Google can have 2 indexes [webmasterworld.com], 10 indexes, or 100 indexes without affecting ranking, just as well as they can have them and affect ranking. One thing is how you store your data; another is what you do with it. If you're faced with constraints, you can either do the same thing differently or do something else - and of course these options apply even if you're not faced with constraints. (The definition of "constraints" very likely differs between a Google engineer and a WebmasterWorld poster, btw.)
As for calculation constraints, there's a fine post by Marcia in the supporters forum [webmasterworld.com] describing that you can just do your math on domains rather than on pages. That would be an example of "doing the same thing differently".
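To make "doing the same thing differently" concrete, here's a toy Python sketch of scoring per domain instead of per page (all names and numbers invented; nothing from Marcia's post beyond the general idea):

from collections import defaultdict
from urllib.parse import urlparse

def domain_level_scores(page_scores):
    # Collapse per-page scores into per-domain scores. Aggregating by
    # domain shrinks the problem from billions of pages to tens of
    # millions of hosts, so heavier math becomes feasible.
    totals = defaultdict(float)
    for url, score in page_scores.items():
        totals[urlparse(url).netloc] += score
    return dict(totals)

pages = {
    "http://example.com/a.html": 0.40,
    "http://example.com/b.html": 0.25,
    "http://other.example.org/": 0.90,
}
print(domain_level_scores(pages))
# {'example.com': 0.65, 'other.example.org': 0.9}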
Btw, does anyone remember a post I made on 2003-07-08 titled "Google's two rankings and you"? We had a fine and long discussion back then, although nobody (me included) could foresee "the sandbox" - or rather, that it would become so big an issue. That would be an example of "doing something else".
Oh, and for those that follow my hint: if it has any substance, things like this [webmasterworld.com], this [webmasterworld.com], this [webmasterworld.com], and this [webmasterworld.com] might even make "the sandbox" worse eventually.
I suggest all sandboxed websites get together and publicise the need to add a load of -asdf to every search to get the REALLY good sites :)
When you go "widgets -asdf -asdf -asdf -asdf -asdf -asdf, etc." you are telling it that you want something with widgets in it, but you REALLY want something without asdf. As a result, you get something with the keyword factor "watered down."
I do not see how -asdf can take weight away from the keyword, as -asdf does not appear anywhere (in sites or link text).
So I search for one term and get essentially the same count of results from Google, yet I see new authority sites jump up near the old ones (e.g. from #177 to #4).
I'm really curious about what this is triggering within their SERPs.
When you go "widgets -asdf -asdf -asdf -asdf -asdf -asdf, etc." you are telling it that you want something with widgets in it, but you REALLY want something without asdf. As a result, you get something with the keyword factor "watered down."
Please clarify: was the point of this statement that exclusion of the obscure word "asdf" pushed the search into the secondary (sandboxed) index? Otherwise, I don't see how the exclusion favors one site over another.
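For what it's worth, that objection holds in any simple retrieval model: excluding a term that appears nowhere cannot change any document's score, so "watering down" alone doesn't explain the reshuffle. A toy demonstration (invented scorer, obviously nothing like Google's):

def score(doc_terms, required, excluded):
    # Toy AND-NOT scorer: drop the doc if an excluded term is present,
    # otherwise count how many required terms it matches.
    if any(term in doc_terms for term in excluded):
        return 0
    return sum(term in doc_terms for term in required)

doc = {"insurance", "jobs", "uk"}
print(score(doc, {"insurance", "jobs"}, set()))                # 2
print(score(doc, {"insurance", "jobs"}, {"asdf", "qwerty"}))   # still 2

Identical scores either way, which points back at the idea that the trick switches which index or code path answers the query, rather than diluting term weights.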
The capacity problem and dual index idea is intriguing. But there is obviously some sort of intermingling of pre- and post-sandbox sites within the SERPs. There doesn't appear to be a clear line in the SERPs that suggests a dual index split between old and new.
First, a new site can outrank some of the older sites (and even rank well for noncompetitive keywords), but is just significantly suppressed from its "rightful" position. Second, the use of the -fsdfdf garbage strings seems to solve the problem and display old and new sites in their "rightful" places. If there is a capacity problem/dual index, can anyone suggest how these two observations are consistent with it? And how might the Google algorithm be choosing to intermingle the old index and the new index?
That's an interesting question; does anybody have any comments on this issue?
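One way to reconcile both observations with a dual index (a toy sketch; all names and numbers are invented): merge the two indexes into one result list, but scale new-index scores down by a constant factor before sorting. New sites stay visible, still win uncontested niche phrases, yet land far below their raw position on competitive phrases:

def merged_serp(old_hits, new_hits, demotion=0.05):
    # Blend two hypothetical indexes into one SERP. New-index scores
    # are multiplied by `demotion` (0.05 = 1/20) before sorting, so new
    # pages are suppressed on competitive terms but never removed, and
    # they still rank normally where no old page competes.
    combined = list(old_hits) + [(url, s * demotion) for url, s in new_hits]
    return sorted(combined, key=lambda hit: hit[1], reverse=True)

old = [("old-a.example", 0.90), ("old-b.example", 0.60)]
new = [("new-site.example", 0.95)]   # would be #1 on raw score
for rank, (url, s) in enumerate(merged_serp(old, new), 1):
    print(rank, url, round(s, 4))
# 1 old-a.example 0.9
# 2 old-b.example 0.6
# 3 new-site.example 0.0475

A demotion of 1/20 would also match the rough "current position divided by 20" pattern some posters describe.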
2 indexes or 2 algos?
Well, Google would be crazy not to cache common searches, and if you are caching, you can certainly make the algo much more complicated.
Does it matter?
Yes, if it's 2 algos, then the Sandbox algo is the one they really want to use!
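And the cache idea could tie the two together (a sketch of the hypothesis only; none of this is confirmed): common queries get answered from result sets computed offline with the sandbox filter already applied, while a query salted with a dozen nonsense exclusions is a cache miss and falls through to a live ranker that never runs the filter.

def serve_query(query, cached_serps, live_ranker):
    # cached_serps: common query string -> results computed offline,
    # with the sandbox filter already applied.
    # live_ranker: on-the-fly fallback with no such filter. Nonsense
    # exclusions make the query a cache miss, so the searcher sees
    # the unfiltered ranking.
    if query in cached_serps:
        return cached_serps[query]
    return live_ranker(query)

cached = {"widgets": ["old-a.example", "old-b.example"]}   # pre-filtered
live = lambda q: ["new-site.example", "old-a.example", "old-b.example"]

print(serve_query("widgets", cached, live))                # sandboxed view
print(serve_query("widgets -asdf -asdf", cached, live))    # unfiltered view

On that reading it isn't really "2 algos"; it's one algo plus an offline filter that the fallback path never applies.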
I would assume sandboxed sites ranking in the SERPs around, e.g., #170 are not in the secondary index. What do you think? I would say everything within the top 1000 is in the first 2^32 index?
Most people around here who claim they are sandboxed/filtered cannot enter the top 1000, so sites within the top 1000 should be in the first 2^32 index.
I popped in at #200 today; it has been months since this happened. Maybe I am out? I am not celebrating yet, as I could vanish tomorrow.
It's just been pointed out to me what effect this sandbox is having on our website. Our website is very rich in content and has significant anchor text links to it (over 3,000), yet it does not feature in the SERPs despite everything being done by the book. Loads of our pages are PR5s.
Our best position in Google is at 380ish for the search term "Insurance Jobs". When you use the 13 -adsls etc. we move to position 7, which is where I would expect it to be. Other dedicated pages of our site that are outside the top 500 also come into play with the filter and rank in the top 20.
I guess what frustrates us more than anything is the fact that at position 380 the site lists after sites about being gay, being single, having life assurance, having cheap motor insurance, and various other PageRank 0 sites that have absolutely nothing to do with the search term.
Whichever way you look at it, our site is firmly in the Google bin. I have no idea why, or how long it will be before at least one page is released from this nightmare!
The important thing is that their "rightful" (old) position seems to be the current position divided by something like 20... so a page now at position 400 used to be at 20, and a page now at position 40 used to be around position 2. This is just a rough example... even my mileage varies widely.
Oh, and some don't appear in the top 1,000, even though they are in the index.
Our best position in Google is at 380ish for the search term "Insurance Jobs"
But look how many results there are: 19 million on .com, 4.75 million on .co.uk, and even if you use quotes ("") you still have 799,000.
The spam problem is that all these businesses are obviously making far too much money and they want to fleece us even more! Why else would they do it?
The problem starts at the top, and bearing in mind many of these companies are owned by the banks, they are the ones who are proliferating all this garbage just so that the head honchos can earn another few million a year, and screw you Mr/Mrs Employee who we'll discard just to make the bottom line look good for our shareholders.
If this is what the sandbox is stopping...wonderful!
GoogleGuy took the floor on WebmasterRadio last night and dropped some hints for a keen listener. (Sorry Guru if you get flooded with replay requests now).
This is simple. These are links from blog spam. Google does not penalize one-way spam links.