Forum Moderators: Robert Charlton & goodroi
In reply to a question from Brett Tabke, Matt said that there wasn't a sandbox, but the algorithm might affect some sites, under some circumstances, in a way that a webmaster would perceive as being sandboxed.
So, for some sites, in effect there IS a sandbox.
Of course, if no one else in your niche has a natural linking pattern or a title tag either, it wouldn't hurt you not to have one. If there aren't many other sites vying for your keyword at all, it may be irrelevant. If the site really nails some of the other factors that boost you up in Google's algorithm, then it may still be result #1 even without this one.
It seems to me that viewing this as an explicit anti-spam measure in which Google thinks "IF the site is new AND has unnatural link patterns AND is in a money keyword BUT not linked to by one of our trustrank sites THEN we're going to stick it in the SANDBOX for six months, ha ha!" makes a lot less sense than viewing it as "long-established sites get a boost, and natural link patterns get a boost, and sites with trusted links get a boost, and if you're missing all three of those boosts and are in a competitive race for a money keyword odds are pretty good that hundreds of sites are probably going to rank in front of you." And if your site lasts six months and its link pattern normalizes, it might find itself much more competitive all of a sudden. And if I were Google, I might find that a nice side effect too. I mean, if a site doesn't rank well in Google but it lasts six months anyway and shows evidence of thriving through normal link patterns developing around it, that's probably a site with quality content.
Just a thought from outside the bubble. (-:
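To make the "missing boosts" reading above concrete, here is a minimal sketch. The signal names and the equal weights of 1.0 are purely hypothetical and are not anything Google has published; the point is only that a purely additive score, with no explicit sandbox rule, already pushes a new site behind established competitors.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    base_relevance: float  # hypothetical on-page relevance for the keyword
    established: bool      # long-established site
    natural_links: bool    # link pattern looks natural
    trusted_links: bool    # linked to by trusted sites


def score(site: Site) -> float:
    """Additive model: each signal adds a boost, nothing is ever subtracted."""
    total = site.base_relevance
    # The weights are assumptions chosen only for illustration.
    if site.established:
        total += 1.0
    if site.natural_links:
        total += 1.0
    if site.trusted_links:
        total += 1.0
    return total


def rank(sites):
    """Order sites best-first by their additive score."""
    return sorted(sites, key=score, reverse=True)


new_site = Site("new", 5.0, False, False, False)
old_site = Site("old", 5.0, True, True, True)
print([s.name for s in rank([new_site, old_site])])
```

With equal relevance, the site missing all three boosts sorts behind the established one; in a niche where it is the only candidate, it still ranks first, which matches the "it may be irrelevant" case described above.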
The amusing thing about this thread is that there is continuous discussion of definitions. It seems that the term "sandbox" in a Google context is generally accepted to mean something that only applies to new sites (although they might grow old before they come out). A site can't go back into the "sandbox" once it's out. Of course anyone can describe something else as a "sandbox" if they choose to, or they can say there isn't one, but it might be said by others to be heresy.
That's not my experience. There are many sites with top tens on one engine and nowhere on the others.
I would say your experience is not the norm. The Google, Yahoo and MSN algos obviously differ, but not so much that you can be top ten on one engine and "nowhere" on another--unless you have been penalized.
While you've made it clear that it is your personal experience which causes you to disagree, your "proof" is a bit vague as it fails to indicate whether or not the "many" sites that exhibit this phenomenon are also part of your personal experience.
We have launched several sites over the past year. In all cases, they seem to start with a trickle of traffic from MSN, then slowly start to get traffic from Google. Yahoo has been less consistent but always starts after MSN, and sometimes even lags behind Google.
The oldest sites (roughly a year old) are beginning to rank better on Google than on Yahoo, MSN and Google's allinanchor. This is the exact opposite of the typical pattern described earlier in the thread.
I suspect this anomaly might be due to a combination of relatively slow/natural link growth combined with sub-par SEO efforts. Perhaps our weak SEO efforts hurt us more in MSN and Yahoo? Another possibility: we are concentrating on trying to get high quality links; perhaps our quality is helping in Google, but our lack of link quantity is hurting us in Y and MSN?
A site can't go back into the "sandbox" once it's out.
I added 300 e-commerce pages to an established 30 page brochure site. The Google rankings went from lousy to nowhere. Five months go by without altering anything and suddenly it's Google top 5.
I call that being sandboxed.
If it is not the sandbox, then it is the only penalty that I'm aware of (other than the sandbox) that requires no action to alleviate.
Patrick--I realize that your post was not necessarily your own opinion, but an effort to synopsize the opinions of others in this thread.
It is my seemingly unpopular opinion that regardless of its age, if your site's growth pattern deviates markedly from the norm, you will be boxed. This is far more likely to occur at the site's inception as the norm must allow for greater variation for older sites.
cws3di said it best:
So, at the beginning of your domain's life, it is a very narrow margin that your site must fit in (the tiny space at the butt of the cornucopia). If your site has a long history, it tends to stay within the margins easier because of the big bell shape it can move around in. However, as we have all seen, even older sites can change content, lose links, or fall astray of new algo updates, then fall out of the cornucopia norm.
However, I do find sites that meet all of the following to be a very curious phenomenon, and dismissing them robs you of potential insight into the algorithm just as much as blaming everything on the sandbox does.
Your site has been created from a new domain
You built it along these guidelines; [google.com...]
You have some appropriate in-bound links
Your site is indexed
Your site has PageRank
It is regularly crawled, displaying cache dates not more than 10 days old
If you search for the site by entering www.xyz.com, it appears with the proper title, snippet, and url (www, or no www whichever you picked)
You rank in the top 20 for allinanchor
You rank in the top 20 for allintext
You rank in the top 20 for allintitle
You do not rank within the first 1000 places for the keyword the site was designed for
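The checklist above can be written down as a simple predicate. This is only a sketch: the `SiteObservation` fields are hypothetical names for values a webmaster would collect by hand, and nothing here queries a search engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteObservation:
    new_domain: bool
    follows_guidelines: bool
    has_inbound_links: bool
    indexed: bool
    has_pagerank: bool
    cache_age_days: int
    domain_search_ok: bool           # correct title, snippet and URL shown
    allinanchor_rank: Optional[int]  # None means "not found"
    allintext_rank: Optional[int]
    allintitle_rank: Optional[int]
    keyword_rank: Optional[int]


def _within(rank: Optional[int], limit: int) -> bool:
    return rank is not None and rank <= limit


def matches_checklist(o: SiteObservation) -> bool:
    """True when every item on the list holds, including the missing keyword rank."""
    healthy = (
        o.new_domain
        and o.follows_guidelines
        and o.has_inbound_links
        and o.indexed
        and o.has_pagerank
        and o.cache_age_days <= 10
        and o.domain_search_ok
    )
    strong_operators = all(
        _within(r, 20)
        for r in (o.allinanchor_rank, o.allintext_rank, o.allintitle_rank)
    )
    missing_for_keyword = not _within(o.keyword_rank, 1000)
    return healthy and strong_operators and missing_for_keyword
```

A site that fails any one of the health checks (not indexed, stale cache, and so on) does not match, which keeps ordinary indexing problems from being counted as the phenomenon described here.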
I had a non-commercial .org site get sandboxed. None of its topics are remotely competitive. The site "popped" out of the sandbox at the most recent PR update.
Matt said that there wasn't a sandbox, but the algorithm might affect some sites, under some circumstances, in a way that a webmaster would perceive as being sandboxed.
Sites, not pages. Hmmm.
From the TrustRank paper:
In order to reduce computational demands, we decided to work at the level of web sites instead of individual pages.
Sites. Hmmm.
I was at the Q&A and listened to Matt's response. The part that I thought was interesting was that Matt said when they (Google) first started hearing about the "sandbox" as the term is used by webmasters they had to look at their algo to see what was causing it and then look at the sites it was affecting. Once they studied it, they decided they liked what it was doing.
Here is a (crackpot?) theory along those lines:
A. When Google first detects a site, it has no TrustRank. Sites without TrustRank can appear in the SERPS, but may not rank well.
B. The everflux crawl for assigning PageRank detects a spammy linking pattern. If the SpamRank is higher than the TrustRank, the whole site is marked as spam. PR shows as n/a. Why waste time calculating PR for spam sites?
C. The everflux crawl determines that an existing site has changed ownership, moved or for some other reason decides that its TrustRank is invalid, so it revokes its existing TrustRank. See B for what happens then.
D. Periodically, an algorithm is run that (re)calculates the TrustRank. This goes out all at once as an update. (No everflux for TrustRank, yet) Bam! Angels get their wings and "Sandboxed" sites get their PR and pop into the SERPS. Their TrustRank overcomes their SpamRank.
E. Low quality sites stay "in the sandbox" due to their combined low TrustRank and high SpamRank.
Possible spammy linking patterns: redirects, sitewide links, reciprocal links, and links from publicly editable web pages?
This is all just kremlinology.
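For anyone who wants to poke at the theory, steps A through E can be sketched as a toy state machine. Everything here is an assumption made for illustration: the `SiteState` fields, the numeric values, and the idea that "earned trust" is a single number applied in a batch. It models the theory as stated, not any known Google behavior.

```python
from dataclasses import dataclass


@dataclass
class SiteState:
    name: str
    spam_rank: float         # hypothetical score from spammy link patterns
    earned_trust: float      # trust the next batch recalculation would assign
    trust_rank: float = 0.0  # step A: a newly detected site has no TrustRank


def is_boxed(site: SiteState) -> bool:
    """Step B: the site is treated as spam while SpamRank exceeds TrustRank."""
    return site.spam_rank > site.trust_rank


def revoke_trust(site: SiteState) -> None:
    """Step C: ownership change or move invalidates the existing TrustRank."""
    site.trust_rank = 0.0


def trust_update(sites) -> None:
    """Step D: TrustRank is recalculated for all sites at once."""
    for site in sites:
        site.trust_rank = site.earned_trust


decent = SiteState("decent", spam_rank=0.2, earned_trust=0.5)
junk = SiteState("junk", spam_rank=0.8, earned_trust=0.1)
print(is_boxed(decent), is_boxed(junk))  # both boxed before the update
trust_update([decent, junk])
print(is_boxed(decent), is_boxed(junk))  # step E: only the low quality site remains
```

Under these assumptions a site leaves the box only when a batch update raises its TrustRank above its SpamRank, so the sketch reproduces the "pop out all at once" behavior without any change on the site itself.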
I very much like the idea that the Sandbox may be linked to TrustRank as I can believe it is possible that the addition of TrustRank to the algo created the Sandbox effect.
However, I think a few of the details may require additional consideration.
B. The everflux crawl for assigning PageRank detects a spammy linking pattern. If the SpamRank is higher than the TrustRank, the whole site is marked as spam. PR shows as n/a. Why waste time calculating PR for spam sites?
Sandboxed sites can and often do show PR.
Low quality sites stay "in the sandbox" due to their combined low TrustRank and high SpamRank.
If I follow the above correctly, unless TrustRank naturally increases over time for boxed sites, they would not be able to come out of the box without removing the causative violation.
In my mind, it is the "self-lifting penalty" aspect of the box which distinguishes it from other forms of penalization.
I think you are headed in the right direction.
TrustRank, LocalRank and other relatively recently-published theories (2003 onwards) all require a reordering of the same data or of data acquired in a different manner. (Search for many threads on this in the forums - especially in the Supporters' Forum, if you are a member.)
There is no reason to believe that this necessarily takes place along the same timescale as normal algorithm fluctuations. If it doesn't, it could well produce effects such as sites "springing out" or "gradually appearing" that sandbox proponents have reported.
We have seen this happen with Google in another area (updates of visible PageRank, Directory changes).
There is previous evidence for matching searches against a corpus of respected sites (thanks, NFFC).
[webmasterworld.com...] (2001)
The point of all this is that they are changes to the algorithm or even, if Google is far enough advanced, changes at the level of the search query.
(I find that there is some evidence that there were changes to some queries pre-"sandbox" theories and that, post-"sandbox", there are still differences in site/linguistic areas - which could well be confused for "competitive" and "non-competitive" terms.)
The problem with looking at this as a sandbox is that it focuses on "me" and what has happened to "my site/s" and "a penalty applied to my site/s", when what has happened to your site could well be a by-product of algorithmic changes and internal data processing.
And, spookily enough, for me at least, that sounds similar to what the only Google engineer to comment on the subject has said.
Sparkys_DAD says that sandboxed sites can and often do show PR. The sites that I have launched during the last 18 months have always earned PR long before they were released from the sandbox. I have a site going into its second year in the box and it has had a PR4 for many months. I honestly don't think that the sandbox and PR are related.
D. Periodically, an algorithm is run that (re)calculates the TrustRank. This goes out all at once as an update. (No everflux for TrustRank, yet) Bam! Angels get their wings and "Sandboxed" sites get their PR and pop into the SERPS. Their TrustRank overcomes their SpamRank.
If this were true would it not be the case that sites that had nothing changed on them would remain in the box? We know from experience that many sites come out of the box after a quarantine period when nothing changes apart from their age.
EDIT: I just realised that I virtually repeated what Sparkys_DAD said. I think it was the "causative violation" that threw me :)
does everything on the site get updated but not the home page?
Updated as in PR, or being crawled? What you are describing sounds to me more like Google's canonical URL problem: on some sites, Google is unable to identify the correct homepage among variants, e.g. www.site.com * site.com * www.site.com/index.html * site.com/index.html.
Some people are advocating a 301 redirect from non-www to www. Also make sure all links to internal pages are consistent.
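As a rough illustration of what "consistent" means here, the sketch below maps the four homepage variants onto one canonical form and reports when a 301 would be needed. It assumes the www form is the one you picked and that index.html is the default document; `canonical_url` and `redirect_for` are hypothetical helper names, and blindly prefixing "www." would be wrong for sites that use other subdomains.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str, default_doc: str = "index.html") -> str:
    """Normalize a homepage-style URL to the www form without the default document."""
    # Accept bare hosts such as "site.com" by assuming http.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host  # assumption: the www form is canonical
    path = parts.path or "/"
    if path.endswith("/" + default_doc):
        path = path[: -len(default_doc)]  # keep the trailing slash
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def redirect_for(url: str):
    """Return (301, target) if the URL is not canonical, otherwise None."""
    full = url if "://" in url else "http://" + url
    target = canonical_url(full)
    return None if full == target else (301, target)


for variant in ("www.site.com", "site.com",
                "www.site.com/index.html", "site.com/index.html"):
    print(variant, "->", canonical_url(variant))
```

The same function can be run over a site's internal links to spot any that point at a non-canonical variant.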