|Matt Cutts on the Google Sandbox|
Secrets of the sandbox revealed at Pubcon?
The existence of a new-site "sandbox" (which keeps a new site from ranking well for months) has been a topic of debate among SEOs.
In reply to a question from Brett Tabke, Matt said that there wasn't a sandbox, but the algorithm might affect some sites, under some circumstances, in a way that a webmaster would perceive as being sandboxed.
So, for some sites, in effect there IS a sandbox.
One thing I have noticed: pages that link to the sandboxed site with 'Company Name' as the anchor text rank higher than the site itself in a search for 'Company Name'. If you see that, you're definitely sandboxed!
I'm far from an expert on the ways of Google, but there is one other option here besides a "filter," a "penalty," or an "automatic sandbox." It could just be that having a natural growth pattern is one of the many factors that gives you a ranking BOOST in Google, and that lacking that boost can leave a site down on the 23rd page the same way having no title tag would leave your site down on the 23rd page.
Of course, if no one else in your niche has a natural linking pattern or a title tag either, it wouldn't hurt you not to have one. If there aren't many other sites vying for your keyword at all, it may be irrelevant. If the site really nails some of the other factors that boost you up in Google's algorithm, then it may still be result #1 even without this one.
It seems to me that viewing this as an explicit anti-spam measure, in which Google thinks "IF the site is new AND has unnatural link patterns AND is in a money keyword BUT is not linked to by one of our TrustRank sites THEN we're going to stick it in the SANDBOX for six months, ha ha!", makes a lot less sense than viewing it as "long-established sites get a boost, natural link patterns get a boost, and sites with trusted links get a boost; if you're missing all three of those boosts and are in a competitive race for a money keyword, odds are pretty good that hundreds of sites are going to rank in front of you." And if your site lasts six months and its link pattern normalizes, it might suddenly find itself much more competitive. If I were Google, I might find that a nice side effect too. I mean, if a site doesn't rank well in Google but lasts six months anyway and shows evidence of thriving through normal link patterns developing around it, that's probably a site with quality content.
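A toy sketch of flicker's "boost" reading, assuming made-up factor names and weights (nothing here comes from Google): each site's score is its relevance plus whatever boosts it earns, and there is no sandbox rule anywhere in the code. A new site that earns none of the three boosts still falls behind a less relevant but established, well-linked competitor.

```python
from dataclasses import dataclass

# Hypothetical boost weights, invented for illustration only.
BOOSTS = {
    "established": 3.0,    # long-established site
    "natural_links": 2.0,  # natural link growth pattern
    "trusted_links": 2.5,  # links from trusted sites
}


@dataclass(frozen=True)
class Competitor:
    name: str
    relevance: float                 # on-page relevance for the query
    boosts: frozenset = frozenset()  # which BOOSTS keys the site earns


def score(site: Competitor) -> float:
    """Relevance plus earned boosts; note there is no penalty or filter term."""
    return site.relevance + sum(BOOSTS[b] for b in site.boosts)


def rank(sites) -> list:
    """Site names ordered best-first by score."""
    return [s.name for s in sorted(sites, key=score, reverse=True)]


new_site = Competitor("new", relevance=5.0)
old_site = Competitor("old", relevance=2.0, boosts=frozenset(BOOSTS))
print(rank([new_site, old_site]))  # new site trails: ['old', 'new']
```

In a niche where no competitor has the boosts, the new site wins on relevance alone, and once it earns a boost or two it can overtake the older site without anything being "lifted".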
Just a thought from outside the bubble. (-:
Thanks for your input flicker. Lots of us lurkers out here watching the kiddies play in their "sandbox", thinking only of their one itty bitty grain of sand.
Millions of websites out there have an established "norm" and new sites eventually find a "fit" in the statistical data.
It's heresy to say there is no sandbox.
Incompetant SEO people need something to blame their failures on. They need to blame their shortcomings on Google.
beren ... are you saying you have NO problem setting up new commercial web sites and avoiding the sandbox effect?
|beren ... are you saying you have NO problem setting up new commercial web sites and avoiding the sandbox effect? |
No.. He is just saying he takes full responsibility for his own incompetence. :)
The two statements seem to contradict each other?
>Incompetant SEO people need something to blame their failures on.
If you are going to be a smart alec, try using a dictionary :)
If it's hearsay, that implies it does NOT really exist, so how are incompetent SEO people going to use it as an excuse if it doesn't exist? That's how it came across to me.
heresy is not hearsay
Heresy... sorry, my mistake, must read more carefully. Slapped wrists, naughty boy. What the hell does it mean? Something to do with being made up, not an accepted opinion, a fallacy? Is it much different to hearsay? Why don't people just speak English around here for a change?
It would be nice to one day have a thread that focused on the sandbox: what, when, how, why… instead of the ramblings of the “there is no sandbox”, “the world is flat” type of people.
This post starts with an admission from Matt Cutts that in effect it exists, for goodness sake!
The Google Sandbox = when MSN sends you the majority of your traffic. When you are out of the sandbox, Google sends you the majority of your traffic.
Heresy is an opinion or doctrine at variance with those generally accepted as authoritative.
The amusing thing about this thread is that there is continuous discussion of definitions. It seems that the term "sandbox" in a Google context is generally accepted to mean something that only applies to new sites (although they might grow old before they come out). A site can't go back into the "sandbox" once it's out. Of course anyone can describe something else as a "sandbox" if they choose to, or they can say there isn't one, but it might be said by others to be heresy.
That's what I said, 'not an accepted opinion', but I said it in English!
|That's not my experience. There are many sites with top tens on one engine and nowhere on the others. |
I would say your experience is not the norm. The Google, Yahoo and MSN algos obviously differ, but not so much that you can be top ten on one engine and "nowhere" on another--unless you have been penalized.
While you've made it clear that it is your personal experience which causes you to disagree, your "proof" is a bit vague, as it fails to indicate whether the "many" sites that exhibit this phenomenon are also part of your personal experience.
Heresy *is* an English word, energylevel. Haven't you ever heard of a heretic?
To leaven the heated debate with a bit of empirical evidence:
We have several sites we launched over the past year; in all cases, they seem to start with a trickle of traffic from MSN, then slowly start to get traffic from Google. Yahoo has been less consistent, but it always starts after MSN and sometimes even lags behind Google.
The oldest sites (roughly a year old) are beginning to rank better on Google than on Yahoo, MSN and Google's allinanchor. This is the exact opposite of the typical pattern described earlier in the thread.
I suspect this anomaly might be due to a combination of relatively slow/natural link growth combined with sub-par SEO efforts. Perhaps our weak SEO efforts hurt us more in MSN and Yahoo? Another possibility: we are concentrating on trying to get high quality links; perhaps our quality is helping in Google, but our lack of link quantity is hurting us in Y and MSN?
|A site can't go back into the "sandbox" once it's out. |
I added 300 e-commerce pages to an established 30 page brochure site. The Google rankings went from lousy to nowhere. Five months go by without altering anything and suddenly it's Google top 5.
I call that being sandboxed.
If it is not the sandbox, then it is the only penalty I'm aware of, other than the sandbox, that requires no action to alleviate.
Patrick--I realize that your post was not necessarily your own opinion, but an effort to synopsize the opinions of others in this thread.
It is my seemingly unpopular opinion that regardless of its age, if your site's growth pattern deviates markedly from the norm, you will be boxed. This is far more likely to occur at the site's inception as the norm must allow for greater variation for older sites.
cws3di said it best:
|So, at the beginning of your domain's life, it is a very narrow margin that your site must fit in (the tiny space at the butt of the cornucopia). If your site has a long history, it tends to stay within the margins easier because of the big bell shape it can move around in. However, as we have all seen, even older sites can change content, lose links, or fall astray of new algo updates, then fall out of the cornucopia norm. |
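cws3di's cornucopia can be sketched as a tolerance band of "normal" link growth that widens with the site's age. The expected rate, the starting width and the logarithmic widening are all assumptions picked for illustration, not measured values.

```python
import math


def allowed_band(age_months: float, expected: float = 10.0,
                 base_width: float = 0.5, widening: float = 0.35):
    """Hypothetical band of 'normal' new links per month for a site of this age.

    The relative half-width starts at base_width for a new domain and widens
    logarithmically with age. All parameters are invented for illustration.
    """
    half_width = base_width + widening * math.log1p(age_months)
    low = expected * max(0.0, 1.0 - half_width)
    high = expected * (1.0 + half_width)
    return low, high


def within_norm(age_months: float, new_links_per_month: float) -> bool:
    """True if the observed link growth fits inside the band for this age."""
    low, high = allowed_band(age_months)
    return low <= new_links_per_month <= high


print(within_norm(0, 25))   # False: new domain, narrow end of the cornucopia
print(within_norm(60, 25))  # True: five-year-old site, wide end
```

The same 25 links a month falls outside the band for a brand-new domain but inside it for an old one, while a large enough spike still pushes an older site out, which is the point of the quote.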
There are a million reasons why a site might not rank; throwing them all into one pot and calling it the sandbox dilutes the discussion to the point where it becomes a useless exercise.
However, I find sites that meet all of the following criteria a very curious phenomenon, and dismissing it robs you of potential insight into the algorithm just as much as blaming everything on it does:
Your site has been created from a new domain
You built it along these guidelines: [google.com...]
You have some appropriate in-bound links
Your site is indexed
Your site has PageRank
It is regularly crawled, displaying cache dates not more than 10 days old
If you search for the site by entering www.xyz.com, it appears with the proper title, snippet and URL (www or non-www, whichever you picked)
You rank in the top 20 for allinanchor
You rank in the top 20 for allintext
You rank in the top 20 for allintitle
You do not rank within the first 1000 places for the keyword the site was designed for
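The checklist above could be recorded as a simple diagnostic. The field names are hypothetical, and every value is something you would gather by hand from searches and the toolbar; there is no API behind it. None means "not found in the results you checked".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observations:
    """Hand-gathered observations; all field names are hypothetical."""
    new_domain: bool
    follows_guidelines: bool
    has_relevant_inbound_links: bool
    indexed: bool
    has_pagerank: bool
    cache_age_days: int
    domain_search_ok: bool           # www.xyz.com search shows proper title, snippet, URL
    allinanchor_rank: Optional[int]  # None = not found
    allintext_rank: Optional[int]
    allintitle_rank: Optional[int]
    keyword_rank: Optional[int]      # None = not in the first 1000


def in_top(rank: Optional[int], n: int) -> bool:
    return rank is not None and rank <= n


def matches_sandbox_pattern(o: Observations) -> bool:
    """True only if every item on the checklist holds."""
    return (
        o.new_domain
        and o.follows_guidelines
        and o.has_relevant_inbound_links
        and o.indexed
        and o.has_pagerank
        and o.cache_age_days <= 10
        and o.domain_search_ok
        and in_top(o.allinanchor_rank, 20)
        and in_top(o.allintext_rank, 20)
        and in_top(o.allintitle_rank, 20)
        and not in_top(o.keyword_rank, 1000)
    )
```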
Flicker, of course I have heard of a heretic, and I've seen the film too.
I have a sandboxed site right now. The Google toolbar reports all pages as unranked. However, the PageRank distribution chart on the sitemap crawl stats page shows 100% of the pages in the "Low" category and 0% in the "PageRank not yet assigned" category. The site was launched in September, prior to the recent PageRank update. It was linked to and crawled on the day of the launch.
I had a non-commercial .org site get sandboxed. None of its topics are remotely competitive. The site "popped" out of the sandbox at the most recent PR update.
|Matt said that there wasn't a sandbox, but the algorithm might affect some sites, under some circumstances, in a way that a webmaster would perceive as being sandboxed. |
Sites, not pages. Hmmm.
From the TrustRank paper:
|In order to reduce computational demands, we decided to work at the level of web sites instead of individual pages. |
|I was at the Q&A and listened to Matt's response. The part that I thought was interesting was that Matt said when they (Google) first started hearing about the "sandbox" as the term is used by webmasters they had to look at their algo to see what was causing it and then look at the sites it was affecting. Once they studied it, they decided they liked what it was doing. |
I take this to mean some interaction of separate factors, such as an interaction between assigning PageRank and assigning TrustRank.
Here is a (crackpot?) theory along those lines:
A. When google first detects a site, it has no TrustRank. Sites without TrustRank can appear in the SERPS, but may not rank well.
B. The everflux crawl for assigning PageRank detects a spammy linking pattern. If the SpamRank is higher than the TrustRank, the whole site is marked as spam. PR shows as n/a. Why waste time calculating PR for spam sites?
C. The everflux crawl determines that an existing site has changed ownership, moved or for some other reason decides that its TrustRank is invalid, so it revokes its existing TrustRank. See B for what happens then.
D. Periodically, an algorithm is run that (re)calculates the TrustRank. This goes out all at once as an update. (No everflux for TrustRank, yet) Bam! Angels get their wings and "Sandboxed" sites get their PR and pop into the SERPS. Their TrustRank overcomes their SpamRank.
E. Low-quality sites stay "in the sandbox" due to their combined low TrustRank and high SpamRank.
Possible spammy linking patterns: redirects, sitewide links, reciprocal links, and links from publicly editable web pages?
This is all just kremlinology.
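selkirk's theory (kremlinology, by his own account) can be written out as a toy model. SpamRank is not a published Google metric, and the scores, the comparison and the batch update below are all assumptions made to mirror steps A to E, not a description of how Google works.

```python
from dataclasses import dataclass


@dataclass
class SiteState:
    """Toy state for one site; the scores are hypothetical, not real metrics."""
    name: str
    spam_rank: float           # hypothetical score from linking-pattern analysis
    trust_rank: float = 0.0    # A: a newly detected site has no TrustRank yet
    marked_spam: bool = False  # B: marked sites would show PR as n/a


def everflux_pass(site: SiteState) -> bool:
    """B: the continuous crawl marks the site if SpamRank exceeds TrustRank."""
    site.marked_spam = site.spam_rank > site.trust_rank
    return site.marked_spam


def revoke_trust(site: SiteState) -> bool:
    """C: ownership change or a move invalidates trust, then B applies again."""
    site.trust_rank = 0.0
    return everflux_pass(site)


def trust_update(sites, new_trust: dict) -> list:
    """D: periodic batch recalculation; returns the sites that 'pop out'.

    E: sites whose new trust still doesn't beat their SpamRank stay marked.
    """
    popped = []
    for site in sites:
        was_marked = site.marked_spam
        site.trust_rank = new_trust.get(site.name, site.trust_rank)
        still_marked = everflux_pass(site)
        if was_marked and not still_marked:
            popped.append(site.name)
    return popped


good = SiteState("mild-spam", spam_rank=0.3)
bad = SiteState("low-quality", spam_rank=0.9)
everflux_pass(good)  # both start out "sandboxed"
everflux_pass(bad)
print(trust_update([good, bad], {"mild-spam": 0.8, "low-quality": 0.2}))  # ['mild-spam']
```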
selkirk, someone looks like they have their thinking cap on today. That's one of the more interesting ideas I've read on this question, and it explains more than anything anyone else has said so far, which is what I always look for when trying to understand stuff like this.
I very much like the idea that the Sandbox may be linked to TrustRank, as I can believe that the addition of TrustRank to the algo created the Sandbox effect.
However, I think a few of the details may require additional consideration.
|B. The everflux crawl for assigning PageRank detects a spammy linking pattern. If the SpamRank is higher than the TrustRank, the whole site is marked as spam. PR shows as n/a. Why waste time calculating PR for spam sites? |
Sandboxed sites can and often do show PR.
|Low-quality sites stay "in the sandbox" due to their combined low TrustRank and high SpamRank. |
If I follow the above correctly, unless TrustRank naturally increases over time for boxed sites, they would not be able to come out of the box without removing the causative violation.
In my mind, it is the "self-lifting penalty" aspect of the box which distinguishes it from other forms of penalization.
I think you are headed in the right direction.
>> spammy linking pattern
How about this: a "spammy linking pattern" devalues all your links until the set time penalty passes, or until your links get diluted enough (i.e., you get new links with different anchor text or from many other "trusted" sources)?
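The devalue-until-expiry-or-dilution idea might look like this. Only the anchor-text half of "diluted" is modelled (not links from other trusted sources), and the 180-day window, the 60% anchor-text threshold and the all-or-nothing multiplier are invented numbers for illustration only.

```python
from collections import Counter


def anchor_concentration(anchors: list) -> float:
    """Share of inbound links that use the single most common anchor text."""
    if not anchors:
        return 0.0
    counts = Counter(a.strip().lower() for a in anchors)
    return counts.most_common(1)[0][1] / len(anchors)


def link_multiplier(anchors: list, days_since_flagged: int,
                    penalty_days: int = 180, threshold: float = 0.6) -> float:
    """Hypothetical: links count for nothing while the anchor pattern looks
    spammy, until the time penalty expires or the anchors are diluted."""
    if days_since_flagged >= penalty_days:
        return 1.0  # the set time penalty has passed
    if anchor_concentration(anchors) <= threshold:
        return 1.0  # diluted by varied anchor text
    return 0.0      # still devalued


anchors = ["blue widgets"] * 8 + ["Acme"] * 2
print(link_multiplier(anchors, days_since_flagged=30))   # 0.0
print(link_multiplier(anchors, days_since_flagged=200))  # 1.0
```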
What a nice surprise to wake up and read your post, selkirk.
TrustRank, LocalRank and other relatively recently-published theories (2003 onwards) all require a reordering of the same data or of data acquired in a different manner. (Search for many threads on this in the forums - especially in the Supporters' Forum, if you are a member.)
There is no reason to believe that this necessarily takes place along the same timescale as normal algorithm fluctuations. If it doesn't, it could well produce effects such as sites "springing out" or "gradually appearing" that sandbox proponents have reported.
We have seen this happen with Google in another area (updates of visible PageRank, Directory changes).
There is previous evidence for matching searches against a corpus of respected sites (thanks, NFFC).
The point of all this is that these are changes to the algorithm or even, if Google is far enough advanced, changes at the level of the search query.
(I find that there is some evidence that there were changes to some queries pre-"sandbox" theories and that, post-"sandbox", there are still differences in site/linguistic areas - which could well be confused for "competitive" and "non-competitive" terms.)
The problem with looking at this as a sandbox is that it focuses on "me" and what has happened to "my site/s" and "a penalty applied to my site/s", when what has happened to your site could well be a by-product of algorithmic changes and internal data processing.
And, spookily enough, for me at least, that sounds similar to what the only Google engineer to comment on the subject has said.
This is an interesting theory, Selkirk, and it may be that TrustRank and the sandbox are linked. It sounds quite logical, but there are a couple of flies in the ointment.
Sparkys_DAD says that sandboxed sites can and often do show PR. The sites that I have launched during the last 18 months have always earned PR long before they were released from the sandbox. I have a site going into its second year in the box, and it has had a PR4 for many months. I honestly don't think that the sandbox and PR are related.
|D. Periodically, an algorithm is run that (re)calculates the TrustRank. This goes out all at once as an update. (No everflux for TrustRank, yet) Bam! Angels get their wings and "Sandboxed" sites get their PR and pop into the SERPS. Their TrustRank overcomes their SpamRank. |
If this were true, wouldn't sites that had nothing changed on them remain in the box? We know from experience that many sites come out of the box after a quarantine period when nothing has changed apart from their age.
EDIT: I just realised that I virtually repeated what Sparkys_DAD said. I think it was the "causative violation" that threw me :)
If a site gets sandboxed, does everything on the site get updated except the home page? That is what has happened to my site since the end of August. I have tried all sorts of things and nothing works; can anyone help me here?
|does everything on the site get updated but not the home page? |
Updated as in PR, or as in being crawled? What you are describing sounds to me more like Google's canonical URL problem: on some sites, it can't identify the correct homepage, e.g. www.site.com, site.com, www.site.com/index.html, site.com/index.html.
Some people are advocating a 301 redirect from non-www to www. Also make sure all links to internal pages are consistent.
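The point of the 301 redirect and consistent internal links is that every homepage variant ends up at one URL. A minimal sketch of that mapping, using only Python's standard urllib.parse and assuming the www form is the one you picked and that index.html, index.htm, index.asp and index.php are the default documents (adjust to your server). The redirect itself still has to be configured on the web server; this only shows where each variant should point.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default documents; adjust to match your server.
INDEX_FILES = {"index.html", "index.htm", "index.asp", "index.php"}


def canonical_url(url: str) -> str:
    """Map a homepage variant to one canonical form: www host, no index file."""
    if "://" not in url:
        url = "http://" + url  # bare host names like "site.com"
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host   # assumes the www form is the one you chose
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment.lower() in INDEX_FILES:
        path = path[: -len(last_segment)]  # drop the file, keep the trailing slash
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


for variant in ["www.site.com", "site.com", "www.site.com/index.html", "site.com/index.html"]:
    print(variant, "->", canonical_url(variant))  # all map to http://www.site.com/
```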
Please tell me more: my PR is 4 on domain.co.uk, but index.asp is 0. How do I resolve this issue in simple steps, 1 to 10? I am a plank at times.