|Sandboxed Sites - Back Together?|
Do they come out together or one by one?
Most of the new sites that I work with are still in the sandbox. I was just curious to know whether all the sandboxed sites come out of the sandbox together during one major update, or one by one over the rolling updates.
That is to say, should one be checking to see if the sites are out of the sandbox regularly or only when they know there is a major Google update? :)
BeeDeeDubbleU, I understand what you are saying and perhaps I should have said it as, "Google does not need to go into the sandboxed pages to fetch results". Either way, I don't think it makes much difference. That is really just speculation on my part spawned from ideas gathered here at WW.
While I tend to think this is a capacity issue, there are arguments to be made that the sandbox relates to spam fighting or algorithmic changes.
|How come new pages on established sites are ranking normally? |
I don't have direct experience with this, as I have not put up new pages on existing sites. (I have just done so and eagerly await the results.) However, from what I have read here, I get the impression that some new pages on existing sites are still sandboxed. Am I right in thinking that? Has anyone here had new pages on existing sites exhibit sandbox behaviour?
From this thread it seems as though the actual concept of "sandbox" is unclear. So many are saying they're not sure what the sandbox actually is or what it applies to, yet they are adamant that it is the reason they can't get higher rankings.
I think describing heaven would be easier ... a mystical place that is whatever you want it to be ... ;)
|Sometimes the hardest thing you can do as an SEO is nothing. |
... and you thought I was trying to mislead you ...
Our experience is that new pages on existing sites do not exhibit the so-called sandbox behavior. Here are two things we have observed:
New pages on old sites get much better SERPs than new pages on new sites for highly competitive keywords. This is of course even more true for less competitive keywords.
New pages on new sites can rank relatively quickly for uncompetitive terms of 3, 4 or 5 words in length.
The point here is that the variable appears to be the age of the site on which the page is located. But what other factors are there? What could we be overlooking?
I am not trying to mislead anyone. There is a lot of wisdom in that statement and apparently it is wasted on some.
I have had a site come out of the sandbox. I did nothing to get it out of the sandbox. For me, it is that simple.
New pages on old sites get ranked quickly. This is a fact AFAIAC.
I still see no reason for Google to apply an algo change only to new SITES as opposed to new PAGES. It just does not compute with me.
|How come new pages on established sites are ranking normally? I have still to hear a sensible reply to this question. |
My favorite theory is that most of the PR (80%-90%?) is not assigned to pages attracting inbound links for a certain unknown period of time (1-3 months?). PR is, however, deducted from a page as soon as a new outbound link is created. This would offset the effects of reciprocal linking and other link trickery.
Consequently, large, mature sites with good PR could afford to add new pages without noticeable harm. If they add too many pages too fast, though (like I did), they will have the PR sucked out of them.
*If* my theory is correct, there must be something in the algorithm that allows new pages to be added to non-sandboxed sites without a significant PR penalty.
Then again, maybe it's just broke.
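To make the delayed-credit theory above concrete, here is a minimal Python sketch of it. Everything in it is hypothetical: the withheld fraction, the holding period and the per-outlink deduction are placeholder values loosely based on the poster's guesses (80-90%, 1-3 months), not anything Google has confirmed. It shows why, under this theory, a fresh reciprocal link swap would net very little until the inbound half of the swap has aged.

```python
# Hypothetical sketch of the "delayed inbound PR" theory.
# All constants are illustrative assumptions, not known Google values.

WITHHELD_FRACTION = 0.85  # share of a new link's PR held back (poster guessed 80-90%)
DELAY_DAYS = 60           # holding period (poster guessed 1-3 months)
LEAK_PER_OUTLINK = 0.05   # PR deducted immediately for each new outbound link


def inbound_credit(raw_pr: float, link_age_days: int) -> float:
    """Credit from one inbound link: mostly withheld until it is DELAY_DAYS old."""
    if link_age_days >= DELAY_DAYS:
        return raw_pr
    return raw_pr * (1.0 - WITHHELD_FRACTION)


def net_pr_change(inbound: list[tuple[float, int]], new_outlinks: int) -> float:
    """Credit from inbound links (raw_pr, age_days) minus the immediate outbound deduction."""
    gained = sum(inbound_credit(pr, age) for pr, age in inbound)
    return gained - LEAK_PER_OUTLINK * new_outlinks


if __name__ == "__main__":
    # A fresh reciprocal swap: one new inbound link, one new outbound link.
    print(round(net_pr_change([(1.0, 5)], new_outlinks=1), 3))   # 0.1
    # The same swap once the inbound link has aged past the holding period.
    print(round(net_pr_change([(1.0, 90)], new_outlinks=1), 3))  # 0.95
```

The asymmetry is the whole point of the theory: the outbound cost is paid on day one, while most of the inbound benefit only arrives later.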
|I think sandbox applies to links, not pages. Internal links may not be subject to sandbox, but external ones are. |
That is a really neat thought MHes, thanks. Some questions about that: Do you think all new external links are sandboxed or just some new external links? What is the purpose or reason for sandboxing external links? Do you think this could relate to topic-sensitive PageRank calculations (just a thought)?
|A recent thread questioned the reality of hilltop. I think this thread not only proves its existence, but also how effective it has been. |
Can you explain what led you to this? I can't see the link/jump you are making here.
I don't see the relationship to Hilltop either - I had actually come to the conclusion that it is less of a factor than is often assumed. Though this assumption on my part is mostly due to the fact that we see so many sites ranking so well that have hundreds if not thousands of totally unrelated links.
For me, the most interesting and least talked about possible factor is link age and relevance.
People say the Google SERPs are stale, and I agree, but as far as I can tell they do shift around more often than they used to (albeit much less dramatically).
If Google's of the opinion that their current index is a good benchmark, then perhaps they see no reason for it to be easily upset.
If they're interested in accurate and useful SERPs, then high Page Rank and links from link pages aren't much use.
So many people have been hoarding Page Rank (directed at their home page) and relegating their outlinks to link pages. That isn't particularly democratic, or useful. It's also far removed from the original idea of hypertext.
We only have a few sites. The ones that rank well have old links that use spot-on anchor text from Yahoo and DMOZ. 6 month old spot-on anchor text from Yahoo doesn't work as well (given up on DMOZ).
gomer--I would say you are dead on-target in post 86.
I have 3 main sites. They all rank high for the same keyterms in Yahoo, MSN, and the new MSN beta. Two of them are +200 or worse in Google, while the 3rd is tops. The two that are sandboxed were launched in late February and May. They should both rank higher than the one that is tops in Google, because I've pounded them with an incredible number of links from unique domains.
One thing that hasn't been mentioned here, and I don't want to cloud the issues already raised, but it is puzzling nevertheless, is that I can get some pages to rank high on my sandboxed sites when they are first created. They rank for 4 to 6 weeks and then drop into oblivion with the rest of the site.
BDW--if the sandbox is related to capacity issues (and I am camped out with gomer on this one), then while it may be that Google indexes pages, capacity issues may have caused Google to separate domains into separate indexes because of the methods used in matrix calculations. The correct way to do calculations across an index is to create a matrix encompassing all pages (and of course all sites), but if they have to divide it, then it would likely make sense to divide it along domain boundaries. Why? Because the links between pages on a single domain are considered more relevant than links crossing from one domain to another. Thus, the way to divide domains between the indexes is to choose the domains that get the most hits and put them into the main index. Since a site thrown into the sandbox does not get the hits, it stays in the sandbox.
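Here is a rough Python sketch of the domain-level split described above, assuming (as the post does) that whole domains are assigned to the main index in order of traffic until a fixed page capacity is used up. The function name, the capacity figure and the hit counts are all hypothetical; the sketch only illustrates the feedback loop: a new domain with no hits lands in the overflow index, where it cannot earn the hits it would need to move up.

```python
from collections import defaultdict


def partition_by_domain(pages, hits, capacity):
    """Split pages into a main and an overflow index along domain boundaries.

    pages:    list of (url, domain) pairs
    hits:     dict mapping domain -> traffic count (missing domains count as 0)
    capacity: maximum number of pages the main index can hold
    Domains are placed whole, busiest first; any domain that does not fit
    in the remaining capacity goes to the overflow index.
    """
    by_domain = defaultdict(list)
    for url, domain in pages:
        by_domain[domain].append(url)

    main, overflow, used = [], [], 0
    for domain in sorted(by_domain, key=lambda d: hits.get(d, 0), reverse=True):
        size = len(by_domain[domain])
        if used + size <= capacity:
            main.append(domain)
            used += size
        else:
            overflow.append(domain)
    return main, overflow


if __name__ == "__main__":
    pages = [
        ("big.com/", "big.com"), ("big.com/a", "big.com"), ("big.com/b", "big.com"),
        ("old.com/", "old.com"), ("old.com/a", "old.com"),
        ("new.com/", "new.com"), ("new.com/a", "new.com"),
    ]
    hits = {"big.com": 500, "old.com": 300}  # new.com has no traffic yet
    print(partition_by_domain(pages, hits, capacity=5))
    # (['big.com', 'old.com'], ['new.com'])
```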
One issue briefly raised here is that Google is doing this for money. If that were true, then surely they would have switched their results by now, taking the sites that have continued to rank well (and make money) and throwing them into the sandbox, because those sites have been earning unopposed money for months now, while many a sandboxed site is surely depleted of resources from paying for adwords to get traffic.
Some other stuff not mentioned here is that the number of pages indexed by Google continued to grow over the past few years, and that number was proudly displayed on their homepage. This number suddenly froze last March at 99.6% of the capacity of a 32-bit index and remained there until only two weeks ago, when it just less than doubled. Again, it seems as if they threw one more bit on it to identify sites in the other index but again hit a capacity problem. Why didn't it just barely exceed twice that capacity instead of going just under it again?
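The capacity arithmetic in this post is easy to check. This short Python snippet computes how many document IDs fit in 32 and 33 bits and what 99.6% of the 32-bit space works out to. It verifies only the numbers, not the assumption that Google's index was actually keyed by 32-bit document IDs.

```python
def id_capacity(bits: int) -> int:
    """Number of distinct document IDs representable with the given bit width."""
    return 2 ** bits


if __name__ == "__main__":
    for bits in (32, 33):
        print(f"{bits}-bit: {id_capacity(bits):,} document IDs")
    # 99.6% of the 32-bit space, using integer arithmetic to avoid float rounding
    print(f"99.6% of 32-bit: {id_capacity(32) * 996 // 1000:,}")
    # 32-bit: 4,294,967,296 document IDs
    # 33-bit: 8,589,934,592 document IDs
    # 99.6% of 32-bit: 4,277,787,426
```

Adding a 33rd bit exactly doubles the space, so a count that "just less than doubled" is consistent with the theory, though it does not prove it.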
My bet is that there are 4 indexes now. There is the main index from which all the high traffic terms are called. There is a supplemental index the same size as the main index, and when it got full Google simply added this to the main, but sites in it still do not rank when sufficient returns are pulled from the main index. Then there is a 3rd index, which has the same capacity as the main and the supplemental, which is taking on the overflow from the first two. Then, there is a new index being built from the others based on 64-bit hardware and 64-bit Linux running a 5-byte (40-bit) inverted index.
Surely if Google wanted to inflate their stock price, they would release this new index prior to the upcoming stock sale that was announced yesterday. By doing this, those sites that have stayed high in the SERPs because newer sites were not ranking, and have obtained uncontested income as a result, will then need to take out adwords to maintain their accustomed level of traffic. And they'll be better able to afford it than the sites that have been sandboxed but which will suddenly rank well once the whole kit-and-caboodle gets put together and a 'normal' matrix can be established across which all sites can be fairly algorithmed.
And as for the claims of some, well, we all want attention and I can just imagine the stickies being sent to the wizard who can turn snails into horses. As for those that claim a site that ranks at +200 or +300 is proof that it's out of the sandbox, get real, that's not rank. My sandboxed sites rank #1 for a lot of esoteric terms that no one's had the gall to compete on before.
"Because the links between pages on a single domain are considered more relevant than links crossing from one domain to another."
I don't think this is so; the HITS algorithm removed the effect of internal links in some experiments and arrived at improved results.
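For readers who have not met HITS: it gives every page a hub score and an authority score and refines them iteratively (a page's authority is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to). Below is a minimal Python sketch of that iteration that discards links between pages on the same host before scoring, which is the kind of experiment the reply refers to. Treating "same domain" as "same URL host" is a simplification, and the example URLs are made up.

```python
import math
from urllib.parse import urlparse


def host(url: str) -> str:
    """Host part of a URL, used here as a stand-in for 'domain'."""
    return urlparse(url).netloc


def hits_without_internal_links(links, iterations=50):
    """Run HITS on (source, target) URL pairs after dropping same-host links.

    Returns (hub, authority) score dicts, each normalised to unit length.
    """
    edges = [(s, t) for s, t in links if host(s) != host(t)]
    nodes = {url for edge in edges for url in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages linking in.
        auth = {n: 0.0 for n in nodes}
        for s, t in edges:
            auth[t] += hub[s]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of authority scores of the pages linked to.
        hub = {n: 0.0 for n in nodes}
        for s, t in edges:
            hub[s] += auth[t]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


if __name__ == "__main__":
    links = [
        ("http://a.com/1", "http://a.com/2"),  # internal link: ignored
        ("http://a.com/1", "http://b.com/"),
        ("http://c.com/", "http://b.com/"),
        ("http://c.com/", "http://a.com/2"),
    ]
    hub, auth = hits_without_internal_links(links)
    print(max(auth, key=auth.get))  # http://b.com/
```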
Good post Neuron. Your theory is as good as anyone else's.
I still cannot understand how anyone would suggest that this is deliberate. Sure it stops new spam from appearing. Obviously it does because it stops new EVERYTHING from appearing.
I also cannot understand where the media are on this one. How can something as radical as this be going on at Google without media coverage? We may never find out the truth unless this starts generating adverse publicity.
Some original thoughts neuron.
I remember the thread about an Aussie journalist being saved by Google in Iraq, because the terrorists traced his roots to Australia. And someone said, "Thank God that site was not sandboxed!" Very apt and relevant.
Some of my observations on this so called sandbox from the experience I had first-hand -
- Unique terms without any competition: sites rank well irrespective of whether they are new or old.
- Phrases that combine two or more generic words that no one searches for together (often company names made up of generic words, but returning results in the vicinity of 1m+): old sites rank pronto; new sites take longer than expected, but these are the terms new sites rank for first. The SERPs are largely filled with larger sites that mention these generic words scattered across their pages.
- Moderately competitive terms: terms that need a couple of backlinks with anchor text to rank, such as Red Widgets City, with perhaps 5 to 20 searches a day: older sites with decent link popularity rank pronto, but newer sites with even double as many links as the older sites will rank 100+.
- Competitive terms: the biggies. Even older sites will need further relevant link popularity and exhibit behaviour similar to new sites. They won't rank for a couple of months, but differ in that they rank at least within 6 months.
This gives reason to believe that it is the links that are sandboxed, not the site as a whole. Since older sites already have old, non-sandboxed links, they rank relatively faster than new ones, whose links are all new anyway.
The topic of this thread (it was started by me, btw) puts any further doubts to rest. If there were no sandbox (of links), sites would climb from oblivion to at least the top 100, if not the top 10, within a few months, one by one, since Google does a rolling update. But that is not the case. People report their sites coming out of the sandbox during one major update. They go from oblivion to some visible presence for competitive terms, and from 100+ to the top 10, all at once. They don't traverse the middle path at all. If they did, that would be natural for a rolling update.
P.S: I am not talking about sites that have bought 1000s of links in one shot, but links achieved regularly.
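The "links are sandboxed, not sites" idea can be sketched as an age filter on inbound links. In this minimal Python example, MIN_LINK_AGE_DAYS is a hypothetical holding period, not a known value, and the link dates are invented. The comparison shows how an older site could outrank a new site with double as many links, simply because more of its links have aged.

```python
from datetime import date

MIN_LINK_AGE_DAYS = 120  # hypothetical holding period for new inbound links


def counted_links(link_dates: list[date], as_of: date) -> int:
    """Count inbound links old enough to pass the hypothetical link sandbox."""
    return sum(1 for d in link_dates if (as_of - d).days >= MIN_LINK_AGE_DAYS)


if __name__ == "__main__":
    as_of = date(2004, 12, 1)
    old_site = [date(2003, 6, 1)] * 10 + [date(2004, 11, 1)] * 5  # 15 links, 10 aged
    new_site = [date(2004, 10, 1)] * 30                           # 30 links, none aged
    print(counted_links(old_site, as_of))  # 10
    print(counted_links(new_site, as_of))  # 0
```

A fixed age cutoff on its own would release links gradually as each one ages; the "all at once" pattern described above would require the filter to be re-evaluated only at major updates, which this sketch does not model.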
It's baffling how much time people spend on the "Sandbox". Since older domains rank, buy an older domain.
Imagine it's the year 2003 and you're a search engine. Imagine you want to go public next year. Imagine your algo is mainly based on link popularity but your serps are swamped by link factory spam. What do you do? You delay the effect of links.
Interesting the way you put that Hanu. So simple, so effective.
I believe that there are two things going on at the Googleplex. One is a capacity problem, and the other is their continuing efforts to fight spam. I've theorized at length about the capacity problem, so now I'll weigh in on the spam fight. I'll describe how I would fight spam if I were Google. This might be an explanation for the sandbox phenomenon.
Google assigns PageRank according to backlinks, and each backlink is weighted according to its own PageRank. So far, so good. But what if Google, with all the data it has collected over the years, decided to plot a "natural growth in PageRank" curve for a typical, non-spammy web page? And what if they determined that spammy sites tended to exceed this natural growth pattern?
Then they could assign real PageRank (as opposed to the toolbar PageRank) only up to the point where the backlinks do not exceed this growth rate. New pages would tend to exceed this threshold if they are link-optimized; i.e., they're growing backlinks at an "unnatural" rate. Old pages, which first appeared in Google's index many months ago, might not have this problem because the threshold is determined by calculating the average over time for that page.
It would not be that much trouble for Google to save real PageRank calculations for each page they index, and build up a history for that page. GoogleGuy, on another forum, has admitted that they have internal access to all the backlinks for every page at the Googleplex, even though they choose not to show them all with the "link:" command.
It's a no-brainer to do something like this if you're serious about spam. It would be a lot more elegant than what they did one year ago with the Florida filter. The problem with Florida was that they tried to do an instant fix by suddenly plopping in a real-time filter. It didn't work well because by then the entire PageRank infrastructure behind it was already corrupted by spammers.
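The "natural growth curve" idea from this post can be sketched as a cap on how fast credited backlinks may grow from one period to the next. In this Python sketch, MAX_GROWTH_PER_PERIOD and SEED_ALLOWANCE are invented placeholders; a real system would presumably derive the curve from historical data, as the post speculates. Note that the cap depends on the page's whole history, which is why an old page that grew slowly is never throttled while a new page that acquires links in a burst is.

```python
MAX_GROWTH_PER_PERIOD = 0.25  # hypothetical "natural" growth rate per period
SEED_ALLOWANCE = 5            # hypothetical allowance for a newly seen page


def credited_backlinks(history: list[int]) -> list[int]:
    """Backlink counts that would be credited each period under a growth cap.

    history holds the observed backlink count per period, starting from the
    period in which the page was first indexed.
    """
    credited = []
    allowance = SEED_ALLOWANCE
    for observed in history:
        value = min(observed, allowance)
        credited.append(value)
        # Next allowance: this period's credited count grown by the natural
        # rate, never below the seed allowance.
        allowance = max(SEED_ALLOWANCE, int(value * (1 + MAX_GROWTH_PER_PERIOD)))
    return credited


if __name__ == "__main__":
    print(credited_backlinks([2, 3, 4, 5, 6, 7]))   # [2, 3, 4, 5, 6, 7]
    print(credited_backlinks([50, 200, 400, 600]))  # [5, 6, 7, 8]
```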
What I don't understand is why blogs continue to break all the rules and rank so well. Maybe they are handled separately in some sort of "freshness" equation. This might be more acceptable if those blog pages would fade in rank a lot more quickly once they appear so prominently. But the blog advantage for a typical blog page seems to go on for months or more at Google.
|Imagine it's the year 2003 and you're a search engine. Imagine you want to go public next year. Imagine your algo is mainly based on link popularity but your serps are swamped by link factory spam. What do you do? You delay the effect of links. |
... but only for new sites?
Why allow the existing spammers to merrily continue generating spammy new pages?
|Why allow the existing spammers to merrily continue generating spammy new pages? |
Existing spammers were filtered by the C-class IP address penalty or were sorted out manually, I suppose. The typical spammer built a network of sites on generic domain names, cross-linked them like crazy, enjoyed the traffic as long as possible, got caught, was penalized, moved on and started from scratch somewhere else.
More info for sandbox theorists...
I just released a new site (first in about 6 months). Pretty much expected it to do nothing for a few months based on the "sandbox" theory. Now within 2 weeks 55,000+ pages are fully indexed and ranking well in Google. The search terms are "deep" so that may explain ranking well. The domain is a new domain registered about 4 months ago. The site is pulling significant traffic (for new site).
I pretty much assumed with the latest index update announcement they were letting just about anything in to get the numbers up.
|Small Website Guy|
|Our experience is that new pages on existing sites do not exhibit the so-called sandbox behavior. |
This is absolutely correct. I've added new pages to a site that a day later pulled in hundreds of hits a day. (Easy to do with good page rank plus a current events topic.)
|Existing spammers were filtered by the C-class IP address penalty or were sorted out manually, I suppose. The typical spammer built a network of sites on generic domain names, cross-linked them like crazy, enjoyed the traffic as long as possible, got caught, was penalized, moved on and started from scratch somewhere else. |
I was not just talking about spammers. Any legit site (like my main site) can add new pages and get them ranked within a few days. You can see that from the posts above, and I am working on another new page right now. Sorry, but I just cannot subscribe to any suggestion that this lag is deliberate. It makes no sense at all to ban all new sites from the SERPs. Had this been a spam prevention measure, some comment from Google would have leaked out by now. Believe me, we'll only start to find out what's going on when the press get hold of it.
"New pages on old sites get ranked quickly. This is a fact AFAIAC."
This is not a fact. Try to create a new page to rank for a search term where you already have a page ranking decently. It is difficult to get a new page to outrank a more mature page, even if the new page obviously should, like a page about Portland being outranked by an Oregon page for a "Portland" search, where the new page has more accurate/better anchor text, etc.
I was able to add a couple of pages every few days and have them ranking within a few more days on an established site, until the site disappeared.
Yes, but if I create a new page on an old site and a new site, the old site has the advantage every time.
I can almost see the battle between conservatism and progressivism at Google! But one day someone will come along and say out with the old and in with the new! Google is no longer the New New, and the sandbox just further enhances this notion.
If a page has never had a PageRank before, it can be defined as a new page.
If it is not a root page, then flag it and defer the ranking until the root page for that domain has been ranked. After the root page is ranked, give the new page a PageRank of root page minus one or two.
If the root page itself has never had a PageRank before, start it out with a "new root page" PageRank that seems reasonable, but is independent of its backlinks. The next time around it won't be new, and can start growing its "natural" PageRank if it has sufficient backlinks.
This isn't so exotic. In the old days Google used to assign a PageRank of root minus one (according to the toolbar) for every directory deep where a new page was found on an old domain. That would work between updates. Then at the next monthly update it would acquire a more accurate PageRank.
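The "root minus one per directory" rule recalled above is simple enough to write down. This Python sketch reads it as: subtract one from the root page's PR for each path segment below the root, and fall back to a fixed default when the root itself has never been ranked. NEW_ROOT_PR and the function names are hypothetical, and this is one reading of the rule, not a documented Google formula.

```python
from typing import Optional
from urllib.parse import urlparse

NEW_ROOT_PR = 3  # hypothetical starting PR for a root page never ranked before


def path_depth(url: str) -> int:
    """Count non-empty path segments: '/' -> 0, '/a.html' -> 1, '/dir/a.html' -> 2."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def provisional_pr(url: str, root_pr: Optional[int]) -> int:
    """Provisional PR for a page that has never been ranked.

    root_pr is the root page's PR, or None if the root is new as well.
    """
    base = NEW_ROOT_PR if root_pr is None else root_pr
    return max(0, base - path_depth(url))


if __name__ == "__main__":
    print(provisional_pr("http://example.com/new.html", 6))          # 5
    print(provisional_pr("http://example.com/widgets/red.html", 6))  # 4
    print(provisional_pr("http://newsite.com/", None))               # 3
```

As the post notes, a value like this would only stand in between updates; the next full calculation would replace it with a PR based on actual backlinks.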
My theory is that G is serving pages based on a 6-month moving average of page PR, similar to the Alexa ranking.
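A six-month moving average, as guessed here, would by itself produce a lag for new pages. This minimal Python sketch assumes monthly PR snapshots and counts months before a page existed as zero; both the window length and the zero-padding are assumptions for illustration.

```python
WINDOW_MONTHS = 6  # window length guessed in the post


def moving_average_pr(monthly_pr: list[float]) -> float:
    """Average of the last WINDOW_MONTHS monthly PR values.

    Months before the page existed are counted as 0 (an assumption).
    """
    recent = monthly_pr[-WINDOW_MONTHS:]
    padded = [0.0] * (WINDOW_MONTHS - len(recent)) + recent
    return sum(padded) / WINDOW_MONTHS


if __name__ == "__main__":
    print(moving_average_pr([5.0] * 12))            # 5.0  (established page)
    print(round(moving_average_pr([5.0, 5.0]), 2))  # 1.67 (two-month-old page)
```

Under this model, a new page with the same monthly PR as an established one would only catch up once it had a full six months of history.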
My fascination with the sandbox is how perfectly subtle it is. We constructed our sandboxed sites in the same manner as other successful ones; same linking methods, submissions to the same places and directories. They get indexed, are granted PR, receive regular spidering and have good fresh cache dates.
If you search for the sites using various commands like allintitle, etc., there they are, looking perfect and right near the top: title just right, snippet right on the money, fresh tag from two days ago. Everything is exactly as it should be, except they just can’t get anywhere for the keywords you are optimizing for. (I don’t know the definition of “competitive”, but our sites are chasing terms that produce 3 to 8 million results.)
Now I know there are pundits on this board who say there is no way around this thing because it doesn’t exist in the first place. And there are other wise men who say they can get out of it with a little extra hard work and smarts. All that may be true, and hats off to you, but it doesn’t change the fact that an awful lot of people launched sites eight months ago, in the same fashion they always have, and those sites don’t even show on the radar screen for the terms they were designed for.
It is absolutely the most bizarre thing we have encountered in this business. The real problem with it is: what do you even say about it? “Gee whiz, we don’t rank for this competitive keyword as well as we should, darn engine must be broken, can’t be our fault.” Or how about, “Sorry sir, but your site is due to break the first 1,000 places, hopefully in about 10 months, uh, we think.” Other than the kind people on this board allowing sandbox sufferers like ourselves to rant a little, there’s no one to share this little problem at work with: “Hey, how’s everything at work?” “Uh, great, except for this thing they call the sandbox, it’s, uh, well it’s... oh never mind, everything’s great, and you?”
I am a fan of Google; always have been and probably always will be. We have done well in this business and Google has been a big part of that, so no bashing here. The thing that’s really beginning to bother me is a growing fear that when we find out what is causing this thing, it’s going to be something so incredibly obvious that we just won’t ever get over the fact we couldn’t figure it out.
Let Google estimate how badly pissed off the SEOs are, and the impact of that on Adwords revenues.
Talking of commercial websites...
The question Google would have asked itself is: "Why would someone (online/offline) build more than one website for the SAME business?"
Answer - Just to play with SEO and grab top positions with a variety of websites in the portfolio, attacking a variety of keywords.
Counter Action - Genuine single-business, single-website owners would also be held back for a few months (along with the build-new-website-crazy webmasters) to see whether both groups start getting used to Adwords.
Result - Still to come
The idea that the sandbox helps boost Adwords revenue has been mentioned before, and I believe this is the most probable reason.
If it were something like the search engine is broken, then why would they double the size of their index?
The fact that no Google rep will make a statement about it tells us a lot:
1) It must exist, because if it didn't, they could easily tell us.
2) The reason for its existence is not something they want to tell the public, because we would likely not like the answer.
Meanwhile, other engines are working on beating Google, and working on their image among webmasters. 2005 is going to be very interesting.
The press needs to get a whiff of the sandbox (and it stinks) for Google to do anything about it. Someone with some press contacts needs to come up with a few good examples of searches for new company names where Google comes back with no results while Yahoo/MSN come up with the official site. The press would have a field day with that.