|Sandboxed Sites - Back Together?|
Do they come out together or one by one?
Most of the new sites that I work with are still in the sandbox. Was just curious to know: do all the sandboxed sites come out of the sandbox during one fine major update, or one by one, over the rolling updates?
That is to say, should one be checking to see if the sites are out of the sandbox regularly or only when they know there is a major Google update? :)
|Why don't you make it easier by throwing out examples of, say, just 5 such normal sites performing well on competitive KWs? |
I cannot give specific examples, and also I am not saying "I have lots of sites that have beaten the sandbox". I am expressing our hopefully well thought out and well researched reasons for why sites from February onwards are not, in general, performing in google.
|Pimpernel, how do you explain what I said earlier, that even new pages that don't fall within the keyword categories of an existing (well listed) sites are sandboxed. |
I am not sure that I understand the question, but what I do know is that it is simple to get existing sites to perform in google under new keywords, even new categories of keywords, and it is a whole different ball game with sites created since February. This entirely fits in with our theory and simply reflects the fact that PageRank flows down through a site, so new pages will benefit immediately from the existing PageRank of the site.
mark1615 - See my comments above. Sure each web page is judged on its merits, but the large majority of rating of a web page comes from internal links (i.e. it is linked to from the home page of the site). So, in reality in most cases, you are actually looking at web sites rather than web pages when assessing rating. It is for this reason that new pages on an existing site have no problem with ranking. The problem for new sites is that they can't get a good rating and therefore cannot pass that rating on to the individual web pages which are the ones that perform.
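The point about rating flowing down from the home page can be illustrated with a toy power-iteration PageRank over a made-up link graph. This is not Google's actual algorithm or data, just the textbook formula on a hypothetical four-node graph, showing how an internal page inherits a share of the home page's score the moment it is linked:

```python
# Toy power-iteration PageRank over a hypothetical site graph.
# Illustrative only -- not Google's real algorithm or real scores.

DAMPING = 0.85

def pagerank(links, iterations=50):
    """links: {page: [pages it links to]}. Returns {page: score}."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - DAMPING) / len(pages) for p in pages}
        for p, outs in links.items():
            share = DAMPING * rank[p] / len(outs) if outs else 0.0
            for q in outs:
                new[q] += share
        rank = new
    return rank

# An "external" page links only to the home page; the home page links
# to its internal pages, which link back home.
site = {
    "external": ["home"],
    "home": ["page_a", "page_b"],
    "page_a": ["home"],
    "page_b": ["home"],
}
scores = pagerank(site)
# page_a and page_b immediately inherit a share of home's score --
# which is why a new page on an established site ranks without trouble.
```

A brand-new page added to `site` with a link from `home` would start with a meaningful score on the next iteration, whereas a new standalone site starts with only the teleport baseline.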
|If I move a page from our domain with the new (possibly penalized) name to an older subdomain, it shoots up to #3 position and stays there. The PR of the linking pages are the same (PR6 index page => PR5 subpage => page in question). All outgoing links on the page were kept the same. |
I think the answer to that is don't believe PageRank is everything. The algorithms that are suppressing new web sites since February are anti-spam algorithms, not ranking algorithms per se. The above is entirely consistent with our theory that a new site must do far far better in our traditional measurement terms to beat an old site.
|To me it just seems like links are taking longer to have effect. Really old links from DMOZ and Yahoo are gold. |
Right on the money! The simple fact is that with a lotta lotta hard work you can beat the "sandbox" effect, although it is highly questionable whether it is worth it. And that is exactly what google wants to happen - we all give up because it is no longer worth it, and google can revert to making its own decision about what the most relevant sites are, without any interference from us nuisances.
As regards the sandbox effect, someone posted a message saying call it what you like, the effect is the same. Well, I think we are talking about a fundamentally different thing here. There is no sandbox because there are lots of sites launched since February that are doing perfectly well. "Sandbox" suggests that every site is affected, which simply is not the case. That is why I don't believe in the Sandbox.
|And that is exactly what google wants to happen - we all give up because it is no longer worth it, and google can revert to making its own decision about what the most relevant sites are, without any interference from us nuisances. |
So you subscribe to the belief that Google, supposedly the World's best search engine, is happy not to feature any (or very few) new sites for upwards of nine months?
It does feature them, they just do not feature very often under competitive search terms. Look, as I've said before, this "Google is cr*p because it is not ranking new sites" is a load of nonsense IMO. Distinguish between your own frustration at the lack of ability to get new sites performing in google and the quality of google's results.
Here is the perfect example that everyone has been harping on about. Bridget Jones, The Edge of Reason is just released. Go to google and search for it. Find the official site and check when it was registered:
24 May 2004
This is exactly the type of example that everyone has been quoting saying google will not reflect newly released movies' websites. Well it does and the question everyone should be asking is - how the hell did it manage that when I can't get my sites to rank!
And the answer is, as I have said before - sites from February onwards have to comply with much much tougher google algorithms compared to sites pre-February 2004.
Now look at the quality of sites linking to the official site and you will see why they have beaten the filters and you have not.
But look also at the unofficial site that is occupying top slot - domain registered back in 2001 (had she even written the book then?). It is there either because the site was created and indexed before Feb, or else because of the quality of links that they have, which are not bad.
But one thing for sure - Google is not manually letting sites out of the sandbox and keeping everyone else in.
I guess what I am saying here is that I believe that a lot of good brains are wasting their time discussing this sandbox and how unfair it is and how google is going to end on the scrapheap, instead of concentrating on what it takes to get out of the sandbox.
Let's have a few positive postings, eh!
I think it's about time that some people understand that the sb is not all black and white. The common line of reasoning is "I can't believe Google would deliberately block new sites." First and foremost, it's important to comprehend that the sb only affects competitive searches that return lots of mostly redundant sites. In these areas the user doesn't notice that a few sites are missing. Therefore the user is happy with the results and hence Google is happy.
In non-competitive areas, the sb is not apparent - at least in my experience. My sandboxed sites receive more traffic from three or four word queries than from the more competitive two word queries. But they do receive good traffic, so they're not at all hidden. Consequently, the user who does more specific searches is happy, too.
The sb only causes problems for sites that have to offer fresh and unique content (like breaking news) in competitive areas. But the users will quickly learn that they can use Google's news search or that they need to be more specific. The more specific the queries get, the easier Google can read the user's intent and deliver more accurate serps.
|It does feature them, they just do not feature very often and under competitive search terms. |
In my book that means NOT featuring them ;)
We are talking percentages here. How many new sites have escaped the (very real) sandbox? 1%? 2%?
How many new sites are SB'd? 98%? 99%? No matter how you look at it, the sandbox is very real to most of us. It may be an algo function, and you can call it what you like, but it is very real and it is stopping new sites from being featured in the serps.
Regarding your point about Bridget Jones, if I had a site containing the phrase "Bridget Jones the edge of reason", I would expect it to be found in spite of the sandbox. None of these words are competitive terms.
Ok, change the search to just Bridget Jones - over 2 million results with that exact phrase, and the official site is on the second page. I call that competitive.
So what do you consider competitive? Presumably mortgage or gambling or travel terms etc? In these areas sites post Feb have got a mountain to climb to overcome the entrenched (and yes favoured) positions of sites pre-Feb.
Those who managed to avoid the sandbox, one little query. Have you managed to get links from pages that already rank within 30 or 50 for the search term you are targeting?
Working alongside a new theory :)
I am not into gambling and travel as you suggest. Occasionally I build small websites for small businesses and none of them have got out of the SB since February.
One of these sites is for a small specialist consultancy. It contains lots of useful information about the service that is provided and I really mean that. I even found it very interesting myself while I was doing the site.
The site has a few inbound links and PR4. When I do a Google search for a nine word string of text (that's 9!) that is a page title on this site and that contains no real competitive terms, it cannot be found in the top 200 results.
The term is something like ...
Teaching widget as a widget widget in location country
When I do the same search in Yahoo it is number one. Nuff said?
This is not spam filter behaviour and no one on earth will convince me that a search engine that performs as badly as this does not have a major problem. It should surely be capable of determining that a coincident, nine word search phrase in a page title must be relevant.
Incidentally, when I do the same search enclosed in quotes it comes up number three, beaten by two sites with zero PR and nothing related to the term in their page titles.
I really do sympathise with your plight, but on the one hand you are saying that the bridget jones example is a bad one because the term is not competitive, and then you are citing client examples which are all very niche and non-competitive. My point remains - competitive or non-competitive, sites are getting listed and ranked, and the focus of this forum should be on how to "break that filter", not to moan about google being broken, a cr*p search engine etc etc.
The fact that msn lists your client's site at the top does not mean it is a good search engine, or "fixed". A case in point - search msn for bridget jones and see if you can find the official movie site. You won't. Now I say google is better than msn at delivering the right results.
|When I do a Google search for a nine word string of text |
What happens when you search the text used in anchors for your inbound links?
Most of the Inbound links are either the URL or of the form Widgeted Widget Consulting, the company name. When I search for this it comes up in the top three.
ok Pimpernel (scarlet?) I agree with you. The real question is how does one break the filter? I think that is the elephant in the room that no one has really addressed. Let's not debate whether the algo changed at the beginning of the year, or if there is a sandbox or lag. The only thing anybody here really cares about in practice is how to get a new site or new page to the top of the SERPs. Let's stipulate that it must have good content. Let's further stipulate that people have read Brett's seminal primer on the subject. And let's say that you want to be in the top 20 for a 2 word term that returns at least 2MM results and that you are using a domain less than 3 mos old. Where do you start and what do you think is a reasonable time to achieve the target?
Does that reasonably focus the practical question?
If I could take the liberty of going a bit off topic for a moment, has anyone been able to use the sandbox to his/her advantage?
Personally, I have lots of pages that rank well in several industries. We sometimes sell advertising on these. If I could figure out how to get a list of sites that have been created since Feb 2004 (WHOIS database?), I could offer our services to them.
My heart goes out to anyone relying heavily on the web to get a newly established law firm or insurance company off the ground these days. I would think that these folks would jump at the chance to get into #1 position.
Good idea, or too labor intensive?
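The "WHOIS database" idea above could be sketched as a filter over raw WHOIS records: keep only domains whose creation date falls after the sandbox cutoff. The actual lookup (network call, per-registry rate limits and formats) is left out here; the `Creation Date` field name and the `CUTOFF` value are assumptions that vary by registrar:

```python
# Sketch: spot likely-sandboxed domains by parsing the creation date
# out of raw WHOIS text. The cutoff and field format are assumptions.
import re
from datetime import date

CUTOFF = date(2004, 2, 1)  # assumed start of the sandbox effect

def creation_date(whois_text):
    """Pull a YYYY-MM-DD creation date out of raw WHOIS text, if present."""
    m = re.search(r"Creation Date:\s*(\d{4})-(\d{2})-(\d{2})", whois_text)
    if not m:
        return None
    return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))

def likely_sandboxed(whois_text):
    created = creation_date(whois_text)
    return created is not None and created >= CUTOFF

sample = "Domain Name: EXAMPLE.COM\nCreation Date: 2004-05-24\n"
print(likely_sandboxed(sample))  # -> True
```

In practice you would still need a source of candidate domain names to feed through this, which is probably where the labor-intensive part comes in.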
My main concern is not "breaking the filter"...it's whether the sandbox (or some time-lagged SERPS penalty effect) is REAL or not. If it IS, then I'm content with waiting it out. If it ISN'T, then I need to find out why my site (and all of its pages) are delivering such awful rankings, and try to fix it. Can anyone see the distinction?
What you have to do is about 10 times what you had to do before. Get lots of good quality links, preferably on theme, make sure the site is indexable, optimise the pages reasonably (nothing over the top) and you will get the good rankings eventually. It may take several months but it will happen.
The difference to before was that you could launch a site, link to it from a few existing sites within your network and bang within a week you are top of the SERPs. No more!
And this is what I mean when I say that google has written very very tough anti-spam algorithms that only affect sites created since Feb.
Now the simple fact is that for many people the amount of effort required to do the above makes the return debatable and many will decide not to go down that path, and google will be delighted!
“Get lots of good quality links, preferably on theme, make sure the site is indexable, optimise the pages reasonably (nothing over the top) and you will get the good rankings eventually.”
Have you had success with this, for terms returning in excess of 2 million results? And if so, approximately how many back links were needed, and how long did it take? Perhaps you're right and we are just falling well short on the back links. We have been a bit conservative adding links, as there was a lot of talk early on regarding the SB that too many links too fast was the problem.
Any insight would be appreciated.
Yes. But the site has been worked on full time by a member of staff for over 6 months now. Steady increase in links, more and more content, good quality links out. What have we achieved? Movement from not being in the first 1,000 to being in the first 100 under 2 and 3 word terms. Every week the performance gets better but it is a hard slog and very expensive. And who knows - just when we have finally really cracked it and got on the first page under "hotels" (that is the theme - a hotel portal) Google will probably change its algorithm and we'll drop back down to 1,000!
A little more specifics - the main search term we are using to judge success is a 3-word term. We have gone from nowhere (despite being indexed) to number one in google. Whilst there are 2,690,000 results, when you put it in quotation marks there are just 133 results. So it is not very competitive but is a good measure of improvement.
However if you then search for 2 of the 3 words, which is a very competitive phrase, you get 24 million results and in quotation marks 379,000 results and our site is 215th, which is not a bad performance.
So you get the drift - it is a long hard slog and cannot be achieved overnight.
"" I cannot give specific examples, and also I am not saying "I have lots of sites that have beaten the sandbox". I am expressing our hopefully well thought out and well researched reasons for why sites from February onwards are not in general performing in google ""
is this research just based on 1 single website? if yes.. can you sticky me please?
if no, then please support your opinion by some examples...
I don't feel thorough research and a conclusion can be made without any sampling or illustration.
Thank you for info, we do appreciate it. We will take your good advice and keep working at it. You are right, the bar to success has been raised and we must rise to the challenge.
I think at the end of the day our concern is that if you have to work the SEO that hard just to crack the first 300 places (we would certainly see that as progress in our situation), something is not right. It seems to be fostering an environment that is the opposite of what Google would want: more and more artificial SEO.
Sites that have something to offer, that are constructed in a professional manner, adhering to Google’s own suggestions, should not sit in these positions for this amount of time. It’s not doing anyone any good; users, Google, or e-commerce in general.
Some day maybe we will find out we have all been missing something really obvious. Until then we will continue to improve our back links and content and hope for some improvement.
It's a long game and that is kind of enjoyable in a way -- take your time -- there's lots to do -- building links, writing content, building new sites, checking out the competition, writing newsletters... and then a year from now or two years from now you are set.
Looks like it's time to rehash a theory I discussed a couple of months back.
What is the sandbox/lag?
This can only be answered in the context of secondary indices and google's index capacity problems. Google has multiple indices (DBs): a main index and one or more secondary indices.
The secondary index entity was first introduced as Google's supplemental index. This occurred sometime in 2/04, or at least this was the first time that my sites were affected. At the time I lost 80% of my G traffic and as many pages were tagged supplemental. G claims the supplemental index was created to "enhance" the SERPS for obscure queries. This, of course, sounds more like a marketing ploy to cover the need for G to make room in the main index for new pages.
The web kept growing exponentially and by June G had to do something more than could be accommodated by the "obscure" query rationale. G also decided to relegate entire domains (new sites) into the secondary index. These new pages are not tagged as "supplementals" although they behave like supplementals. This is when the term "sandbox" or lag was introduced by webmasters. Interestingly, G's VP of engineering hails this as enhancing G's index by showing "more obscure" results! Now you can see results for 5-10 word queries!
The next major event was in November when Google announced 8b pages in its index. All that G did was to start taking credit for the total of main and secondary indices. As many have reported (including me), the number of pages reported by the <site: query> tripled but the traffic from G remained practically the same! This says that most of the new pages being reported are not competing in the main serps.
Characteristics of a secondary index
As gathered from pronouncements by google re-supplementals, observations about supplementals and sandbox symptoms:
- pages in the secondary indices appear in the serps only for "obscure" queries. this means that G first searches the main index and, if the number of results is less than a threshold, Google then includes results by searching the secondary indices.
- it appears that the pages in the secondary indices do not participate in the PR calculations. this means that pages and links are not represented in the PR matrix. this could be part of the capacity problem. people have reported PR values from the TB for supplemental pages. the values are probably residual values from previous pr calculations.
- it also appears that pr calculation requires that a domain must exist in the main index. so the only way pages of a "sandbox" site can start to compete in the main serps is if the domain gets transferred to the main index. this also explains why some new pages on old sites are able to rank pretty quickly.
- the november "update": the inclusion of a lot of new pages from old sites created the need to store these in the secondary indices, but not tagged as supplementals. this created the same "sandbox" effect for new pages on old sites. the only difference is that these pages are able to move from the secondary to the main index due to the domain already existing in the main index.
- pages can only move from the secondary indices to the main index if there is room in the main index. thus google continues to apply filters to cleanup the main index. once in a while it removes or bans an entire domain. this creates new space for one of the sandboxed domains.
- it is questionable whether pages are able to get out of the supplemental index. most of my supplemental pages are dated feb-mar 2004 which says G does not even bother updating the supplementals. Many people have reported pages listed as supplementals that do not exist anymore.
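The fallback behaviour described in the first point above can be sketched in a few lines: search the main index first, and only consult the secondary index when the main index returns fewer results than some threshold. The index contents and `THRESHOLD` here are invented purely for illustration:

```python
# Minimal sketch of the main-index-first, secondary-as-fallback idea.
# Indexes are toy {doc_id: text} dicts; THRESHOLD is an assumption.

THRESHOLD = 3

def search(index, query):
    """Return doc_ids whose text contains the query string."""
    return [doc for doc, text in index.items() if query in text]

def serps(main_index, secondary_index, query):
    results = search(main_index, query)
    if len(results) < THRESHOLD:
        # "obscure" query: top up from the secondary (supplemental) index
        results += search(secondary_index, query)
    return results

main = {1: "britney spears news", 2: "britney spears photos",
        3: "britney spears tour", 4: "widget teaching"}
secondary = {101: "widget teaching in ruritania",
             102: "britney spears fan club"}

# Popular query is satisfied by the main index alone:
print(serps(main, secondary, "britney spears"))   # -> [1, 2, 3]
# Obscure query falls through to the secondary index:
print(serps(main, secondary, "widget teaching"))  # -> [4, 101]
```

This matches the symptom people report: sandboxed pages surface for 5-10 word queries (where the main index comes up short) but never for competitive two-word terms.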
If this theory is correct, then inclusion in the main serps is entirely up to the pace at which google can remove pages/domains from the main index. It has nothing to do with your pages.
All we can do is wait until G has solved its capacity issues for new sites/pages to start competing freely in the main serps.
If you are feeling magnanimous, help G by removing dups, useless pages (such as webstats), error pages etc from the G index.
Good info. How important do you think on-topic links are?
We are in few different fields and we frequently see two things from competition in the top 10:
1) No or almost no on-topic links, just a very large number of blog links.
2) On-topic links that are part of their network.
|I think at the end of the day our concern is if you have to work the SEO that hard, just to crack the first 300 places, (we would see that as progress in our situation certainly) something is not right. It seems to be fostering an environment that is the opposite of what Google would want, more and more artificial SEO. |
I am sorry but I could not disagree more. I pride myself on being the greatest spammer of google but they have made it really hard for me. The example I quote is a perfectly genuine web site that I can honestly say is the first time in a long long time that I have followed such a course and it is slowly but surely working. Google is big time rewarding new sites with ranking if they follow established google guidelines and penalising others.
Having said that, existing sites can make hay while the sun shines, so I am OK! :)
Pimpernel, you said you broke the top 200 after 6 months of optimizing on a new site...
and you STILL believe there's no sandbox?
Read the posts and you might be enlightened :)
"Sandbox" suggests there is nothing you can do to get out of it and many of the posts here are along the same lines. I am saying, you can get out of it and therefore it is NOT a sandbox. It is just more difficult than it was before.
Pimpernel, after 6 months, it is possible that it was nothing you did that caused your "jump", but simply the 6 month time period that did it (which would imply the sandbox was indeed at play).
No way. We saw a steady increase over time.
But hang on here, I gave an example because someone asked for an example. I have quite a few examples and I have loads of examples of sites that have been "sandboxed" (when I say loads I mean thousands). I also have in the region of 300,000 examples of situations where old sites with new pages perform perfectly well.
You get my drift...
Renee--Thank you, your voice has been sorely lacking.
I believe you are the one that got me thinking along this track a couple of months ago. While I disagree with a few of your points, they are minor in comparison to my agreement with your overall theory. It really is such an elegant solution that it almost has to be true; so many of the symptoms we are seeing and that have been described in the countless sandbox threads are simply accounted for in this theory.
My take on it is that the increase from 4.2B sites in the main to an updated claim of 8,058,044,651 web pages indicates that the index of sandboxed sites was added to the main index, basically a 2nd index the same size as the main, and that when it got close to its theoretical limit, they simply tied it to the main and started another sandbox index. However, even with the inclusion of these additional sites now claimed to be in the index, I still see them separated from the main by a 32-bit capacity issue. Do you think we'll begin to see rolling updates as calculations are made across the whole 8B? That is, will the sites sandboxed prior to this update begin to migrate upwards in the SERPs? So far I have not seen this to be the case, but with rolling updates it's possible it's already begun to happen, just not in any of the industries I pay attention to.
If they have begun a 3rd index (a 2nd sandbox index), will this new one be treated like the original sandbox index? That is, will the original SB and the 2nd SB be treated the same? Or is it possible we'll now see a 2nd level of sandboxing: Google will continue to go to the original SB if not enough results are found in the main index, and then only go to the 2nd SB index if not enough are found in the original SB index? Is there a way we could test this?
Pimpernel--from everything you've said I still think your site is sandboxed. To get to a number 1 position for a 3-word term after all you've done does not to me sound like you are out of it. Which shows two things: 1. You are made of sterner stuff than me, and 2. When this sandbox does end your site is going to dominate and you're going to have more traffic than you ever dreamed of.
renee, I agree that the capacity problem explains most of what we are seeing. I also think there are some serious anti-spam measures being taken at Google. Where one leaves off and the other begins is a very subtle question. They could easily overlap and complement each other.
One minor correction: The Supplemental Index first appeared in August 2003. This was just two months after GoogleGuy said that some engineer at Google fell out of his chair laughing when he was told about the 4-byte docID issue.
This month, when they switched their count to 8 billion overnight, GoogleGuy immediately pointed out, on another forum, that this shows there was no docID problem. He just volunteered this -- no one even brought up the topic in that thread. I fell out of my chair laughing.
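For anyone who hasn't followed the 4-byte docID theory, the arithmetic behind it is simple: an unsigned 32-bit ID can address at most 2^32 documents, which is where the ~4.2B ceiling in these posts comes from, and why an 8B count implies either a wider ID or more than one index. This is just arithmetic, not inside knowledge of Google:

```python
# The arithmetic behind the 4-byte docID capacity theory.
MAX_32BIT_DOCIDS = 2 ** 32
print(MAX_32BIT_DOCIDS)  # 4294967296, i.e. ~4.29 billion

claimed_pages = 8_058_044_651  # the November index count quoted above
print(claimed_pages > MAX_32BIT_DOCIDS)      # True: too big for one 32-bit ID space
print(claimed_pages < 2 * MAX_32BIT_DOCIDS)  # True: fits comfortably in two
```

Which is consistent with the two-index reading of the November count, though of course it doesn't prove it.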
neuron, I do not think that Google is in a position to roll the rankings of the three indexes together. Back in 2002, before the famous Cassandra crash of April 2003, Google was on a very consistent monthly update and ranking schedule. To answer a query, all they had to do was this:
--> Pull docIDs from the inverted index for the search terms
--> Apply any real-time ranking algorithms for those pages
With three indexes, to put these all together and rank them fairly as one search result, you would have do three pulls, one from each index, then put the docIDs into one pot, and then apply your real-time algorithms.
It's almost three times the CPU cycles. It would not take three times longer, because the three indexes would be consulted in parallel fashion. But it still means three times the CPU cycles, and three times as many machines need to be involved if you want the same access time.
That's why, it seems to me, that the Supplemental Index is consulted only if a particular query on the main index fails to provide adequate results. I suspect that well over 90 percent of all searches are so unoriginal and mass-mind-oriented (like a search for Britney Spears), that the main index alone is perfectly adequate to handle them.
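The full-merge scheme being costed above can be sketched as: one docID pull per index, pool the hits into one pot, then run the "real-time" ranking pass over the whole pot. Three pulls means roughly three times the index-lookup work per query, whatever the wall-clock time. All the names and scores below are illustrative, not Google internals:

```python
# Sketch of answering one query across three indexes: pull, pool, rank.
# Toy inverted indexes ({term: [doc_ids]}) and made-up scores.

def pull_docids(inverted_index, term):
    """One pull from one index's posting lists."""
    return inverted_index.get(term, [])

def rank(doc_ids, score):
    """The 'real-time' ranking pass; higher score ranks first."""
    return sorted(doc_ids, key=lambda d: score.get(d, 0.0), reverse=True)

main    = {"hotels": [1, 2]}
supp    = {"hotels": [101]}
sandbox = {"hotels": [201, 202]}
scores  = {1: 0.9, 2: 0.4, 101: 0.6, 201: 0.2, 202: 0.7}

pot = []
for idx in (main, supp, sandbox):  # three pulls instead of one
    pot += pull_docids(idx, "hotels")
print(rank(pot, scores))  # -> [1, 202, 101, 2, 201]
```

The fallback design skips the second and third pulls for the vast majority of queries, which is exactly the saving being argued for here.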
The sequence of unstable events at Google is worth repeating:
April 2003: Cassandra. GoogleGuy admitted that they threw out an entire monthly crawl and reverted to an index from previous months. The URL-only listings for large sites increased dramatically. This isn't when they first appeared, but this is when they became such a problem for large sites that they had a dramatic effect on traffic. Nothing like this had ever happened before, since I started watching Google at the end of 2000.
June 2003: The 4-byte docID problem is introduced as a theory.
July 2003: Although "Googlebombing" was already a household word, this is when I noticed that it had a very dramatic effect on rankings, particularly from blogs and particularly when the anchor text was non-competitive. It was clearly a weakness in the algo. To put it another way, anchor text was becoming more important than PageRank.
August 2003: Google rolls out the Supplemental Index.
November 2003: Florida happens suddenly. This was clearly a real-time filtering phenomenon, and I don't see how it could have anything to do with capacity issues.
March 2004: People say that this is approximately when the sandbox phenomenon began.
November 2004: The index count doubles to 8 billion overnight, just weeks after the docID theory enjoys a resurgence on various forums, and within days of Microsoft's rollout of their beta.
Today we still have the URL-only problem (as bad as it has ever been). We have the sandbox. There have been no old-style monthly updates since Cassandra. There is increasing evidence that the toolbar PageRank is no longer a predictor of ranking. There is a major "freshness" bonus that kicks in for a time. (It's almost as if the "fresh bot," which was first rolled out by Google in August 2001, was bulked up for big-time duty after Cassandra.) Googlebombing still works quite well, and we still see too much blog noise in the SERPs.
Everything is fragmented, and it's all in the direction of less predictability and less quality in the SERPs. While less predictability may in itself serve to make life difficult for spammers, by now it's gone way beyond anything that can be construed as purely a set of anti-spam measures.
I also don't believe that Google is trying to make money faster by degrading the SERPs. I have no respect for Larry or Sergey, but I find it hard to believe that they could be that cynical and corrupted this early in their careers.
|This month, when they switched their count to 8 billion overnight, GoogleGuy immediately pointed out, on another forum, that this shows there was no docID problem. He just volunteered this -- no one even brought up the topic in that thread. |
Well actually I had rather cheekily pointed out a few posts earlier that suddenly being able to expand their index by so much so quickly was probably a good sign that they had finally solved the docID problem. Then GG suggests that it settles the myth - which I laughed at, as it looked like another piece of evidence in favour of the problem having existed to me. All about interpretations though, I guess.
Any ideas how many pages the MSN beta has indexed? It shows a larger number of results when I search for www. Isn't the capacity problem affecting MSN too?