|Sandboxed Sites - Back Together?|
Do they come out together or one by one?
Most of the new sites that I work with are still in the sandbox. Was just curious to know if all the sandboxed sites come out of the sandbox during one fine major update, or one by one over the rolling updates?
That is to say, should one be checking to see if the sites are out of the sandbox regularly or only when they know there is a major Google update? :)
Any idea how many pages MSN beta has indexed? It shows a larger number when I search for www. Isn't the capacity problem affecting MSN too?
Nice posts renee, Scarecrow - thanks.
Yes, excellent posts. Kudos to both of you, and others.
ScareCrow, best WebmasterWorld post for a couple of months. Thumbs up.
|Everything is fragmented, and it's all in the direction of less predictability and less quality in the SERPs. While less predictability in itself may serve to make life difficult for spammers, by now it's gone way beyond anything that can be construed as purely a set of anti-spam measures. |
Absolutely on the money!
I cannot argue about search engine technology or the mathematics of this problem but I do have a masters degree in common sense (from the university of life). This qualifies me to state that this is definitely not an anti-spam measure. How anyone can still argue the case for this is beyond me. This is not anti spam, it's anti new content. "New" in any other commercial context is attractive and Google would never deliberately restrict all new sites from featuring. This would be committing commercial suicide.
Also, if this was an effective spam measure Google, as a commercial entity, would be bragging about it everywhere, "Google announces amazing new spam prevention technology.", etc.
Can I refer back to the point I made yesterday in message 188? On a search for a NINE word phrase, a search engine that cannot find a web page with that NINE word phrase as its page title MUST be defective in some way.
BeeDeeDubbleU, mind stickying me that phrase?
It's really hard to believe that they couldn't fix what is happening if they wanted to. No one's better resourced than Google, and lesser engines don't have this same issue.
As such, deliberate or not and to whatever extent, it might be fair to assume that the current situation suits them. That being the case, this could be a very long term thing.
And keep in mind Yahoo search, because it's always been lagging behind in sorting its information out. The rate at which they spider and sort, while obviously publicly acceptable, is decrepit when compared to Google, even with its problems.
I've given up trying to fight this head on, I just use an older and established domain for important content these days.
Whatever the cause (Google, why do you do this?) - I am also
I had a top 5 ranking for the competitive term "web design standards" - when my site was on a sub-domain for a popular web host (aboho).
Since I have bought my new domain, my site no longer ranks anywhere in the top 1000!
It has been about 5 months now :~/ Googlebot visits very regularly, crawling my sitemap and newly content-ized pages.
How long must I wait?
|On a search for a NINE word phrase, a search engine that cannot find a web page with that NINE word phrase as its page title MUST be defective in some way. |
I found an example that stunned me yesterday. Using a two-word search, without quotes, I come up with 3,140,000 hits when the words are entered in A B order, and 3,180,000 when they are entered in B A order. So far, so good.
These are household words. I was looking for sites that may have dropped out due to large numbers of sitewide links, which is true of the site in question. Other than that, this site is a normal, independent e-commerce site. The domain is two years old. These two words might be the most important keywords for the site.
The A B order put this site at number 1 in Google, and the B A order put them out of the top 1000. Remember, no quotes. I repeated the experiment a dozen times, because I couldn't believe my eyes.
On Yahoo the site is number one for both A B and B A.
On MSN it is number one for both A B and B A.
On beta MSN it is number one for both A B and B A.
On Ask Jeeves it is number 4 for A B and number 3 for B A.
I would have to say "Google is broken" from this example. I know that Google's word proximity matching was overwhelmed by their hotshot semantic fiddling a year ago during Florida, but this example is ridiculous.
The other possible explanation is that some of their latest "penalties" are plucked from a grab-bag of very subtle "gotcha" tricks, and that playing games with word proximity is one of them. But it's much easier to believe that Google is broken on this one.
I should start collecting examples like this.
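For anyone who wants to collect examples like the A B / B A one in a repeatable way, here is a minimal sketch. It assumes you already have some way of getting an ordered list of result URLs for a query; the `fetch_results` argument and the `fake_engine` stand-in are hypothetical placeholders, not a real search API, and the URLs are made up.

```python
from itertools import permutations

def rank_of(domain, results):
    """Return the 1-based position of the first result on `domain`, or None if absent."""
    for position, url in enumerate(results, start=1):
        # Strip the scheme and path to get the host part of the URL.
        host = url.split("//", 1)[-1].split("/", 1)[0].lower()
        if host == domain or host.endswith("." + domain):
            return position
    return None

def order_sensitivity(domain, words, fetch_results):
    """Rank of `domain` for every ordering of `words`.

    `fetch_results` is a caller-supplied function mapping a query string to
    an ordered list of result URLs. It is a placeholder, not a real API.
    """
    return {
        " ".join(order): rank_of(domain, fetch_results(" ".join(order)))
        for order in permutations(words)
    }

# Made-up stand-in for a search engine, for illustration only.
def fake_engine(query):
    canned = {
        "alpha beta": ["http://www.example.com/", "http://other.example.org/"],
        "beta alpha": ["http://other.example.org/"],
    }
    return canned.get(query, [])

report = order_sensitivity("example.com", ["alpha", "beta"], fake_engine)
print(report)
```

A `None` for one ordering and a top position for the other is the kind of discrepancy described above; running the same check against several engines and keeping the reports side by side makes the comparison easy to repeat.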
Whether broken or not, I think Google has a management problem. Management at Google is unable to determine priorities and allocate resources in a rational, adult manner.
Anti spam or technology related, I think that summer 2003 was the last time I saw spam free results in my sector.
Despite the fiddling, cleverness and whatever else they have been up to, I've seen more 'shotgun' spam coming and going than I ever did from the birth of Google until last fall.
If it's an anti-spam thing that's deliberate, somebody should be fired, because I've seen it hit good sites, while allowing more spam in than they had previously.
Some terms are squeaky clean though, making me wonder if they hand tweak some of the money keyword results; other terms look much like Inktomi did after spammers learned how to game their algo.
Good sites are being kept out, or knocked down, while some guy with 100-1000 spam domains is driving his new Porsche to the bank, and flipping Google the finger.
Renee, very good post, like neuron I disagree with a small few components of your argument, but they don't affect the overall interpretation you're offering.
It looks to me like pagerank is being applied to the pages in the theorized secondary index, at least judging from a site I'm watching. New pages are given a fast pass entry into the primary index, although from what I can see, large amounts of new pages will be held up, creating the illusion of a sandbox type effect being applied to them. But it's far more reasonable to assume ongoing capacity issues, and a waiting line for big chunks of new pages on old domains. I saw this first hand recently; there was no doubt that the lag in entering new blocks of pages was not a sandbox, but more a matter of needing to wait for room to appear.
To the people who say the sandbox doesn't exist because they've managed to crack the algo and get in, that doesn't prove it doesn't exist, it proves that the algo is crackable. All algos are crackable, ask any cracker about that, it's just a question of how hard it is to crack. If you name this particular component of the algo 'sandbox', then state that you've found out how to crack it, this doesn't demonstrate its non-existence, it demonstrates that it's crackable. Everything is crackable.
Good posts scarecrow, and many others in this thread, internetheaven for pointing out that the algo can be cracked, did you pay for that information? My guess is yes.
The main question is "how do you crack this sandbox algo"?
you've mentioned that you have gotten a new site below 100 for 2 word competitive terms and number 1 slot for three word terms.
Could you be a little more specific about the 10 times more SEO work that was required to accomplish this amazing feat?
Specifically, how many backlinks and what type? Do you have a PR 9 site at your disposal or some other advantage that is not available to most? Are you a DMOZ editor?
Data point: since being "sandboxed", our previously most commonly searched 2 and 3 word terms fell from the top 5 or 10 in the SERPS, to 200 to 1000 or worse. BUT, for a less common 2 word phrase, we are frequently #1-#3. Virtually all of our traffic is now from the latter, and total G traffic is down about 75%.
We got there by putting one member of staff on the job full-time for the last 6 months just dealing with about 10 sites, putting lots of content on, lots of links, useful resources and spending a huge amount of time contacting web sites and asking for links. No great links, apart from yahoo directory and dmoz, but lots of pretty good ones.
Renee great post. Is it possible that this “third index” could be a completely new search engine, based on word cluster technology? And could they be buying time using their existing index until this new engine is complete?
In October at the web 2.0 convention Peter Norvig, director of search quality at Google revealed that they are working on three different tasks to better understand the web. Statistical machine translation, named entities and word clusters. [searchenginelowdown.com...]
“From the demonstration made by Norvig, it is clear that Google is working on ways of improving and innovating its search technology for the future. While start-up Vivisimo may have clustering technology on the market, with the launch of Clusty, there is no doubt that Google has the resources and R&D to ensure that clustering technology is the "PageRank" of the future.”
"[We're] trying to go just beyond keywords and the linking structure of the Web, the innovation that we brought to search, and get behind the deeper meaning," Norvig said during his presentation.
"We want to be able to search and find these [entities] and the relationships between them, rather than you typing in the words specifically," Norvig said.
I hope they come out with something "big" soon, because if they continue with the poor quality results they are now serving, MSN will blow them away.
|I hope they come out with something "big" soon, because if they continue with the poor quality results they are now serving, MSN will blow them away. |
For some reason everyone assumes that MSN is better, I'm sorry I just don't see that. Do a search on "money." Not sure that Eddie Money should be up so high at number 12... Of course MSN Money is #1
Sticking with that area on "finance", Yahoo holds the top 6 spots, not sure that is a good thing.
Does this forum think that this type of Google bashing is productive?
I put up a new website at the end of May; here are the facts as they apply to this website. One tweak I made was completely changing the mod_rewrite structure in August, and I knew I would take a hit for that. But here are my findings:
Google - has all 700 pages of content in its index that are more than 7 days old, it visits daily to grab pages. I get traffic on non competitive phrases.
Ask - has around 140 pages, all dating back to pre-August (I can tell by the URL). I get traffic from more competitive phrases than Google. It only spiders about 150 pages a month.
Yahoo - has around 3 - 6 pages, all dating back to pre-August. Has slowly removed old pages from a high of around 200. Slurp grabs pages daily - lots of them. Spiders the site at twice the rate of googlebot. I wonder where the new pages are since it's been 3 months. The new pages actually show (around 200) on the Yahoo Beta search site. Which everyone here seems to be ignoring so that tells me what everyone thinks of Yahoo and its search engine.
MSN - Shows exactly zero pages in its index. The MSN beta shows around 150 pages, although it is even more active than Slurp. I rank extremely well in the new MSN search on competitive phrases.
I believe there is something holding this site back in google - sandbox, whatever. But I can wait it out. That being said, I still get more traffic from google than everyone else combined.
The site is pure content, wrapped by Adsense. It took less than 4 weeks to get a DMOZ listing in late June. I also run Adwords to the site. So for those that think those things factor into the sandbox... GUESS again.
BillyS, I don't think anyone here believes that those things you mention have anything to do with the sandbox. If you think that people here are just Google-bashing, then by comparison your post is just WW member-bashing.
People are mainly complaining about (or concerned with) the staleness of Google results and the inability to get new sites into the SERPS. By the number of people reporting similar observations, it does appear to be a significant problem. If the sandbox keeps out new sites, then yes, it affects the quality of Google's results. And if the sandbox is a technical problem and not a deliberate anti-spam measure, then Google could be having some very serious problems, which could enable the competition to catch up to (or surpass) them.
We're not bashing Google, we're discussing Google. Opinions do come up, however, whether valid or not. I personally don't think Google is doing this to promote the usage of Adwords, but it is the only reason I use the program. Has it affected your use of Adwords?
This is either a major problem with Google or a major problem for most of us, or both. In my opinion Google is still far and away the best se out there. But the changes the others would have to make to close the gap are so minimal that it could happen virtually (no pun intended) overnight. For instance Y! could change their results so the same domain couldn't show up more than twice on one page and become much less spammy in an instant.
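To show how small the "no more than twice per page" change is in principle, here is a minimal sketch of that kind of filter. It is only an illustration of the idea, not how Y! or any engine actually does it; the function name `cap_per_domain` and the example URLs are made up, and "domain" is approximated by the host part of the URL.

```python
from collections import Counter
from urllib.parse import urlparse

def cap_per_domain(results, max_per_domain=2):
    """Keep results in ranked order, skipping any beyond the cap for their host."""
    seen = Counter()
    kept = []
    for url in results:
        host = urlparse(url).netloc.lower()
        if seen[host] < max_per_domain:
            kept.append(url)
            seen[host] += 1
    return kept

# Made-up ranked list in which one host dominates.
ranked = [
    "http://a.example/1", "http://a.example/2", "http://a.example/3",
    "http://b.example/1", "http://a.example/4", "http://c.example/1",
]
page = cap_per_domain(ranked)
print(page)
```

Applied to the full ranked list before it is cut into pages, this caps each host across the whole result set; applying it to one page's worth of results at a time would give the per-page behaviour instead.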
Additionally, we don't owe Google our allegiance any more than they owe us our living.
Sorry I didn't get back to you sooner...
>Do you think all new external links are sandboxed or just some new external links?
I think all. Google wants to stop new or old sites hitting the top serp positions quickly via bought links and also wants to have time to examine new sites before it allocates full pr from the link. The sandbox buys them time to run other spiders/algos through the site. They probably have decided that new sites never deserve top positions in competitive searches until they have proven themselves, so they sandbox them. They want a period of time to see a number of links slowly being attributed to a new site and from different types of sites and ip's. Once this 'natural' linking pattern has been seen, they take the site seriously. An established site linking to another established site is subject to the same 'sandbox' with the full effect of the link phasing in over time.
>Do you think this could relate to topic sensitive page rank calculations (just a thought)?
This is where hilltop kicks in, which is where I believe the confusion over sandbox is. There are two new tactics at play. Once a new site has been around for a month or two, it may have passed all the tests via sandbox. It may have 'natural' links in which conform to a profile that google believes shows a genuine site, like varied links in from directories, links pages, 'authority sites' etc. and the site is seen to have fresh content, original content and no spammy tactics. It then can rank high IF it qualifies according to hilltop.
A search takes place for 'widgets' and 10,000 pages appear in the results. These by default have the theme 'widgets'. Only links from these sites in the results count, unless the search term is not competitive so that hilltop cannot be applied. Sites that have links from within these results coming from a nice mix of 'hub sites' and 'authority sites' will rank well.
It is very difficult to actively seek and acquire these links via link exchanges. Firstly, exchanging links in itself is probably of little benefit if they are reciprocal. Google is placing more importance on one way links which suggest a 'true recommendation'. Secondly, trying to get the mix of different ip's linking to you from different styles of pages and with varied on theme anchor text plus 'broad match' words plus links from pages with the relevant title etc. etc. is a tall order. Add to that the fact that the pages linking to you will also need relevant links in to make them a 'hub' or 'authority' and it all gets very complicated. The only effective way to achieve this is to have good content and let it happen naturally.... which takes time.
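The "only links from sites in the results count" idea can be sketched in a few lines. This is a toy version of the description in this post, not the actual Hilltop algorithm and not anything Google has confirmed; the function names and the example hosts are made up, and it ignores the hub/authority mix and anchor text entirely.

```python
from urllib.parse import urlparse

def host(url):
    """Lower-cased host part of a URL."""
    return urlparse(url).netloc.lower()

def in_set_link_counts(result_urls, links):
    """Count, for each result host, inbound links from *other* hosts in the result set.

    `links` is an iterable of (source_url, target_url) pairs. Links whose
    source is outside the result set are ignored, mirroring the claim above.
    """
    result_hosts = {host(u) for u in result_urls}
    counts = {h: 0 for h in result_hosts}
    for source, target in links:
        s, t = host(source), host(target)
        if s in result_hosts and t in result_hosts and s != t:
            counts[t] += 1
    return counts

# Made-up result set for a 'widgets' search and a made-up link graph.
results = ["http://hub.example/widgets", "http://shop.example/", "http://new.example/"]
links = [
    ("http://hub.example/widgets", "http://shop.example/"),  # counts: both in the set
    ("http://hub.example/widgets", "http://new.example/"),   # counts: both in the set
    ("http://outside.example/", "http://new.example/"),      # ignored: source not in results
    ("http://shop.example/a", "http://shop.example/b"),      # ignored: internal link
]
counts = in_set_link_counts(results, links)
print(counts)
```

Under this toy model a link exchange with a site that does not itself appear for 'widgets' adds nothing, which is the point being made about why ordinary link swapping does not help.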
I guess that 90% of webmasters here are launching sites with link exchanges and links from their own other sites that just do not help in the 'hilltop' scenario. Older sites have acquired the 'natural' link structure required, hence they are usually doing OK. The older sites that have dropped never pulled in these links over time.
To sum up, if your site has been around for a few months then you are probably not in sandbox. New links to your site are constantly being phased in over time, whether you are an old or new site. Until you acquire the full value for links from sites appearing in the same search results as yourself, you will never rank well. The sites that link to you need to have a good 'profile' as well, be it high pr or deemed a 'hub' or 'authority' for the search phrase in question.
The key to ranking well is now time and quality. Time to acquire the full effects of new links in and quality of content to attract links that you can never manufacture or fake. Internal linking is not subject to this time delay, thus new pages on an established site rank well and quickly because the linking page has the status to qualify within hilltop and immediately pass full benefit to the new page if it is also relevant for the search term. Internal linking is treated in a very different way to external links, so reciprocal links and ip are not an issue.
"An established site linking to another established site is subject to the same 'sandbox' "
Then sacrifice one of your established sites :( in favor of another that is more important for you :)
By the way, my 6-month-old brand new domain and page, though it has very few IBL's, is ranking well for very competitive terms in the top 100. That makes one believe that you don't need tons of links. I have noticed dramatic changes in SERPS for 2 KW's for (hotels) (i.e. widget hotels); pages with tons of links went down the drain.... ho ho ho, Merry XMAS
|To sum up, if your site has been around for a few months then you are probably not in sandbox. |
Mhes, in the light of all the evidence, albeit circumstantial, to the contrary?
Web design is (thankfully!) not my only occupation. In actual fact it is more of a pre-occupation. But I have been building small business websites now for about three years so I have developed some knowledge of what to expect with regard to ranking.
None of the sites I have built since February this year have developed any Google traffic, bearing in mind that most of my clients are generally not looking for high traffic, having mainly on-line brochure sites. I sometimes tweak these sites for nothing and the clients often don't even know I have been doing it. This is because I like to see them gaining some sort of ranking and I can say quite categorically that this is no longer happening. All of the sites I created before this time continue to do well.
I will stand by what I said in an earlier post.
|This is not anti spam, it's anti new content. "New" in any other commercial context is attractive and Google would never deliberately restrict all new sites from featuring. This would be committing commercial suicide. Also, if this was an effective spam measure Google, as a commercial entity, would be bragging about it everywhere, "Google announces amazing new spam prevention technology.", etc. |
>This is not anti spam, it's anti new content.
No, this is not correct. New content ranks well especially on a 'news site' where the spider visits often and gives new content a temporary boost.
You have to look at this from Google's perspective. For 'widgets' they will have thousands of new content every day. They cannot put all this within the top serps. So the decision is easy, established and relevant sites which they know are OK get preference. These are sites with good links in. New sites have to earn that status and it takes time.
>....All of the sites I created before this time continue to do well.
Same experience here and our new sites are nowhere. The reason is simple, older sites have long established links in that pass full pr etc. The linking sites themselves will have been around a long time. They continue to get new links in without a webmaster asking for them and thus are continually ahead of the game. A new site has a long way to catch up, both in having their links fully counted and acquiring 'natural' new links. They will catch up but it will take a long time.
The google link: search is useless, an old site will have considerable links that never show and from other old sites... this is a big advantage.
You made some good points MHes.
Short version to get out of sandbox:
Look at top 100 SERPS and get some of those pages to link to you.
Can anybody confirm that this works?
What is your take on 301'd links?
Powdork - Don't know, I have no experience of them :(
|New content ranks well especially on a 'news site' where the spider visits often and gives new content a temporary boost. |
OK! I meant new content on new sites but I think you know what I meant ;)
|You have to look at this from Google's perspective. For 'widgets' they will have thousands of new content every day. They cannot put all this within the top serps. So the decision is easy, established and relevant sites which they know are OK get preference. |
But this is just not happening. Sure, for a few searches the results are OK but the Googlebot still gorges itself on spammy sites. If Google "knows" (or thinks) that some of the established sites that I am seeing at the top of the results are "OK" then Google is doomed. But then Google may be hoist by its own petard anyway. I mean its own Adsense scheme, which is the biggest single factor in the explosion of spam on the Internet.
Mhes, AFAIK, the only truly important inbound link is an unsolicited inbound link. If all new sites are sandboxed, how is an authority site ever going to find them, much less link to them? The only new sites that get good links will be owned or managed by people like us.
How many regular people with regular web sites have active link development programs? Do a search for "submit url" + widgets, and you will find sites from the spammy side of the tracks soliciting reciprocal links. If Google forces people to resort to soliciting a specific type and quantity of links just to get their sites activated, it is encouraging SEO tactics. And, for this dubious accomplishment, it essentially sacrifices a large amount of new content. It just doesn't compute.