This 98-message thread spans 4 pages.
|Does the "sandbox" Only Affect Phrases Containing Popular Words?|
If the phrase has no words with over 70-80 million results, does the sandbox apply?
| 6:35 pm on Mar 10, 2005 (gmt 0)|
While discussing [webmasterworld.com] a most interesting analysis of Google's result-count figures [aixtal.blogspot.com], I speculated that Google might use a smaller index for popular words, in a manner similar to that explained in a pre-Google Backrub paper.
Liane took this idea further, and suggested that this might explain the sandbox.
So without getting into specifics, what is the view on sandbox applying to phrases that have no words with less than 80 million results?
Keep in mind that many phrases with few results contain at least one word with more than 80 million.
"that have no words with less than 80 million" should be "that have no words with more than 80 million". Thanks Liane for spotting the error.
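To make the corrected question concrete, here is a minimal Python sketch of the test it implies: does a phrase contain at least one word whose own result count exceeds roughly 80 million? The threshold constant, the function name `has_popular_word` and the per-word counts are illustrative assumptions, not Google data. Under the hypothesis, a phrase for which this returns False should never be sandboxed, so a single sandboxed phrase with no popular words counts against it.

```python
# Hypothetical test for the "popular word" idea: is any word in the phrase
# above ~80 million results? Threshold and counts are illustrative only.

POPULAR_THRESHOLD = 80_000_000


def has_popular_word(phrase: str, result_counts: dict[str, int],
                     threshold: int = POPULAR_THRESHOLD) -> bool:
    """True if at least one word's result count exceeds the threshold.

    Words missing from result_counts are treated as having 0 results.
    """
    return any(result_counts.get(word, 0) > threshold
               for word in phrase.lower().split())


if __name__ == "__main__":
    # Made-up counts, not real Google figures.
    counts = {"buy": 250_000_000, "widgets": 40_000_000, "woogles": 1_000}
    for phrase in ("buy widgets", "woogles widgets"):
        # Under the hypothesis, only phrases printing True could be sandboxed.
        print(phrase, has_popular_word(phrase, counts))
```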
[edited by: ciml at 3:33 pm (utc) on Mar. 12, 2005]
| 6:15 pm on Mar 12, 2005 (gmt 0)|
Thanks Liane, "less" and "more" were wrong in the first post.
BillyS, the link: operator was broken because back when it listed all the links (and even in PageRank order), it made things too easy for web marketers.
> still think it's a measure of the competitiveness of a term
I guess I still crave a reason for sandbox as a byproduct; the idea that it would be there on purpose is just too disturbing. :-)
> Do searches for people's names. You need nowhere near 80 million results (for either word individually) to get frozen.
> plenty of evidence based on searches starting with "two word city": neither word1, word2, nor word1 word2 has more than 56 million results
Well, I think that's the answer I was looking for, one way or the other.
| 6:47 pm on Mar 12, 2005 (gmt 0)|
IMHO something is targeting/filtering certain words; whether it is related to the sandbox (if it exists) I will leave for others more knowledgeable to decide.
We have a page geared towards two terms (two-word terms). One is a LOT more competitive than the other, but both begin with the same word.
(for example, buy widgets and buy woogles)
(pretend that woogle is another word for widget)
Until Feb 2nd we were number 1 for the VERY competitive term (widgets) and had been in the top 3 for years (literally).
We were also ranking well (normally top 3) for the LESS competitive term.
Without going into specifics, the page and its IBLs are targeted towards the VERY competitive term.
Now we are 81st for the competitive term and 1st for the lesser term.
Bear in mind this is the SAME page, but it has *apparently* been penalised (lots of crap above it) for the more competitive term ONLY.
In other words, widgets penalised/dampened and woogles not.
not sure if that makes sense or helps :o)
i know what i mean ;o)
p.s once i made a word up for an example and it turned out to be a very BAD word in another language, AFAIK Woogle is fine ;o)
<edit>spelling, grammar, you name it!</edit>
| 7:17 pm on Mar 12, 2005 (gmt 0)|
Well, FWIW, I have a different point of view from most.
Question: Does it really make sense that G would keep a list of queries or kw occurrences and target them? The practical and logic problems with this notion are almost too many to mention. It doesn't make sense.
And how would a list developed according to volume of searches or volume of occurrences effectively target the categories most associated with spam? It couldn't. (Or rather, there would be no correlation between the kw's on such a list and the kw categories most associated with spam.) Let's not forget that the majority of searches are not commercial in nature. There are too many easy examples to show that the sorts of sites being sandboxed cannot be defined according to classes, or kinds, of kw's.
My opinion: The sandbox is not about frequency of instances, or frequency of occurrence: It is about spotting what appears to be artificiality, in order to fight spam. This includes, but is not limited to:
- artificially high rates of growth for new sites (which must achieve different standards than older sites...which paradoxically in this case, equates to restraint, not aggressiveness);
- artificial patterns of kw occurrence;
- and yes, there is an age factor.
The more "competitive" categories involve more competitive kw's, and require more extreme measures to get sites to rank early on. So, it's easy to believe that this has something to do with sandboxing. It does.
But the specific kw's have little to do with it. It is not the type of kw. It is the way that ANY kw is treated, that causes 'sandboxing.'
Those who blast out pages with hot kw's by the gazillions 'feel' it as related to their kw's. It is not. It is related to their actions that involve those kw's. This, IMHO, is why 'seeing' the sandbox has been so difficult.
Ever notice those posts about sites coming out of the sandbox, and how often this is associated with some change in the structure of backlinks and/or backlink text?
There is no spoon. There is no sandbox. There is no kw list. There is only the algorithm and its related filters. KW's are very important, but it is not what the kw's are, it is how they are managed.
| 7:49 pm on Mar 12, 2005 (gmt 0)|
"Does it really make sense that G would keep a list of queries or kw occurances and target them? The practical and logic problems with this notion are almost too many to mention."
Not to be a contrarian caveman, but to me I'd change this question:
Does it really make sense that G would NOT keep a list...
Can you explain why the practical and logic problems etc strike you as that complex? I certainly hope Google keeps track of its searches etc. Wasn't this in a sense what Hilltop was all about: selecting a targeted group of keywords, running the algo on those, then making results based on that precalculated result set? When that was discussed here, it was not generally seen as too complex or unfeasible.
What would make no sense at all would be if Google did not track keywords, target keywords, etc., since if they didn't, it would make their AdWords program pretty weak. Plus it's just the kind of information that is highly relevant to a search company.
I suspect the reality of the matter floats somewhere right between you and BillyS.
Re no sandbox, don't buy it. I put a site into the sandbox; it kept performing on obscure keywords. One result group would give top 10, one not, same number of SERPs; just one keyword group was more competitive than the other. This was a rebranded site. When the site came out of the sandbox in Allegra (and it definitely came out, over about a week), it returned almost exactly to where it would have been in terms of traffic, based on the old domain's rate of growth, if it had never entered the sandbox. In other words, if you'd graphed a line from the 2 months before it was rebranded and ignored its time in the sandbox, the line would be basically straight in terms of growth and traffic. So there was definitely a filter on it, and from what I can see it had nothing to do with optimization, links, etc.; it was just a new domain, that's all.
| 3:04 am on Mar 14, 2005 (gmt 0)|
ciml, care to update us on what you've been mulling over?
| 3:43 am on Mar 14, 2005 (gmt 0)|
Yes, ciml, I'm curious too. :-)
2by4...some thoughtful posts. I'll try to answer some of your questions, since you directed them to me. Please keep in mind that these are just my opinions. Of course, I know that I am right...but that's an opinion too. :-)
|Can you explain why the practical and logic problems etc strike you as that complex? |
To be clear, I did not say that the practical and logic problems were "complex." I said that they were "too many to mention."
|Not to be a contrarian ... but ... does it really make sense that G would NOT keep a list... |
Again, that's not precisely what I said. What I said was: "Does it really make sense that G would keep a list of queries or kw occurrences and target them?" I.e., "...and target them" meaning target them for banishment into the sandbox, since targeting words or phrases for sandboxing is the topic of this thread.
G probably has so many lists that they need a list to keep track of their lists. I salivate when I think of how much info they must have on kw's, related kw's, etc. ... (And if you check out this thread about Allegra [webmasterworld.com] that I started in Supporters, you will see that I believe LSI, Hilltop and LocalRank are all likely playing a role in the current algo.)
But that is not at issue here. The well defined topic of this thread is: Does the "Sandbox" Only Affect Phrases Containing Popular Words (70 or 80 million results)?
I think we've collectively decided here that the answer to that question is 'no.' But this question is very related to a long line of questions concerning what kinds of kw's trigger the sandbox?
So, does G keep some sort of hit "list" of keywords or phrases that trigger 'sandboxing'?
Let's look back at just some of the theories:
• G targets "competitive" phrases and/or KW's (defined as search term frequency or search term occurrence).
• G targets "money" phrases and/or KW's (defined as search terms above a certain bid amount or gross revenue amount).
• G targets "spammy" phrases and/or KW's (defined as search terms in categories with above average amounts of site spam, e.g., local hotel searches).
...and so on.
Now, the more you look at all the theories about kw lists that might be targeted by G for sandboxing, the less sense it makes. How could there possibly be lists of kw's that accurately and fairly determine what categories are "spammy" and what categories are not? This is the notion I find unworkable.
• They can't do it with kw search frequency or on-page occurrence: There are tons of kw's that are not "competitive" and/or are not "money" words, and are not particularly spammy (see steveb's example above).
• They can't do it with AdWords data: If that were ever uncovered, G's organic search product would lose all credibility.
• They can't do it with "spammy categories": Or rather, they can't do it until they decide to hire a million people to sit down in a room, assign each person to a category, and then get each one to decide if their category is spammy, or not. And even then, a week later, it could all change. (Those spammers don't waste time when it comes to uncovering new opportunities.)
So how does G develop the list that decides what sites and/or categories and/or keywords and/or keyphrases get targeted for sandboxing and what sites do not? My answer: There is no list. There is no sandbox. There is only the algo and its associated filters.
An interesting point: This line of questioning (i.e., does G have a blacklist of kw's/categories?) dates back not to the early days of the 'sandbox' (spring of '04), but to the Florida Update (fall of '03).
IMHO: What happened between the Florida/Austin Updates (Nov. '03 - Jan. '04) and 'sandboxing' (March/April '04) is not that some silly 'sandbox' was created, but rather, that the nastiest elements of the Florida and Austin Updates got nastier still. But what is so interesting is that from that point on, the algo for some odd reason became two entities in some people's minds:
1) "the algo," and,
2) "the sandbox."
(I think this is because one or more age related filters were dialed way to one extreme, and the resulting 'cut-off' period made it seem like some new "other" thing was implemented, along side the algo.)
In any event, I think people would be better off if they thought of March/April '04 not as the start of two entities...but as the evolution of the Florida/Austin update into another entity, called the "frustration" algo.
Recently, the Allegra algo seemed to pull back on some of the filters that got most closely associated with sandboxing. Yet still, oddly, while many webmasters have noted a connection between Allegra and the sandbox, they state it in terms of 'my site was just let out of the sandbox in the Allegra Update.' More accurately, IMO, G loosened up some of the filters that came to be associated with a vaporous entity <sandbox> that never existed in the first place. There was no sandbox to be let out of; they just loosened some filters associated with newer sites ... filters that were tougher, pre-Allegra.
There is no sandbox, there is only the algo and its associated filters.
|Re no sandbox, don't buy it. I put a site into the sandbox; it kept performing on obscure keywords. One result group would give top 10, one not, same number of SERPs; just one keyword group was more competitive than the other. |
2by4, I grab this quote not to pick on you at all, but because this is the essential line of reasoning that I have heard so many times before ... and that I believe continues to get so many into trouble. Think about it. Do most webmasters treat their more competitive kw's the same way that they treat the kw's that seem to skirt the sandbox? Of course not. In all likelihood, the reason you went into the sandbox with that rebranding change is simply that the age filter kicked in when you rebranded, and the site did not pass the criteria for newer sites (which is what it became in the algo's eyes when you rebranded). The site popped out with Allegra because it was on the cusp of passing anyway, and the loosening of the filters in Allegra set it free. We've seen it a lot.
The specific kw's have little to do with it. It is not the type of kw. It is the way that ANY kw is treated, that causes 'sandboxing.' ;-)
| 10:42 am on Mar 14, 2005 (gmt 0)|
Wouldn't it be fairly easy for Google to determine whether a site deserves the "sandbox" based on AdWords data? I'm sure they gather it extensively.
| 2:09 pm on Mar 14, 2005 (gmt 0)|
> Now, the more you look at all the theories about kw lists that might be targeted by G for sandboxing, the less sense it makes.
There, we can agree.
> My answer: There is no list. There is no sandbox. There is only the algo and its associated filters.
The absence of a list doesn't by itself remove the "sandbox" idea.
People have found a long 'lag' from when a new domain receives its links, to those links counting. And yet, a similar page added to an established domain seems not to suffer from this 'lag'.
Many people have assumed that sandbox is a penalty, but I do not believe that we know this for sure.
The main non-penalty idea promoted to explain sandbox was that Google ran out of room at 2^32 pages, roughly 4.3 billion. This seems quite strange to me, as Google ought to have the in-house ability to deal with large numbers if they wanted. If I recall correctly, this idea had also been denied specifically by at least one Google employee (more, I think).
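The arithmetic behind the 2^32 theory is easy to check. The sketch below assumes, as the theory did, that document IDs were stored as unsigned 32-bit integers; that assumption is not confirmed by Google, and the names `MAX_DOCS` and `to_uint32` are hypothetical.

```python
# Arithmetic behind the "ran out of room at 2^32 pages" theory.
# Assumption (not confirmed by Google): document IDs were unsigned 32-bit ints.

ID_BITS = 32
MAX_DOCS = 2 ** ID_BITS  # number of distinct IDs a 32-bit field can hold


def to_uint32(n: int) -> int:
    """Emulate storing n in an unsigned 32-bit field (wraps modulo 2**32)."""
    return n & 0xFFFFFFFF


if __name__ == "__main__":
    print(f"{MAX_DOCS:,}")          # 4,294,967,296 -- "roughly 4.3 billion"
    print(to_uint32(MAX_DOCS - 1))  # 4294967295, the largest storable ID
    print(to_uint32(MAX_DOCS))      # 0: the next ID would wrap around
```

Widening such a field (to 64 bits, say) is routine engineering, which is one reason the theory seems strange.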
Another idea has been that Google are using some advanced kind of link analysis that takes a long time to calculate. We've read works by Jon Kleinberg, and by Krishna Bharat and George Mihaila. The aspect of this idea that I tend to dislike, is that Google would assign link 'theme' by domain and not by page. This would seem odd, although a number of people believe that pages on a well linked domain get an advantage, completely independent from PageRank. It would be fun to see if those two ideas could be combined.
caveman, shri, my mulling is simply that Liane's aside seems quite intriguing.
What if Mr Véronis' idea of an index of two parts was in part based on domains and took a long time to calculate?
I'm not pushing that anyone should believe this, nor even that the idea is at present the basis of a theory. I believe that this, like other ideas, deserves some inspection.
steveb and shri both have examples of domains, seemingly sandboxed for searches that contain no words returning close to 80 million. That makes the idea seem less likely.
| 3:44 pm on Mar 14, 2005 (gmt 0)|
More research into our search term in other search engines and in Google's SERPs suggests there may be rules or filters that Google applies to specifically exclude certain sites or search terms from the primary index. Our traffic from Google is down 1600% compared to Yahoo. Also, Google supplies results to other search engines, such as AOL, and we are lost in those SERPs too...
Search engines that list us in their SERPs:
1 - 10 of 630,000 Results for(widget)employment -alltheweb
1-10 results out of 137,300 for(widget)employment -web.ask
Web results for "(widget)employment " (1 - 20 of 51)-dogpile
Results 1 to 10 of about 35,653,313 for (widget)employment -gigablast
(widget)employment 72 unique top-ten pages selected from at least 133,000,000 matching results -Ixquick
Search Results 1-15 of 47 for (widget)employment - mamma
1-10 of 187,356 containing (widget)employment (0.09 seconds)search.-msn
Results 1 - 10 of about 680,000 for(widget)employment - 0.10 sec -yahoo
We also have a blog site, (widget)blog, that we launched in September 2004. It shows 1350-2575 links, depending on which data center is serving the SERPs, and its name contains the word (widget). It is also no longer listed in Google's SERPs, but it is listed in all other search engines.
| 4:03 pm on Mar 14, 2005 (gmt 0)|
|It is the way that ANY kw is treated, that causes 'sandboxing.' |
I don't think this is the case.
If we approach this from the angle that a page can be 'sandboxed' for one word but not another (rather than a site that doesn't rank for anything):
I have a site, let's call it 'widgetsandwodgets.com'. I have links to it mainly of 'widgets and wodgets', some others of 'widgets and wodgets - kaplinks kaplonks'.
I rank well for 'keyword wodgets', 'kaplinks keyword' and 'keyword kaplonks' but nowhere for 'keyword widgets'. Widgets is the big-money, highly competitive term. I am nowhere, really nowhere, and never get a visit from Google with widget or widgets.
I have many pages on the site targeting each keyword (widget, wodget, kaplink and kaplonk), using the exact same structure for each keyword, and have general pages targeting all the keywords.
So if my structure and links are spammy for widgets, why do I rank well for wodgets, kaplonks and kaplinks?
(agreeing with diddlydazz here)
[edited by: grail at 4:04 pm (utc) on Mar. 14, 2005]
| 4:03 pm on Mar 14, 2005 (gmt 0)|
|People have found a long 'lag' from when a new domain receives its links, to those links counting. And yet, a similar page added to an established domain seems not to suffer from this 'lag'. |
This is my challenge to the comments about the link: command being broken. I was wondering about the timing of the link: problem and the sandbox. I found a post on another forum from GG stating that they were going to change the output in the June '04 timeframe.
I was "lucky" enough to have my sandboxed website make it into DMOZ in the first 6 weeks of its life. I also advertised using text links outside of the Adwords system. Even though I did not chase link building myself, I managed to get several thousand links (based on other search engine reporting) in about three months time.
It appears that Google is only showing a representative percentage of links to the user. My site had been showing 70 - 80 links through several link updates up until the last update when the number went to 120. A number that I think does not represent the site well at all.
Personally, I think the sandbox is related to IBL. My site is all white hat, contains lots of content (700+ pages at 500+ words per page), yet gets virtually zero referrals from Google (less than Ask, which only has about 30 pages indexed). I have followed Brett's 26 steps except the link building part - those came quite quickly.
Has this experiment been done elsewhere? Has anyone plotted the number of IBL in another engine versus google versus the age of the website based on the first time Google indexed a page?
If this makes sense, I will put the information together... I think I could do it with publicly available information...
| 6:59 pm on Mar 14, 2005 (gmt 0)|
|The absence of a list doesn't by itself remove the "sandbox" idea. People have found a long 'lag' from when a new domain receives its links, to those links counting. And yet, a similar page added to an established domain seems not to suffer from this 'lag'. |
In my mind, the absence of a keyword list (which was the kind of list we're discussing) does imply something. It implies:
1) the so-called sandbox is not related to specific kw's (which would by definition have to be kept in a list); and,
2) that the so-called sandbox is algorithmic (which is how G likes to handle everything). Note: When I say 'algorithmic,' I include the use of filters.
I wish to clarify something. I do believe that there is a list related to 'sandboxing,' though perhaps it is better thought of as a 'category.' The category is: Sites launched after March '04. These sites are subject to a different combination of algo/filters than sites launched prior to March '04. Being in this category, as I think most would agree, is what potentially subjects sites to the algo that is associated with 'sandboxing.'
The fact that the vast majority of sites could not get past this algo is further evidence that there was no kw list, but rather, that sandboxing had almost entirely to do with launch date of the site ... and the algo/filter set applied to those newer sites.
|Many people have assumed that sandbox is a penalty, but I do not believe that we know this for sure. |
Personally, it seems doubtful that G would 'penalize' all or most sites based on launch date. Nah, it's algorithmic. ;-)
|The main non-penalty idea promoted to explain sandbox was that Google ran out of room... |
Yes. And the other one often floated, which I subscribe to (with only a modest level of conviction), is that it was a rather ill-conceived attempt to stem spam, or rather, to discourage spammers. But it's not a good idea to rehash that debate in this thread. And frankly I don't care much. Either way, the potential for being caught by G's 'lag' phenomenon existed for sites launched after March '04, and so we spent most of our time not on 'why it exists' but on 'how to get around it'.
|Another idea has been that Google are using some advanced kind of link analysis that takes a long time to calculate. We've read works by Jon Kleinberg, and by Krishna Bharat and George Mihaila. The aspect of this idea that I tend to dislike, is that Google would assign link 'theme' by domain and not by page. This would seem odd, although a number of people believe that pages on a well linked domain get an advantage, completely independent from PageRank. |
|Personally, I think the sandbox is related to IBL. |
Count me among the group who sees it this way, based not on guesswork, but on experience with our own sites, and assessment of other sites we know of that never had much trouble with the lag.
Early in the sandboxing period I stated in several threads that it was possible to get past the 'sandbox'. Reactions initially ranged from disbelief to aggressive rudeness. Over time though, more and more webmasters began admitting that they were finding cracks in the algo.
There are to my knowledge at least three well known ways that sites were getting past the lag, unscathed:
---> One way had to do with 'tricking' the algo with a workaround that bypassed the age-related filters (i.e., the sites were not really 'new'). This method is pertinent to this thread only insofar as it further supports the notion that newer sites were subject to a tougher algo.
---> The second of the two more frequently discussed methods was practiced mainly by blogspammers. It involved links.
---> The third way, which I and others have alluded to, but which has not been widely discussed publicly to my knowledge, also involves links, but unlike the blogspam method, this third way involves naturally developed links. Not too fast, not too slow, not too many, not too few. And there are other important elements. ;-)
Point being, it has a lot to do with links. And kw's. And avoiding footprints or signs of artificiality.
|I have a site, let's call it 'widgetsandwodgets.com'. I have links to it mainly of 'widgets and wodgets', some others of 'widgets and wodgets - kaplinks kaplonks' ... I rank well for 'keyword wodgets', 'kaplinks keyword' and 'keyword kaplonks' but nowhere for 'keyword widgets'. |
grail's problem was, in fact, a problem that began with Florida - not with sandboxing. It is even more of a problem since March '04. More evidence that Florida/Austin and the 'lag' are the same, only that the 'lag' is far more severe. ;-)
|What if Mr Véronis' idea of an index of two parts was in part based on domains and took a long time to calculate? |
My personal opinion: This notion of crossing result sets for individual kw's to reach conclusions about kw combinations is brilliant. IF it's all about kw's (not specific ones, but how any kw is treated), and about links, the one issue I was struggling with was: How specifically do the filters handle the problem of kw phrases? Now I'm wondering: Perhaps by applying the criteria to each kw, and then crossing the results such that all kw's in the query must pass the test. That would clarify what was still fuzzy.
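The "crossing" idea above can be sketched as a per-keyword test followed by an intersection: a phrase passes only if every keyword in it passes. Everything here is hypothetical: the per-keyword score, the per-keyword bar and the function names are stand-ins for whatever Google might actually measure, and the example values are made up to mirror grail's widgets/wodgets case.

```python
# Hypothetical sketch: filter each keyword separately, then require that
# every keyword in the query passes. Scores, bars and names are invented.


def keyword_passes(site_scores: dict[str, float],
                   bars: dict[str, float], keyword: str) -> bool:
    """Per-keyword test: does the site clear the bar set for this keyword?

    Keywords with no bar are treated as unfiltered (bar 0.0); keywords with
    no score are treated as scoring 0.0.
    """
    return site_scores.get(keyword, 0.0) >= bars.get(keyword, 0.0)


def phrase_passes(site_scores: dict[str, float],
                  bars: dict[str, float], phrase: str) -> bool:
    """Cross the per-keyword results: the phrase passes only if all words do."""
    return all(keyword_passes(site_scores, bars, word)
               for word in phrase.lower().split())


if __name__ == "__main__":
    # Made-up values echoing grail's example: one high bar on 'widgets'.
    bars = {"widgets": 0.9}
    site = {"keyword": 0.5, "wodgets": 0.5, "widgets": 0.5}
    for query in ("keyword wodgets", "keyword widgets"):
        print(query, phrase_passes(site, bars, query))
```

With a single strict keyword, the same page passes for "keyword wodgets" and fails for "keyword widgets", which is the pattern grail describes.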
As for the idea that sites benefit from certain kinds of linking structures (internal and IBL's), we have believed this with relative certainty since shortly after Florida. In fact, it now drives almost everything we do, from external to internal linking. I could be misremembering, but I believe that it was Martinibuster who made some critical observations about internal linking structures, shortly after Allegra hit.
There is no sandbox. There is only the algo (+ variants) that affect pre-March-'04 sites, and the algo (+ variants) that affect post-March-'04 sites. The latter algo is much harder to crack (but it is easier to crack after Allegra than it was before Allegra).
| 8:19 pm on Mar 14, 2005 (gmt 0)|
Caveman, thanks for the in depth response, too much to cover right now, food for thought. I considered the 'sandbox' as a function of the algo, given that it applies to new domains, seems like there would have had to be the addition of a 'new domain' type flag that would cover sites, not just pages.
Re: "But that is not at issue here. The well defined topic of this thread is: Does the "Sandbox" Only Affect Phrases Containing Popular Words?"
I have to admit, I read the linked-to item, and the argument was to me so weak it wasn't really worth talking much more about. Plus I could see with a single counterexample that it didn't appear to be correct. Say: kw1 kw2 kw3 versus kw1 kw4 kw3, both containing common keywords, one was 'sandboxed', or 'algoed' if you prefer that term, the other was not. I didn't read it that closely, but it really seemed like the author went a bit overboard on his assumptions; there could have been much simpler explanations.
| 8:48 pm on Mar 14, 2005 (gmt 0)|
I think human behavior plays a big role in this.
Some keywords are more popular and more targeted, and therefore more susceptible to abuse.
These types of words would naturally trip the filter or whatever more often.
| 10:47 pm on Mar 14, 2005 (gmt 0)|
|Personally, it seems doubtful that G would 'penalize' all or most sites based on launch date. Nah, it's algorithmic. ;-) |
I still believe that this is related to launch date. If it were algorithmic would it not be applied to all sites as opposed to just those launched after Feb 04?
Also, if it were algorithmic it would not affect all (or almost all) new sites. I have launched a couple of non-commercial sites that appear to be well and truly sandboxed. As far as I know many other similar sites are in the same boat.
| 12:49 am on Mar 15, 2005 (gmt 0)|
I'd put forward that the sandbox is directly related to the Google's bottom line.
Companies exist to make money, not primarily to provide a useful good or service, though the two must travel closely together out of necessity in the long run, still they do not need to perfectly overlap. I think Google is a company with ethics better than most, but it is still a company and a vulnerable one at that as it’s in competition with two more diversified organizations in Yahoo and MSN.
Correct me if I'm wrong, but back in 2003, new sites initially got “a new site boost” in the SERPS, then dropped before struggling back to the top after getting a few high PR links or many moderate links. This practice seems to have been replaced by the sandbox (or I should say whatever we call the sudden disappearance of sites competing for high revenue keywords in the SERPs)
If you consider both from the perspective of generating revenue, it's not hard to see how each can be to Google's advantage, not in the fight against Spam, but to enhance revenue. In the case of the former, a boost gives sites an initial taste of revenue (let’s call it the schoolyard pusher technique), and then offers up adwords for the six months or so needed to build one's site back up. The sandboxing of sites in highly competitive keywords would have the same effect, but probably more so. People will go running to adwords. These happenstances generate BILLIONS for Google. As a profit maximizing entity, Google just has to cut the optimization curve at the point where SERPS remains good enough so broad public perception remains unaffected and so they don’t lose market share while dropping some sites to increase pay-per-click revenue at the optimization point. That’s why you’ll never see authority sites like Amazon go down, that would hurt Google's image, but pick primarily on rotating waves of second and third tier sites – not enough to damage the SERPS but enough to keep a steady flow of adword revenue and it's pure profit. Who cares if there are isolated little islands like WW full of webmasters complaining.
Any economist would point out that manipulative behavior is the rule rather than the exception for unregulated businesses in high market concentration situations.
Furthermore, unlike the highly diversified Yahoo and MSN, Google's primary source of revenue is its click-through advertising from search. On the anecdotal side, I've heard from several sources that working in the Google advertising section is hell on earth and they are under immense pressure to bring in revenue. Not to mention that S and L in their Playboy interview were nakedly pushing AdWords.
I’m not saying that lots of other things aren’t happening concurrently, but I think it’s a mistake to view Google SERPS in a scientific vacuum. Maybe if Google were run by engineers at a university or a government think-tank this could be the case, but not when it’s run by a CEO chosen by its venture capitalist shareholders and now with the increased pressure of being a publicly traded company. In fact, when it comes to major identifiable shifts, I think one should consider the effect on shareholder value before even the SERPs.
| 2:15 am on Mar 15, 2005 (gmt 0)|
Certain keywords seem important; especially, perhaps, keywords targeted by fair numbers of money-oriented sites [a two-word name that was sandboxed for me delivers under 40m results - each word under 60m, so I'm not sure it's sheer results quantity that's important]. Timing is important: was the site created after around Feb 04? If yes, sandboxing is evidently more likely; even if not, big site changes since then might trigger sandboxing (how big? I dunno, I just read of this in forums).
I had an info-focused site emerge with Allegra, suddenly doing well for keywords that previously knocked it into outer cyberspace; too bad it seems other informative sites may still be snared.
| 2:31 am on Mar 15, 2005 (gmt 0)|
"I think one should consider the effect on shareholder value before even the SERPs."
If you do this, you'd have to include all the factors related to this question, not just the ones that you prefer to think are related ;-)
As caveman noted, too many issues for me to understand. Doesn't mean they don't exist though.
Bottom line is this: it doesn't really matter what the sandbox is; as caveman notes again, the real question is how to avoid it. Clearly Allegra released some sites/pages that were being 'algoed'. Time will tell what the current status of that is.
What I'm curious about is this: Anybody still have an 'algoed' site? Post-Allegra, that is?
| 7:40 am on Mar 15, 2005 (gmt 0)|
|This practice seems to have been replaced by the sandbox (or I should say whatever we call the sudden disappearance of sites competing for high revenue keywords in the SERPs) |
But in my experience it is not just sites competing for high revenue keywords. I have examples of non-profit sites that are well and truly SB'd. I did optimise these, but only for keywords that by no stretch of the imagination could be considered commercial.
|What I'm curious about is this: does anybody still have an 'algoed' site? Post-Allegra, that is? |
I have several and I am sure that I am not the only one. I have been on holiday for the last two weeks and a bit out of touch. Was there any real evidence of a mass release during Allegra?
| 9:20 am on Mar 15, 2005 (gmt 0)|
When were they launched?
| 11:23 am on Mar 15, 2005 (gmt 0)|
|kw1 kw2 kw3 versus kw1 kw4 kw3, both containing common keywords, one was 'sandboxed', or 'algoed' if you prefer that term, the other was not. |
Thanks 2by4. An excellent point; I can't say I'd noticed word order affecting the 'lag' before.
| 11:33 am on Mar 15, 2005 (gmt 0)|
There is no way of getting out of the sandbox except by buying a 2-5 year old domain with good PR that is already indexed in Google and starting from scratch. It is that easy, so why all the fuss? I can understand webmasters who have no money, but for companies (I guess mom-and-pop companies), what's the big deal? Find a friend, or place an advertisement looking to buy an old site. If you are lucky and find one, that's it: with the usual tricks, within weeks you will be where you want to be. That's how Google works. Conclusion: don't buy new domains, unless you want to wait the next 10 years.
| 11:35 am on Mar 15, 2005 (gmt 0)|
It does make sense that Google holds a list of top performing keywords and phrases. With people using the conversion tools in AdWords, Google now have a lot of conversion data. Those keywords which convert highly into enquiries and/or sales may be subject to the sandbox.
| 3:53 pm on Mar 15, 2005 (gmt 0)|
Your comments are appreciated. As I mentioned above, I’m not suggesting that lots of other things aren’t happening concurrently. Google is obviously fighting spam, and they are trying to find mathematical solutions to complex issues, which can at times produce odd results given the infinite number of variables. But I think that excluding consideration of the purely financial interests involved leaves any analysis woefully incomplete.
I've even seen some here call it a conspiracy theory. Well, a powerful company with huge market share and no transparency pulling a few unfair moves to make billions? I'd hardly put that in the same category as a UFO abduction.
Google is no longer a project run by purists; it's a major corporation whose primary responsibility is to make as much money as it can. Because Google's bottom line is clearly more closely tied to its SERPs than MSN's or Yahoo's is, Google has an even stronger incentive to manipulate them to keep revenues coming in and stay competitive.
| 5:22 pm on Mar 15, 2005 (gmt 0)|
|Google has an even stronger incentive to manipulate them to keep revenues coming in and stay competitive. |
How true! Their future corporate policy will be to keep providing Joe Public with gimmicky freebies like Picasa and the search bar, etc. This will probably be enough to retain their loyalty because in general "Joe" does not know or care about the quality of the search results.
| 8:19 pm on Mar 15, 2005 (gmt 0)|
ciml, in the above example:
kw1 900,000,000 results
kw2 150,000,000 results
kw3 30,000,000 results
kw4 200,000,000 results
Phrase 1 = kw1 + kw2 + kw3 -> algoed
Phrase 2 = kw1 + kw4 + kw3 -> not algoed
The site became unalgoed in Allegra. Searches for phrase 1 are about 10 times more common than searches for phrase 2. Both phrases now rank about the same, and both return about the same number of results.
If the site had been algoed as a whole, how could this be? Every phrase should have been equally depressed in the SERPs, but this wasn't the case, and many others have noted this as well. Other phrases were not algoed, but the competitive phrases were. When the site became de-algoed, SERP positions were almost exactly the same as before it was algoed, and those positions are basically the same as what the allin-type tests returned while it was algoed.
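To make the point concrete: the opening post's "any word over 80 million results" idea cannot explain this example on its own, because kw1 (900 million) appears in both phrases. A minimal Python sketch, assuming the rule is nothing more than a per-word threshold; the function name and the threshold constant are illustrative, not anything Google is known to use:

```python
# Result counts as reported in the post above (approximate).
RESULT_COUNTS = {
    "kw1": 900_000_000,
    "kw2": 150_000_000,
    "kw3": 30_000_000,
    "kw4": 200_000_000,
}

# Hypothetical cutoff taken from the opening post's "80 million" question.
THRESHOLD = 80_000_000


def contains_popular_word(phrase, counts=RESULT_COUNTS, threshold=THRESHOLD):
    """Return True if any word in the phrase exceeds the result-count threshold."""
    return any(counts[word] > threshold for word in phrase)


phrase_1 = ["kw1", "kw2", "kw3"]  # reported as algoed
phrase_2 = ["kw1", "kw4", "kw3"]  # reported as not algoed

observed = {"phrase 1": True, "phrase 2": False}
predicted = {
    "phrase 1": contains_popular_word(phrase_1),
    "phrase 2": contains_popular_word(phrase_2),
}

for name in observed:
    verdict = "matches" if predicted[name] == observed[name] else "contradicts"
    print(f"{name}: rule predicts algoed={predicted[name]}, "
          f"observed={observed[name]} -> {verdict}")
```

The threshold rule flags both phrases, so it contradicts the observation for phrase 2; whatever separated the two phrases, it was not simply the presence of a very popular word.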
I don't think Google is running two algos, one pre-March '04 and one post-March '04. That simply doesn't make any sense; it violates the most basic rule of computer programming: keep it simple and don't run dual systems. They are a pain to maintain, as anyone who's ever tried to run a bilingual website, or a dual NS4/other-browser site, can tell you. I always look for the simplest explanation. The Google guys are good programmers; they aren't going to tie themselves into that type of web. My guess is a new-site flag coupled with the Hilltop filter, which does contain that type of keyword search phrase list. Flag on page + Hilltop = sandbox is my guess.
I suspect beedee could tell us whether it was aggressive link campaigning that algoed his sites. That's a very simple thing to determine: if the site is algoed and no aggressive link campaign happened, then link campaigning doesn't seem like an adequate explanation.
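The "flag on page + Hilltop" guess amounts to a per-query check rather than a per-site penalty, which is what would let phrase 1 drop while phrase 2 ranks normally. A minimal Python sketch under loudly hypothetical assumptions: the cutoff date, the phrase list and all function names are illustrations only, and nothing here reflects known details of Google's filters or of how Hilltop is applied:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical cutoff based on the thread's "launched after March '04" observation.
NEW_SITE_CUTOFF = date(2004, 3, 1)

# Hypothetical list of phrases treated as competitive (placeholder contents).
COMPETITIVE_PHRASES = {"kw1 kw2 kw3"}


@dataclass
class Site:
    name: str
    launched: date


def new_site_flag(site: Site) -> bool:
    """Flag sites launched on or after the hypothetical cutoff date."""
    return site.launched >= NEW_SITE_CUTOFF


def is_dampened(site: Site, query: str) -> bool:
    """Guessed rule: dampen only when the site is flagged AND the query is listed."""
    return new_site_flag(site) and query in COMPETITIVE_PHRASES


site = Site("example-site", date(2004, 6, 1))
print(is_dampened(site, "kw1 kw2 kw3"))  # True: flagged site, listed phrase
print(is_dampened(site, "kw1 kw4 kw3"))  # False: phrase not on the list
```

Because the decision depends on the query as well as the site, unlisted phrases keep their normal positions, and clearing the flag would restore the listed phrases to roughly where they were before.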
As caveman notes, it's possible, with great care, and some luck, to not get flagged, but that's another matter I think.
Then, as Rollo notes, there are some pleasant business benefits to this as well. Ignoring those is pretending that Google is not a business; they are a business, and they are very good at what they do, usually.
But what really interests me is how BillyS sees this, assuming he really does do large scale db programming, I'd say he has a better idea of some of the fundamentals than most here, no?
| 9:02 pm on Mar 15, 2005 (gmt 0)|
Or... caveman is 100% right, which wouldn't surprise me, and Google is in fact rolling out a new algo, running on 64-bit machines, with a 40-bit index. Given what I've read about how Google does their hardware upgrades, this would make total sense. The new algo running on the new machines could easily include the above components, while the old one just putts along until they replace those machines over time. And all sites launched since March '04 are being run on that new algo, exactly as caveman suggests.
| 10:07 pm on Mar 15, 2005 (gmt 0)|
|I do believe that there is a list related to 'sandboxing.' Though perhaps it is better thought of as a 'category.' The category is: Sites launched after March '04. These sites are subject to a different combination of algo/filters than sites launched prior to March '04. |
So, in your opinion, would this category also include a site that was launched in 1999 but, after March '04, was relocated to a new IP (same hosting company, though) with a slightly new design and a new navigation structure? (i.e., pages that used to be 3 levels deep were made 2 levels deep)
This describes a client site of ours. The site went missing (sandboxed?) almost immediately after the relaunch and, for the most part, came back with Allegra.
| 11:04 pm on Mar 15, 2005 (gmt 0)|
Pleeker, my opinion is that the short answer to your question is yes, especially if the site you refer to was SEO'd relatively conservatively. IP had nothing to do with it, but redesign and nav changes might. IMO, a number of previously sandboxed sites were on the cusp of being let in, and with Allegra, G loosened up certain filters that allowed this to happen.
One of the reasons I fight against the notion that there is some distinct "thing" called the sandbox is that it lets people off the hook. In your example, my guess is that your client's redesign made them susceptible to the part of the algo/filter combo that causes sites to vanish (there is no sandbox), and when G tweaked things with Allegra, some sites reappeared. Mainly more conservatively SEO'd sites, to my eye. :o
|It does make sense that Google holds a list of top performing keywords and phrases. |
Of course it does; the question is whether or not such a list is used to sandbox sites. Overwhelming evidence suggests that it's not.
|caveman is 100% right, which wouldn't surprise me |
Me neither. Happens all the time. ;-)
Oh lighten up, just kidding.
2by4, on tech-related matters of any significance, my opinions and theories are informed mainly by a whole lot of data from the many sites we run and competitors we watch, and by people who know technology far better than I, because I'm no techie. FWIW, I'm told that what I believe to be happening re sandboxing is at least technically feasible (meaning in practice, not just theory). Beyond that, well, I'm better at guessing what is happening than how.
For example, one might look at anchor text and kw patterns on and especially off site for clues about what is happening with sandboxing and how to get beyond it. Including overly aggressive footprints of various kinds. :-)
What amazes me is that there are still those who are either not listening or not paying enough attention to know what is possible and what is not. If one does not even know that, how can goals be established? :\
| 11:14 pm on Mar 15, 2005 (gmt 0)|
My sites were definitely not SB'd for aggressive link campaigning. They only have a few links each. I have one site with a domain name like blue-widgets.co.uk and it is nowhere for a blue widgets search. This site has several pages optimised for furry blue widgets, smooth blue widgets, dark blue widgets, etc. and only one of these pages has got through on a search for acronym blue widgets which is at number 11.
This has intrigued me. I have yet to suss out why it got through.
| 11:47 pm on Mar 15, 2005 (gmt 0)|
This is an interesting thread.
Yes, Google's index is a database. But it is not a general-purpose relational database with a complex query language like Oracle. Instead, it's a highly optimized single-purpose database with a very simple query language. The search depth vs. speed trade-off has been part of its design from the very beginning, including:
- word-based indexing, as opposed to full-text search using, say, the Burrows-Wheeler transform,
- a limited number of query results,
- query caching, and
- a simple query language.
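The design points in that list (word-based indexing, a capped result list, query caching and a simple query language) can be illustrated with a toy inverted index. A minimal Python sketch, assuming an AND-only query language; the corpus, `MAX_RESULTS` and the function names are hypothetical and say nothing about how Google's real index works:

```python
from collections import defaultdict
from functools import lru_cache

# Toy corpus standing in for crawled pages (hypothetical data).
DOCS = {
    1: "blue widgets for sale",
    2: "furry blue widgets",
    3: "smooth red widgets",
    4: "dark blue widgets and gadgets",
}

MAX_RESULTS = 2  # hypothetical cap on results returned per query


def build_index(docs):
    """Word-based inverted index: word -> sorted list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


INDEX = build_index(DOCS)


@lru_cache(maxsize=1024)
def search(query):
    """AND query over the index; results capped, repeat queries served from cache."""
    words = query.lower().split()
    if not words:
        return ()
    postings = [set(INDEX.get(word, [])) for word in words]
    matches = sorted(set.intersection(*postings))
    return tuple(matches[:MAX_RESULTS])


print(search("blue widgets"))
print(search("blue widgets"))  # second call is answered from the cache
print(search.cache_info())
```

There is no ranking here, so results come back in doc-id order; the point is only that each listed trade-off makes a query cheaper at the cost of search depth.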
Also, the sandbox hits young sites. If I was an engineer that had to come up with some kind of filter in order to limit search depth I would definitely not choose a site's age to be the predominant filter criterion. I would rather choose things like keyword density or keyword proximity.
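For comparison, the two criteria the poster would prefer are cheap to compute per page. A minimal Python sketch using naive whitespace tokenization; these are textbook definitions chosen for illustration, not a description of how any search engine scores pages:

```python
def keyword_density(text, keyword):
    """Share of words in the text that equal the keyword (case-insensitive)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def min_proximity(text, kw_a, kw_b):
    """Smallest gap, in word positions, between kw_a and kw_b; None if either is absent."""
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == kw_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == kw_b.lower()]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)


page = "blue widgets and more blue widgets from the blue widget shop"
print(keyword_density(page, "blue"))
print(min_proximity(page, "blue", "widgets"))
```

Both measures depend only on the page text, which is the poster's point: a depth-limiting filter built on them would not single out sites by age.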