
Google SEO News and Discussion Forum

Does the "sandbox" Only Affect Phrases Containing Popular Words?
If the phrase has no words over 70-80 million results, does sandbox apply?
ciml




msg:770272
 6:35 pm on Mar 10, 2005 (gmt 0)

While discussing [webmasterworld.com] a most interesting analysis of Google's number-of-results figures [aixtal.blogspot.com], I speculated that Google might use a smaller index for popular words, in a manner similar to that explained in a pre-Google Backrub paper.

Liane took this idea further, and suggested that this might explain the sandbox.

So without getting into specifics, what is the view on sandbox applying to phrases that have no words with less than 80 million results?

Keep in mind that many phrases with few results contain at least one word with more than 80 million.

<added>
"that have no words with less than 80 million" should be "that have no words with more than 80 million". Thanks Liane for spotting the error.

[edited by: ciml at 3:33 pm (utc) on Mar. 12, 2005]

 

Hanu




msg:770332
 11:47 pm on Mar 15, 2005 (gmt 0)

This is an interesting thread.

BillyS,

Yes, Google's index is a database. But it is not a general-purpose relational database with a complex query language like Oracle. Instead, it's a highly optimized single-purpose database with a very simple query language. The search depth vs. speed trade-off has been part of its design from the very beginning, including:

- word-based indexing, as opposed to full-text search using, say, the Burrows-Wheeler transform,
- a limited number of query results,
- query caching, and
- a simple query language.
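
To make those trade-offs concrete, here is a minimal sketch (my own illustration, not anything from Google's actual code; the class name, the cap, and the data are invented) of a word-based inverted index with a capped result count and a simple query cache:

from collections import defaultdict

MAX_RESULTS = 1000  # illustrative cap, standing in for the limited number of query results

class TinyIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of doc ids (word-based indexing)
        self.query_cache = {}             # cached results for repeated queries

    def add_document(self, doc_id, text):
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def search(self, query):
        # "simple query language": just AND over whitespace-separated words
        key = " ".join(sorted(query.lower().split()))
        if key in self.query_cache:
            return self.query_cache[key]
        terms = key.split()
        if not terms:
            return []
        docs = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            docs &= self.postings.get(term, set())
        results = sorted(docs)[:MAX_RESULTS]  # never hand back more than the cap
        self.query_cache[key] = results
        return results

index = TinyIndex()
index.add_document(1, "blue widget reviews")
index.add_document(2, "cheap blue widgets")
print(index.search("blue widget"))  # -> [1]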

Also, the sandbox hits young sites. If I was an engineer that had to come up with some kind of filter in order to limit search depth I would definitely not choose a site's age to be the predominant filter criterion. I would rather choose things like keyword density or keyword proximity.

pleeker




msg:770333
 11:50 pm on Mar 15, 2005 (gmt 0)

Pleeker, my opinion is that the short answer to your question is yes, especially if the site you refer to was SEO'd relatively conservatively. IP had nothing to do with it, but redesign and nav changes might.

Now, if your opinion is correct ... why? The implication here is that after March '04, a site could be hurt (not penalized, hurt) simply by improving its design and navigation structure. That, to me, is a colossal failure on Google's part.

I don't know how to reply to the "especially" part because I suspect we all have different definitions of conservative SEO. :) The site in question is long on quality content and short on IBLs. The redesign was done, frankly, with both users and SEs in mind -- we lightened page size substantially for faster loading, graphical buttons were replaced with text for a minor internal anchor text boost, 3rd-level pages were moved up to 2nd level, etc. Seems conservative to me, but I just don't get why changes like this would negatively impact a site that's been online since 1999. As I said, I consider it a failure on G's part.

BeeDeeDubbleU




msg:770334
 8:59 am on Mar 16, 2005 (gmt 0)

If I was an engineer that had to come up with some kind of filter in order to limit search depth I would definitely not choose a site's age to be the predominant filter criterion. I would rather choose things like keyword density or keyword proximity.

Nor would many of us here but you don't get to choose :)

As I said, I consider it a failure on G's part.

... and they don't care what you think :(

2by4




msg:770335
 9:29 am on Mar 16, 2005 (gmt 0)

" My sites were definitely not SB'd for aggressive link campaigning. They only have a few links each."

I suspected as much. And I seem to remember reading many other people saying the same thing.

Although I'm sure if you launch a new site, and engage in aggressive link building, you'll probably be sandboxed. And as beedee shows, if you launch a new site, and don't engage in aggressive link building, you'll probably be sandboxed.

Hope your sites are released soon, beedee; it's fun when they come out. Funny thing about the sandbox idea: if you look at it like a sandbox, you can expect it to come out, and when it comes out you're not surprised, since you knew it was in. If you make it more complex, then it gets confusing and hurts my head.

BeeDeeDubbleU




msg:770336
 12:37 pm on Mar 16, 2005 (gmt 0)

None of the sites I have done in the last year used aggressive linking campaigns. Likewise they did not target popular words. They were still SB'd. The only thing they have in common is that they were optimised using tried and trusted white hat techniques that served me well in the past. My older sites continue to do well.

I despair with this situation because I am no nearer to understanding it now than I was nine months ago :(

onebaldguy




msg:770337
 3:16 pm on Mar 16, 2005 (gmt 0)

Great posts caveman!

I do believe that there is a list related to 'sandboxing.' Though perhaps it is better thought of as a 'category.' The category is: Sites launched after March '04. These sites are subject to a different combination of algo/filters than sites launched prior to March '04. Being in this category

I tend to disagree, because why would G apply this new algo only to sites that were published after Mar 04? Why was the algo not applied retroactively? G knows when sites are first indexed. If they wanted to apply the algo to sites less than 6-9 months old, they could easily have done so (which would have affected sites from as far back as June/July 03). It is possible that the reason they did not do this is that they added something new to the algo which analyzed certain data that was not available to them before Mar 04 (data they were not indexing).

What if they implemented something in the algo that takes into consideration time as related to links (temporal link analysis or something similar)? Maybe they started indexing more data (date of link discovery), which they could then apply to the algo in a variety of ways. If the algo takes into consideration the link acquisition rate, they could easily filter out sites with an unnatural link acquisition rate.

This is just a theory and maybe they were already indexing date of link discovery, but there may be some other data they were not indexing before Mar 04.
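
Just to illustrate what a link-acquisition-rate filter could look like (purely a hypothetical sketch; the dates, threshold, and function names are all invented, not anything Google has confirmed):

from datetime import date

# hypothetical record of when each inbound link to a site was first discovered
link_discovery_dates = [
    date(2005, 1, 3), date(2005, 1, 5), date(2005, 1, 6),
    date(2005, 1, 6), date(2005, 1, 7), date(2005, 1, 7),
]

def links_per_week(dates):
    # average link acquisition rate over the observed window
    if len(dates) < 2:
        return 0.0
    span_days = max((max(dates) - min(dates)).days, 1)
    return len(dates) / (span_days / 7.0)

MAX_NATURAL_RATE = 5.0  # made-up threshold for an "unnatural" acquisition rate

rate = links_per_week(link_discovery_dates)
if rate > MAX_NATURAL_RATE:
    print(f"{rate:.1f} links/week looks unnatural; dampen these links")
else:
    print(f"{rate:.1f} links/week looks natural")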

Powdork




msg:770338
 3:31 pm on Mar 16, 2005 (gmt 0)

Something I have been considering for a couple of weeks is traffic. What if inbound external links were sandboxed until the page on which they reside is viewed x number of times, or the link is clicked x number of times? That way Google doesn't have to care whether a link is reciprocated or paid while assigning its value. Links on the millions of link pages that no one visits would not count as votes. A true link vote would be one where there is a reasonable chance someone may click it and see where it goes. After all, that was the original intention of PR.

onebaldguy




msg:770339
 3:42 pm on Mar 16, 2005 (gmt 0)

What if inbound external links were sandboxed until the page on which they reside is viewed x number of times, or the link is clicked x number of times?

I have also wondered this for a while (but I would tend to lean towards the # of times the link is clicked on vs. the # of times it is viewed). This would help G determine which links are most relevant and 'useful' to the people. It takes a large step away from the 'random surfer' model which PageRank is based on.

So if you buy a link on a page and it is of no value to the users and no one clicks on it, then G will not value that link (or will decrease its weight). However, if there is a link on a page that gets clicked heavily, maybe more weight is given to it.

The G toolbar is installed on millions of machines, so why not use some of that data? The more G (or any SE) can understand and utilize user behaviour, the more successful they will be at delivering what users want.
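
A rough sketch of that idea (the toolbar numbers, weighting formula, and example URLs below are entirely made up for illustration): instead of counting every discovered link equally, weight each link's 'vote' by how often real users actually click it.

# hypothetical toolbar data: (source_page, target_page, times_link_was_clicked)
observed_links = [
    ("links-page.example/partners", "widgetsite.example", 0),   # never clicked
    ("blog.example/review", "widgetsite.example", 140),
    ("forum.example/thread42", "widgetsite.example", 35),
]

def link_weight(clicks, base=1.0):
    # dampen links nobody clicks; give a capped boost to links people actually follow
    if clicks == 0:
        return 0.1 * base
    return base + min(clicks / 100.0, 2.0)

total_vote = sum(link_weight(clicks) for _src, _dst, clicks in observed_links)
print(f"click-weighted link value for widgetsite.example: {total_vote:.2f}")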

jcoronella




msg:770340
 4:20 pm on Mar 16, 2005 (gmt 0)


From what I've seen, groups of sandboxed sites tend to come out of it at once. Is this just me?

ciml




msg:770341
 4:28 pm on Mar 16, 2005 (gmt 0)

> tend to come out of it at once

That fits well with the posts here from time to time, when webmasters find that several sites of theirs have come out together.

jcoronella




msg:770342
 4:44 pm on Mar 16, 2005 (gmt 0)

Not sure where I was going with that, except that it has to do with some part of the algo that is updated in one batch.

I just tend to think that if Google ranks pages on some combination of factors:
aX + bY + cZ = rank

One of those factors is likely PageRank, and one could be another factor that only some sites have. This factor is calculated once in a while (like pagerank seems to be). If you don't have it calculated, you can't compete with a site that does because that coefficient is not insignificant.

Now, who here knows what the calculated parameter is? ;)

Could it very well just be of type DateTime?

Could it be that sites that SEEM to get out of the box just have higher coefficients in other parameters that overcome the lack of the mysterious sandbox variable?

It doesn't seem to me that it would be worth G's time to calculate a list of words that need sandboxing, seeing as the very competitiveness of a SERP in their algo separates them out just fine.
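
To put rough numbers on the aX + bY + cZ idea (all coefficients and scores below are invented for illustration): if Z is a factor that is only computed in an occasional batch run, a site that has not had it calculated yet effectively competes with Z = 0.

# invented coefficients for the hypothetical rank formula aX + bY + cZ
a, b, c = 0.3, 0.2, 0.5  # c is deliberately "not insignificant"

def rank(x, y, z):
    return a * x + b * y + c * z

old_site = rank(x=0.6, y=0.7, z=0.8)  # batch-calculated factor Z already assigned
new_site = rank(x=0.9, y=0.9, z=0.0)  # Z not yet calculated, so it contributes nothing

print(f"old site: {old_site:.2f}, new site: {new_site:.2f}")
# Even with better X and Y, the new site scores lower until Z gets calculated.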

Hanu




msg:770343
 5:00 pm on Mar 16, 2005 (gmt 0)

Hanu,

If I was an engineer that had to come up with some kind of filter in order to limit search depth I would definitely not choose a site's age to be the predominant filter criterion. I would rather choose things like keyword density or keyword proximity.

It should read 'If I were a Google engineer ...'.

You must have had too many Virgin Marys (or Maries?) last night ...

Powdork




msg:770344
 5:28 pm on Mar 16, 2005 (gmt 0)

It takes a large step away from the 'random surfer' model which PageRank is based on.

On the contrary, I think it brings the model back. Google reads all the links it can find but would only give value to those where there is a chance a random surfer will eventually click.

caveman




msg:770345
 5:39 pm on Mar 16, 2005 (gmt 0)

Could it be that sites that SEEM to get out of the box just have higher coefficients in other parameters that overcome the lack of the mysterious sandbox variable?

Beautifully stated. In other words, I agree. :-) In the past, I've referred to this as the need to clear certain 'hurdles'. IMO, to be free of the sandbox, a site launched after March '04 must 'score' well in two broad areas:

#1 - ABSENCE OF NEGATIVES
Show minimal evidence of being "overly SEO'd" (working definition of "overly SEO'd": too many co-existing indicators of optimization, probably determined by crossing number of infractions with intensity of each infraction).

#2 - PRESENCE OF POSITIVES
Exhibit sufficient "indications of quality" to be excluded from sandboxing.

And #1 and #2 are not static.

My guess: Go far enough with the NEGATIVES and you may never get past the algo. But the more 'credibility' you establish via evidence of POSITIVES, the greater the likelihood that you can overcome the measured negatives.

(Sidenote: This IMO is why some blogspam has worked. The perceived positives <overwhelming number of IBL's> were sufficiently great as to overcome the measured negatives. Seems that they've fine tuned that problem a bit however.)

And yes, it seems that sites "pop out" in bunches because of tweaks to the criteria in the algo and filters, as was easily seen with Allegra.
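
One way to picture those two hurdles (the infractions, signals, scores, and thresholds below are entirely made up): cross the number of infractions with their intensity to get a negatives score, tally quality signals as a positives score, and sandbox the site when the negatives are not outweighed by the positives.

# made-up infractions: (name, intensity from 0 to 1)
infractions = [("keyword stuffing", 0.7), ("identical anchor text everywhere", 0.5)]
# made-up quality signals: (name, strength from 0 to 1)
positives = [("aged inbound links", 0.6), ("unique content", 0.8)]

negatives_score = len(infractions) * sum(i for _, i in infractions)  # count crossed with intensity
positives_score = sum(s for _, s in positives)

HARD_LIMIT = 5.0  # go far enough with the negatives and no positives will save you

if negatives_score > HARD_LIMIT:
    verdict = "never gets past the algo"
elif negatives_score > positives_score:
    verdict = "sandboxed until it builds more credibility"
else:
    verdict = "clears the hurdles"

print(f"negatives={negatives_score:.1f}, positives={positives_score:.1f}: {verdict}")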

onebaldguy




msg:770346
 6:57 pm on Mar 16, 2005 (gmt 0)

On the contrary, I think it brings the model back. Google reads all the links it can find but would only give value to those where there is a chance a random surfer will eventually click.

I don't want the thread to drift off topic, but that would not seem to be the case. As I understand it, the random surfer model means each link has an equal chance of being clicked on - a visitor randomly surfs the web. If you counted only links which were actually clicked, or gave more weight to certain links, that gets away from the random surfer model (unless I am missing the concept; please educate me if so).
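
For reference, this is the textbook random-surfer calculation being described: at every step the surfer follows one of the current page's outbound links with equal probability (or occasionally jumps to a random page), so each link on a page passes the same share of its rank regardless of whether anyone would actually click it. A small power-iteration sketch on a made-up three-page graph:

# tiny made-up link graph: page -> pages it links to
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

DAMPING = 0.85
pages = list(links)
pr = {p: 1.0 / len(pages) for p in pages}  # start with equal rank everywhere

for _ in range(50):  # power iteration
    new_pr = {}
    for p in pages:
        # every outbound link on page q gets an equal share of q's rank (random surfer)
        incoming = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
        new_pr[p] = (1 - DAMPING) / len(pages) + DAMPING * incoming
    pr = new_pr

print({p: round(r, 3) for p, r in pr.items()})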

Powdork




msg:770347
 7:13 pm on Mar 16, 2005 (gmt 0)

What I mean is that with the original model and today's internet, Google would count numerous links from generated reciprocal (or three-way) link pages that will never be clicked on by a human, random or otherwise. Why would they have to guess the likelihood of a random surfer visiting your page when they can BE THE RANDOM SURFER via the toolbar?

Trisha




msg:770348
 9:05 pm on Mar 16, 2005 (gmt 0)

...except that it has to do with some part of the algo that is updated in one batch.

I just tend to think that if google ranks pages on some combination of factors:
aX + bY + cZ = rank

One of those factors is likely PageRank, and one could be another factor that only some sites have. This factor is calculated once in a while (like pagerank seems to be). If you don't have it calculated, you can't compete with a site that does because that coefficient is not insignificant.
...

Could it be that sites that SEEM to get out of the box, just have higher coeficients in other parameters that overcome the lack of the mysterious sandbox variable?

This is along the same lines as what I was thinking about with the sandbox too.

I would guess that whatever is being batch calculated is something related to the domain as a whole, since whole domains were affected, but only if they were new.

Many speculate that LSI was involved in Allegra, mostly, as I understand it, in regards to the incoming links. Maybe they were calculating something LSI related with all the links coming into a domain, not just a per page thing.

Or looking at an entire domain for duplicate content between pages on that same domain.

Or something else entirely, but something that couldn't be calculated for new domains quickly. Maybe it is not so much that they were batched, just that it took a long time. Especially if it were something new they were looking at and they had to calculate it for all existing domains first. That could cause a delay in them even getting around to doing this with new domains.

Meanwhile, new domains got a score that was something close to zero for one of the factors. That made it nearly impossible to rank for anything competitive. The only way around it, as jcoronella said, would be to score extremely high on one of the other factors. So if you know how to get around the sandbox, that should give you an idea of what at least one of the other factors is and how strongly it plays into the algo as a whole.

BeeDeeDubbleU




msg:770349
 10:14 pm on Mar 16, 2005 (gmt 0)

Show minimal evidence of being "overly SEO'd" (working definition of "overly SEO'd": too many co-existing indicators of optimization, probably determined by crossing number of infractions with intensity of each infraction).

If this were true then would they not have applied the same criteria to existing sites? If this was an algo related decision I don't see the sense of applying it only to new sites. (Unless I am missing something.)

rocco




msg:770350
 10:01 am on Mar 17, 2005 (gmt 0)

the original question:

'Does the "sandbox" Only Affect Phrases Containing Popular Words?'

the answer:

No. I have a site with a fantasy name, several months old, with thousands of backlinks (maybe 1% mentioning the site name in anchor text) and Yahoo & DMOZ listings, but it is not listed in the top 100 when searching for the site name. The sites listed above mine either:
- have a misspelling which happens to be my page title, or
- link to my site (without my site name in the anchor text or anywhere else on their page).
Oddly, my site gets thousands of Google visits a day for totally different searches which are much more competitive.

conclusion:
So it affects not only popular words, but also fantasy ones.

BeeDeeDubbleU




msg:770351
 10:47 am on Mar 17, 2005 (gmt 0)

I agree Rocco. As I said in message 65, in my experience it is not related to popular words. It may be related to popular words AND some other factors but not just popular words.

I was leaning towards an over-optimisation filter, but if this were the case, why wouldn't they also apply it to older sites? It does not make sense to start doing this from March 2004 and leave the older sites untouched.

arras




msg:770352
 11:03 am on Mar 17, 2005 (gmt 0)

Commercial or non-commercial terms, new and old sites:
YES, there is a sandbox for new sites. I ran an experiment and added about 200 pages to a site from 2002 last week. Today, all my targeted money KWs and the 200 pages are indexed and, YES, at the top 10.
Any comments?

BeeDeeDubbleU




msg:770353
 11:55 am on Mar 17, 2005 (gmt 0)

This was already known. The sandbox seems to apply only to new domains.

pleeker




msg:770354
 10:44 pm on Mar 17, 2005 (gmt 0)

Sorry, but I have to disagree -- everyone keeps referring to "only new sites" and "only new domains", etc. But as I pointed out above, it also hit a site that was built in 1999 and only went through some minor site design and navigation changes last summer.

BeeDeeDubbleU




msg:770355
 10:55 pm on Mar 17, 2005 (gmt 0)

How do you know this was the sandbox? It may have just been some other penalty caused by the changes you made. Minor changes can have major impact.

caveman




msg:770356
 12:45 am on Mar 18, 2005 (gmt 0)

What pleeker said.

tonygore




msg:770357
 2:29 am on Mar 18, 2005 (gmt 0)

Question - Are existing URLs affected by the sandbox (i.e. a domain that has been indexed by Google before but has since expired)?

**I tried posting this question earlier but it has been sent to admin for some reason**

shri




msg:770358
 3:15 am on Mar 18, 2005 (gmt 0)

If the domain has expired, you have to deal with two sets of issues. First, make sure you email Google to get reincluded. Then start worrying about the googlelagenvector algorithm.

tonygore




msg:770359
 3:30 am on Mar 18, 2005 (gmt 0)

googlelagenvector algorithm?

shri




msg:770360
 3:44 am on Mar 18, 2005 (gmt 0)

Sorry, sandbox.

rocco




msg:770361
 9:16 am on Mar 18, 2005 (gmt 0)

BeeDeeDubbleU

*It* does not just affect new sites, IME. I have sites which I started in 2000/2001 that had only a few backlinks and were not updated often. Then I started to update them more frequently and acquired backlinks quite aggressively (let's say the new-backlink ratio increased dramatically).

IME that "sandbox" neither affects only popular words nor affects only newer sites. It affected all my sites with aggressive link campaigns.

BeeDeeDubbleU




msg:770362
 9:55 am on Mar 18, 2005 (gmt 0)

With respect, this is only your personal experience, and thank you for sharing it. But, as I have said earlier, I create small niche sites for consultants and small businesses that neither employ aggressive linking campaigns nor target popular keywords. I think all of those I have created since March last year have been sandboxed and none, as yet, has been released. One of these has just passed the 12-month marker and it is still firmly in the mire.

It may be that aggressive linking campaigns and targeting popular keywords can trigger some sort of penalty but this is not the sandbox that I am seeing. It may also be that older sites are now being penalised for aggressive linking campaigns but once again I don't think that this is the sandbox. The sandbox may not have been scientifically defined but I believe that it was accepted as something that happened to new sites. What happens to existing sites is different.

It's too easy to blame everything on the sandbox but this just confuses the issue. If there is a new penalty for aggressive linking this should be treated as a different issue. Perhaps someone more capable than me could come up with a surefire sandbox test that can be applied to sites old and new?

How many others have seen their pre-sandbox sites suffer as a result of linking campaigns?
