Why does the 'Google Lag' exist?

Trying to understand its purpose.

bakedjake

1:43 am on Sep 29, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I had some in-depth discussion this weekend with some friends about the sandbox. Every theory on how to beat it kept coming back to one central problem - no one is sure why it exists.

I feel very strongly that until we have a good grasp on why it exists, it will be very hard to beat.

I don't buy the explanation that it's intended to be a method of stopping spam. Why? One, there's too much collateral damage it is doing. Two, if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming), and you realize that there are multiple ways already of beating the sandbox that all of those spammers are aware of, it doesn't make sense anymore.

So, why does the sandbox exist?

The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?

grant

4:06 am on Sep 29, 2004 (gmt 0)

10+ Year Member



I think it exists because the barrier to entry on the Internet is SO low that, if sites could rank well right out of the gate, webmasters (particularly affiliate sites) would take a sawed-off shotgun approach.

Those who are serious about developing a quality site with longevity will be willing to wait the sandbox effect out; others will not. Therefore, it acts as a filter.

That's my theory.

Rick_M

4:20 am on Sep 29, 2004 (gmt 0)

10+ Year Member



My theory:

Before the sandbox, it was easy to slap a focused anchor text link to a new page on a high PR page, and you'd suddenly rank top 10 for the focused anchor text. The higher the PR of the link, the better your ranking, even for somewhat competitive terms. "Minty freshness" was too much of a good thing.

Not only does the sandbox stop people from spamming the SERPs, it is also an effective way to combat link buying. How would you feel paying a lot of money per month for a high PageRank link, only to see no results month after month, never knowing when, if ever, it would pay off?

Interesting comment above about a sitewide link pushing a site into the sandbox. I had a similar experience with a domain I set up for fun. I linked to it from every page of my main site with the anchor text "keyword1 keyword2 keyword3". The site ranked #1 for all three keywords, but only around 4th for "keyword1 keyword2". Since no one would search for all three keywords together, I changed the sitewide anchor text to "keyword1 keyword2". Within a week, the site no longer ranked in the top 40 (this was about 6-8 months ago now). I changed the anchor text back to "keyword1 keyword2 keyword3", and the site still doesn't rank well for any of the keywords, even though these are not competitive keyword combinations.

One other interesting experience: I had a few odd pages that were getting a lot of search traffic for a specific phrase. I set up a page specifically for that phrase, linked from it to the other pages that had been getting the traffic, and then linked to the new page from my site's home page. That was about a year ago, and now not a single page on that site ranks for the keywords I was targeting. These were also not very competitive keywords. It seemed that if I overdid a page for a specific set of keywords, nothing on the site would rank for those keywords. Don't know if anyone else has seen anything like this.

Marcia

4:21 am on Sep 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>I had one fairly new site that was ranking incredibly poorly, but still ranking. A week or so after it got a sitewide link (100+ pages), it was banished. If I test using the "allin" commands right now, I'm #2-4.

graywolf, I've got a similar story, except that the site was ranking nicely, and instead of inbounds it involved internal anchor text and too few inbounds. It dates back to the Florida update. How does top 5 or 6 for allinanchor out of about 1,500 sound, with only about 4 or 5 IBLs (inbound links) total?

Any idea what the percentage/proportion was with identical anchor text?
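The question above is plain arithmetic: what share of a site's inbound links carry the same anchor text. A minimal Python sketch with made-up link data (the anchors and counts are hypothetical, loosely modeled on a 100-page sitewide link plus a few other inbounds); nobody in the thread knows what threshold, if any, Google applies.

```python
from collections import Counter


def anchor_text_share(anchors: list[str]) -> dict[str, float]:
    """Return each anchor text's share of all inbound links (case-insensitive)."""
    normalized = [a.strip().lower() for a in anchors]
    counts = Counter(normalized)
    total = len(normalized)
    return {text: n / total for text, n in counts.items()}


# Hypothetical inbound links: a 100-page sitewide link plus four other inbounds.
links = ["keyword1 keyword2"] * 100 + [
    "Example Site",
    "example.com",
    "keyword1 keyword2 keyword3",
    "click here",
]
shares = anchor_text_share(links)
top_text, top_share = max(shares.items(), key=lambda kv: kv[1])
print(f"{top_text!r}: {top_share:.1%} of inbound links")
```

With a sitewide link, one anchor text dominates almost the whole link profile, which is the pattern several posters suspect tripped the filter.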

ogletree

4:25 am on Sep 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have not seen a sandbox for internal links. The anchor text and PR pass very well and quickly.

Scarecrow

4:32 am on Sep 29, 2004 (gmt 0)

10+ Year Member



So Google's solution is to create a new, separate index (similar to the supplemental index) for new domains/sites. It sure behaves like the supplemental: it yields SERPs for site: queries, as well as for queries with a small result set from the main index.

The main index is for established sites, as long as they don't suddenly generate lots of new pages.

The Supplemental index was started in August 2003. It kicks in when Google runs out of results to show from the main index.
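A toy model of the fallback described above, assuming two separate lookup tables and a page size of 10 results. The function and index names are hypothetical; this only illustrates "supplemental results appear when the main index runs short", not how Google actually stored or merged them.

```python
def search(query: str,
           main_index: dict[str, list[str]],
           supplemental_index: dict[str, list[str]],
           wanted: int = 10) -> list[str]:
    """Hypothetical fallback: consult the supplemental index only when the
    main index returns fewer than `wanted` results for the query."""
    results = main_index.get(query, [])[:wanted]  # slice copies, index untouched
    if len(results) < wanted:
        for url in supplemental_index.get(query, []):
            if len(results) == wanted:
                break
            if url not in results:  # skip URLs already shown from the main index
                results.append(url)
    return results


main = {
    "blue widgets": [f"http://site{i}.example/" for i in range(12)],
    "obscure widget part": ["http://a.example/"],
}
supp = {"obscure widget part": ["http://b.example/", "http://a.example/"]}

print(search("blue widgets", main, supp))         # main index alone fills the page
print(search("obscure widget part", main, supp))  # supplemental fills the gap
```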

The URL-only index isn't really an index, but just a listing of URLs. Your search term has to hit on the domain, or directory, or filename somehow. The page itself isn't indexed, but the words in the URL must be indexed for fast access. That's the only sense in which it can be called an "index." A URL-only listing is a more accurate term, so as not to imply that the page itself is indexed.
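The point above is that a URL-only listing can only match words found in the domain, directory or filename. A minimal sketch of that kind of lookup, assuming URLs are split into terms on non-alphanumeric characters and that every query term must appear; the tokenizing rule and the helper names are illustrative assumptions.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit


def url_terms(url: str) -> set[str]:
    """Split the host name and path of a URL into lowercase alphanumeric terms."""
    parts = urlsplit(url)
    text = f"{parts.hostname or ''} {parts.path}".lower()
    return {t for t in re.split(r"[^a-z0-9]+", text) if t}


def build_url_listing(urls: list[str]) -> dict[str, set[str]]:
    """Map each URL term to the URLs containing it; page content is never read."""
    listing: dict[str, set[str]] = defaultdict(set)
    for url in urls:
        for term in url_terms(url):
            listing[term].add(url)
    return listing


def lookup(listing: dict[str, set[str]], query: str) -> set[str]:
    """Return URLs whose own words contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    hits = set(listing.get(terms[0], set()))
    for term in terms[1:]:
        hits &= listing.get(term, set())
    return hits


urls = [
    "http://www.example.com/blue-widgets/index.html",
    "http://widgets.example.org/red.html",
]
listing = build_url_listing(urls)
print(lookup(listing, "blue widgets"))
```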

Now, are you suggesting, renee, that the sandbox is yet another index?

If so, my guess is that they're shelving stuff temporarily because they anticipate rolling out a new main index with an expanded docID that has more than 32 bits, Real Soon Now. The Supplemental index and URL-only listing are bad enough, but if they added yet another index to prop up their shaky operation, I cannot see how they can justify this unless the Big Fix is right around the corner.
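The arithmetic behind the 32-bit docID theory: an unsigned 32-bit ID can label at most 2^32 documents, after which a fixed-width field wraps around. The 32-bit width is the poster's speculation, not something established in the thread; the sketch just shows the numbers involved.

```python
DOCID_BITS = 32  # width assumed in the theory above; not confirmed by Google


def docid_capacity(bits: int) -> int:
    """How many distinct unsigned document IDs fit in `bits` bits."""
    return 2 ** bits


def store_as_uint(doc_number: int, bits: int = DOCID_BITS) -> int:
    """Mimic storing a document number in a fixed-width unsigned field:
    anything past the capacity silently wraps around."""
    return doc_number & (docid_capacity(bits) - 1)


capacity = docid_capacity(DOCID_BITS)
print(f"{capacity:,}")            # 4,294,967,296
print(store_as_uint(capacity))    # 0: the next document would collide with docID 0
print(f"{docid_capacity(40):,}")  # 1,099,511,627,776 with a 40-bit ID
```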

Marcia

5:16 am on Sep 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>Pure speculation nothing to back this up

Worse than that, this is pure fanciful imagination, so you'll have to be kind and indulge me.

Thinking about the patent applied for early last year and issued early this year, let's say they were to score all the sites, including new ones, and come up with Old Rank. Then they take only the top so many of those, say 100 or 500 or 1,000, and re-rank them, this time scoring each one on the links it gets from other sites within that set of top-ranking pages. That's Local Rank, and from those two New Rank is calculated, which is the index we get to see.

It is unlikely that brand new sites will have enough links, if any at all, from others in that relevant set, so let's imagine when they fail the test they're put in deep-freeze for 3 months, at the end of which time it's calculated so that they can be assessed equally with older sites, having had time to acquire the necessary links to qualify for inclusion. Some will fail; hence there are some we hear of still not out of Google purgatory after 6 months of waiting.
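The scenario above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the Page class, counting distinct linking hosts as the local score, and the product formula with constants a and b used to combine the two scores. The 3-month deep-freeze is not modeled. The sketch only shows why a new site with a high initial score but no links from the top-ranking set would sink.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    host: str
    old_score: float
    links_to: set[str] = field(default_factory=set)  # outbound link targets (URLs)


def local_rerank(pages: list[Page], top_n: int = 1000,
                 a: float = 1.0, b: float = 1.0) -> list[tuple[str, float]]:
    """Toy Old Rank -> Local Rank -> New Rank pipeline (hypothetical)."""
    # Step 1: keep only the top_n pages by their original ("Old Rank") score.
    top = sorted(pages, key=lambda p: p.old_score, reverse=True)[:top_n]
    by_url = {p.url: p for p in top}

    # Step 2 ("Local Rank"): count distinct *other* hosts inside the top set
    # that link to each page. Counting hosts rather than links is an assumption.
    linking_hosts: dict[str, set[str]] = {p.url: set() for p in top}
    for src in top:
        for target in src.links_to:
            dst = by_url.get(target)
            if dst is not None and dst.host != src.host:
                linking_hosts[dst.url].add(src.host)
    local = {url: len(hosts) for url, hosts in linking_hosts.items()}

    # Step 3 ("New Rank"): combine both normalized scores. The product form
    # and the constants a, b are hypothetical.
    max_local = max(local.values(), default=0) or 1
    max_old = max((p.old_score for p in top), default=0.0) or 1.0
    scored = [(p.url, (a + local[p.url] / max_local) * (b + p.old_score / max_old))
              for p in top]
    return sorted(scored, key=lambda item: item[1], reverse=True)


pages = [
    Page("http://a.example/", "a.example", 0.8, {"http://b.example/", "http://c.example/"}),
    Page("http://b.example/", "b.example", 0.7, {"http://a.example/", "http://c.example/"}),
    Page("http://c.example/", "c.example", 0.6, {"http://a.example/"}),
    Page("http://new.example/", "new.example", 0.9),  # new site: no links from the set
]
ranking = local_rerank(pages)
for url, score in ranking:
    print(f"{score:.2f}  {url}")
```

In this made-up example the new site starts with the highest Old Rank but drops to last, because none of the established sites in the set link to it.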

>>if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming), and you realize that there are multiple ways already of beating the sandbox that all of those spammers are aware of, it doesn't make sense anymore.

Yes darlin' - exactly! In fact, I've seen one of the bad boyz post that he can have a brand spanking new domain ranking well within a few days of first registering it.

Theoretically, there are some who more than likely can generate multiple on-topic, high PR links for themselves instantaneously from among the necessary set of top-ranking sites for a keyword phrase, grouping or semantically correct family, simply because they've already got their own network of high PR, high-ranking sites in the top echelons of that space.

Even if this were so and not just conjecture, what would be the reason, except possibly to keep it down to the 20% who are capable of evading the sandbox?

caveman

5:43 am on Sep 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK Jake, my 2 cents...

"When one has ruled out the possible, one must consider the impossible." -- Spock

>>I don't buy the explanation that it's intended to be a method of stopping spam. Why? One, there's too much collateral damage it is doing.

Eminently logical. The problem with this line of reasoning (if it is a problem) is that if it is true, then we must rule out any course of action that G took freely, i.e., by choice.

That is, if we assume that they would not cause so much damage by choice, then sandboxing, whatever it is, must have been necessary. Or at best, a devil's choice.

If this line of thinking is correct, then theories like the one suggesting that they've run out of space gain credibility. And indeed, perhaps they're out buying storage right now with their IPO proceeds.

>>Two, if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming), and you realize that there are multiple ways already of beating the sandbox that all of those spammers are aware of, it doesn't make sense anymore.

This argument I have heard, but have more trouble with. They will *never* stop all the spammers, and they know it. But in fact it was getting such that any Tom, Dick or Harriet could put up a site and game G. Then the gamers would come here and boast about how they put up a 10,000 page site last week and today they're number one in the SERPs for hundreds of KWs. I've said before, I cringed every time I read one of those posts. Webmasters ganging up to collectively shoot themselves in the feet. Brilliant.

With that in mind, is it so surprising that G would at least take out some of the spammers that were making them look ... um ... *really* bad?

>>So, why does the sandbox exist? The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?

Given what we know about:
--the increase of auto generated sites/pages
--G's dissatisfaction with too many affiliates
--feeds making dup content more prevalent,
--etc, etc, etc,...
yes, it's easy for me to believe that killing new pages, increasingly being dumped into the Web by a growing swell of new, short-term oriented webmasters....was a short term goal to produce a short term shake out.

Does John Q. Public know or care? Not.

Even a bunch of smart people inside the same company can make a collectively bad decision. I think that is what is happening here.

Like a caveman knows anything. All I know really well is cave stuff. Ooops, gotta go; cavewoman is calling... :-)

jaina2

7:07 am on Sep 29, 2004 (gmt 0)

10+ Year Member



I believe the sandbox is a side effect of topic-sensitive PageRank (TSPR). And the cure (at least until May) was a listing in the appropriate category in DMOZ.
But this implementation of TSPR, which gave the impression that new sites were not allowed to rank for competitive words for an undetermined period, has caught the fancy of G, and they haven’t updated the vectors that would pull a site out of the sandbox.
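For readers unfamiliar with TSPR: Haveliwala's topic-sensitive PageRank computes PageRank with the random jump biased toward pages belonging to a topic (his experiments used Open Directory categories). A minimal sketch, assuming a tiny hypothetical link graph in which membership in the topic set stands in for a DMOZ listing; whether Google actually ran anything like this is the theory above, not established fact.

```python
def topic_sensitive_pagerank(links: dict[str, list[str]], topic_pages: set[str],
                             damping: float = 0.85,
                             iterations: int = 50) -> dict[str, float]:
    """PageRank by power iteration, with the random jump restricted to
    `topic_pages` (a biased teleport vector)."""
    if not topic_pages:
        raise ValueError("topic_pages must not be empty")
    pages = set(links) | set(topic_pages) | {d for outs in links.values() for d in outs}
    teleport = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0) for p in pages}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {p: (1 - damping) * teleport[p] for p in pages}
        for src in pages:
            outs = links.get(src, [])
            if outs:
                share = damping * rank[src] / len(outs)
                for dst in outs:
                    new_rank[dst] += share
            else:
                # Dangling page: send its rank back through the topic teleport vector.
                for p in pages:
                    new_rank[p] += damping * rank[src] * teleport[p]
        rank = new_rank
    return rank


# Hypothetical graph: "old-site" is in the topic set (think: listed in the
# relevant DMOZ category); "new-site" is not, and nothing links to it.
links = {
    "old-site": ["partner"],
    "partner": ["old-site"],
    "new-site": ["partner"],
}
ranks = topic_sensitive_pagerank(links, topic_pages={"old-site"})
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Under this topic vector, a new site that is neither in the topic set nor linked from it gets no topic-specific score at all, however good its content; recomputing the vectors with an updated directory would be what changes that.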

Powdork

7:20 am on Sep 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Q. Why would they do it?
A. Because then even someone who has sworn on a copy of "The Anatomy of a Search Engine" here on WW that he would not resort to Adwords would indeed, resort to Adwords.

Of course if Overture could have managed to start the program within their specified timeframe, I wouldn't have had to. OTOH, I'm really starting to like it.:)

Marcia

7:26 am on Sep 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>Given what we know about:
>>--the increase of auto generated sites/pages
>>--G's dissatisfaction with too many affiliates
>>--feeds making dup content more prevalent,
>>--etc, etc, etc,...
>>yes, it's easy for me to believe that killing new pages, increasingly being dumped into the Web by a growing swell of new, short-term oriented webmasters....was a short term goal to produce a short term shake out.

It had to be, with the index getting filled with swill. I saw one site yesterday with a search box that generates and pumps into the index a replica of the pages in the search - mirrored on their site, with every single link on those pages generating additional pages. All part of their site, automatically generated with long URLs. Over 6K pages and growing, phony whois info.

I don't know that we're seeing TSPR, but it's got to have something to do with linking; otherwise, there wouldn't be sites that don't come out of it and sites that can get around it.

This 354-message thread spans 36 pages.