Why does the 'Google Lag' exist?
Trying to understand its purpose.
bakedjake




msg:112198
 1:43 am on Sep 29, 2004 (gmt 0)

I had some in-depth discussion this weekend with some friends about the sandbox. Every theory on how to beat it kept coming back to one central problem - no one is sure why it exists.

I feel very strongly that until we have a good grasp on why it exists, it will be very hard to beat.

I don't buy the explanation that it's intended to be a method of stopping spam. Why? One, it's causing too much collateral damage. Two, if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming) and realize that there are already multiple ways of beating the sandbox that all of those spammers know about, it doesn't make sense anymore.

So, why does the sandbox exist?

The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?

 

Miop




msg:112199
 2:00 am on Sep 29, 2004 (gmt 0)

Are all new web sites affected or only commercial ones?

I started a couple of commercial (ecommerce) sites for people back in spring, and they hardly rank at all on Google. I also bought a new domain last year and installed osCommerce, but never developed it beyond an index page. It was spidered and got a PageRank of 2. Three months ago I added content, but Google will still not cache or index any pages beyond the index page.

It's almost as if it has been put in the sandbox even though it is an 'established' site!

werty




msg:112200
 2:15 am on Sep 29, 2004 (gmt 0)

Well, it seems that since they won't take new domain names, if you have a new site and "you want to play", then you've "got to pay".

Seems like $$$ is the answer.

minnapple




msg:112201
 2:17 am on Sep 29, 2004 (gmt 0)

Left side - pushes new sites into using AdWords.
Right side - keeps numerous sites generated from common databases from flooding the SERPs.

Winners
Google stock holders, established web sites, in some cases consumers.

Losers
New web sites, in some cases consumers.

hooloovoo22




msg:112202
 2:35 am on Sep 29, 2004 (gmt 0)

If I were an SE, I don't think indexing new sites quickly would be my priority. I think it would be a smart move to set the site off to the side and watch how it grows and integrates with the current index. At some point it either would or wouldn't be beneficial to add.

I'm sure there are plenty of 'what if' scenarios, and I've seen sites that were launched in very similar ways where one got in and one didn't.

It seems there is no rhyme or reason, but something like a sandbox is, in theory, a good idea - especially when Google considers spam one of its main obstacles, serious enough to mention multiple times in its S-1 filing and its spam taxonomy paper.

renee




msg:112203
 3:04 am on Sep 29, 2004 (gmt 0)

This is my theory:

Google's main index cannot take in any more domains, so Google's solution is to create a new, separate index (similar to the supplemental index) for new domains/sites. It certainly behaves like the supplemental: it yields SERPs for site: queries as well as for queries that return a small result set from the main index.

The only way Google migrates domains/sites to the main index is if it removes complete old sites ("my site is missing!") from the main index. The question then is what algorithm Google uses to remove old sites from the main index and to select new sites from the "sandbox" index.

From what I see, the algorithm appears to be nothing but pure random chance.

By the way, have you noticed that the total number of pages in the main index has not changed in more than a year?

It's a controversial theory, but it sure explains all the symptoms we see.
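
To make renee's two-index theory concrete, here is a toy Python sketch of how crawl-time routing and query-time fallback between a capped main index and a separate "sandbox" index might work. The class, the capacity figure and the fallback rule are all invented for illustration (the cap is just the long-static page count Google's homepage advertised at the time); nothing here is confirmed Google behaviour.

# Toy sketch of renee's two-index theory. The class, the cap, and the
# fallback rule are all invented for illustration.

MAIN_INDEX_CAP = 4_285_199_774   # roughly the long-static "pages searched" count of the era

class ToyIndex:
    def __init__(self):
        self.docs = {}                       # url -> set of indexed terms

    def add(self, url, terms):
        self.docs[url] = set(terms)

    def search(self, term):
        return [url for url, terms in self.docs.items() if term in terms]

    def size(self):
        return len(self.docs)

def index_new_domain(url, terms, main, sandbox):
    """New domains go to the sandbox index while the main index is full."""
    target = sandbox if main.size() >= MAIN_INDEX_CAP else main
    target.add(url, terms)

def serve_query(term, main, sandbox, wanted=10):
    """Answer from the main index; dip into the sandbox only when results run short."""
    hits = main.search(term)
    if len(hits) < wanted:                   # small result sets and site: queries fall through
        hits += sandbox.search(term)
    return hits[:wanted]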

graywolf




msg:112204
 3:26 am on Sep 29, 2004 (gmt 0)

Pure speculation, nothing to back this up:

1) Let's say Google knew they were going to IPO this year, so they needed a way to keep things stable and not mess that up. So they instituted 'the sandbox' earlier this year to keep things stable during the IPO. Now that the IPO is over and 'googlebot is running in panic mode' [webmasterworld.com], they are rebuilding the index and possibly addressing some problems like page jacking [webmasterworld.com].

2) The sponsored text link business is hard to combat without collateral damage, so they now force links to go through a probationary period first. It affects new sites the hardest since ALL of their links are going through at the same time.

The only thing I can say with certainty is that site-wide links will push you into the sandbox if you're on the edge. I had one fairly new site that was ranking incredibly poorly, but still ranking. A week or so after it got a site-wide link (100+ pages) it was banished. If I test using the "allin" commands right now, I'm #2-4.

The problem is we've only seen stuff move out once (early May). With only one instance to study, it's fairly difficult to determine a pattern of behavior.
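
graywolf's "probationary period for links" idea can be sketched as a simple age-dampening of link weight. The window length and the example numbers below are entirely made up; the point is only that a new site, whose inbound links all appear at once, would start near zero while an established site with aged links would not.

# Hypothetical sketch of a "probationary period for links". The ramp length
# and the sample figures are invented; this is not a known Google formula.

PROBATION_DAYS = 180   # made-up length of the probation window

def link_weight(raw_weight, days_since_first_seen):
    """Scale a link's value by how long the engine has known about it."""
    maturity = min(days_since_first_seen / PROBATION_DAYS, 1.0)
    return raw_weight * maturity

def site_link_score(inbound_links):
    """inbound_links: list of (raw_weight, days_since_first_seen) tuples."""
    return sum(link_weight(w, age) for w, age in inbound_links)

# An established site with old links keeps its full score; a new site with the
# same links, all discovered ten days ago, scores close to zero until they mature.
old_site = [(1.0, 400), (0.5, 300), (2.0, 250)]
new_site = [(1.0, 10), (0.5, 10), (2.0, 10)]
print(site_link_score(old_site), site_link_score(new_site))   # 3.5 vs. ~0.19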

jnmconsulting




msg:112205
 3:43 am on Sep 29, 2004 (gmt 0)

IMO

Here is a thought: what if Google stopped PR updates and some of the other strange things to see what the SEO community would do, specifically to target the spamming/linking/hijacking issues? You know as well as I do that those who are out chasing PageRank in the most devious ways are scrambling right now. So they then send out new bots, on the new IP address range they registered in March, to the top (N) problem sites, somewhat under the radar, to see what changes have been made to those sites. They then compare the old and the new index. That would give them a pretty good idea of the scope of the antics used. Then, one by one, they start dropping them.

renee




msg:112206
 3:53 am on Sep 29, 2004 (gmt 0)

graywolf,

Your speculation is no better than mine. Do you have any proof of what you're saying?

graywolf




msg:112207
 4:04 am on Sep 29, 2004 (gmt 0)

Pure speculation, nothing to back this up

Guess you missed that part ;-)

grant




msg:112208
 4:06 am on Sep 29, 2004 (gmt 0)

I think it exists because the barrier to entry on the Internet is SO low that if sites could rank well right out of the gate, webmasters (particularly affiliate sites) would take a sawed-off shotgun approach.

Those who are serious about developing a quality site with longevity will be willing to wait the sandbox effect out; others will not. Therefore, it acts as a filter.

That's my theory.

Rick_M




msg:112209
 4:20 am on Sep 29, 2004 (gmt 0)

My theory:

Before the sandbox, it was easy to slap a focused anchor text link to a new page on a high-PR page and suddenly rank top 10 for that anchor text - the higher the PR of the linking page, the better your ranking, even for somewhat competitive terms. "Minty freshness" was too much of a good thing.

Not only does the sandbox keep people from spamming the SERPs, it is also an effective way to combat link buying. How would you feel paying a lot of money per month for a high-PageRank link, only to see no results month after month - never knowing when, if ever, it would pay off?

Interesting comment above about a sitewide link pushing a site into the sandbox. I have a similar experience with a domain I set up for fun. I had linked to the site from all of my main site's pages with the anchor text "Keyword1 Keyword2 Keyword3". The site ranked #1 for all three keywords, but around 4th for "keyword1 keyword2". No one would search for the three keywords together, so I changed the anchor text site-wide to "keyword1 keyword2" - within a week, the site wouldn't rank in the top 40 (this was around 6-8 months ago now). I changed the anchor text back to 'keyword1 keyword2 keyword3' and the site still doesn't rank well for any of the keywords - and these are not competitive keyword combinations.

One other interesting experience: I had a few odd pages that were getting a lot of search traffic for a specific phrase. I went ahead and set up a page specifically for that phrase, which then linked to the other pages that had been getting the traffic. I then linked to the new page from my site's home page. This was about a year ago, and now not a single page on that site ranks for the keywords I was targeting. These were also not very competitive keywords. It seemed that if I overdid a page for a specific set of keywords, nothing on the site would rank for those keywords. Don't know if anyone else has seen anything like this.

Marcia




msg:112210
 4:21 am on Sep 29, 2004 (gmt 0)

I had one fairly new site that was ranking incredibly poorly, but still ranking. A week or so after it got a site-wide link (100+ pages) it was banished. If I test using the "allin" commands right now, I'm #2-4.

graywolf, I've got a similar story, except that the site was ranking nicely, it involved internal anchor text and too few inbounds rather than inbound links, and it dates back to Florida. How does top 5 or 6 for allinanchor out of about 1,500 sound, with only about 4 or 5 IBLs total?

Any idea what the percentage/proportion was with identical anchor text?

ogletree




msg:112211
 4:25 am on Sep 29, 2004 (gmt 0)

I have not seen a sandbox for internal links. The anchor text and PR pass very well and quickly.

Scarecrow




msg:112212
 4:32 am on Sep 29, 2004 (gmt 0)

So Google's solution is to create a new, separate index (similar to the supplemental index) for new domains/sites. It certainly behaves like the supplemental: it yields SERPs for site: queries as well as for queries that return a small result set from the main index.

The main index is for established sites, as long as they don't suddenly generate lots of new pages.

The Supplemental index was started in August, 2003. It kicks in when Google runs out of results to show from the main index.

The URL-only index isn't really an index, but just a listing of URLs. Your search term has to hit on the domain, or directory, or filename somehow. The page itself isn't indexed, but the words in the URL must be indexed for fast access. That's the only sense in which it can be called an "index." A URL-only listing is a more accurate term, so as not to imply that the page itself is indexed.

Now, are you suggesting, renee, that the sandbox is yet another index?

If so, my guess is that they're shelving stuff temporarily because they anticipate rolling out a new main index with an expanded docID of more than 32 bits, Real Soon Now. The Supplemental and URL-only listings are bad enough, but if they've added yet another index to prop up their shaky operation, I cannot see how they can justify it unless the Big Fix is right around the corner.
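
For anyone who wants the arithmetic behind the 32-bit point spelled out, here is a quick back-of-the-envelope check; the "advertised" figure is the long-static page count from Google's homepage at the time, so treat it as approximate.

# A 32-bit docID can address at most 2**32 documents; the page count Google
# advertised in 2004 was already pressing against that ceiling.
DOCID_BITS = 32
max_docs = 2 ** DOCID_BITS        # 4,294,967,296
advertised = 4_285_199_774        # the long-static "searching N web pages" figure
print(f"headroom: {max_docs - advertised:,} docIDs")   # roughly 9.8 million IDs left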

Marcia




msg:112213
 5:16 am on Sep 29, 2004 (gmt 0)

>>Pure speculation, nothing to back this up

Worse than that, this is pure fanciful imagination, so you'll have to be kind and indulge me.

Thinking about the patent applied for early last year and issued early this year: let's say they score all the sites, including new ones, and come up with OldRank. Then they take only the top so many of those - say 100, or 500, or 1,000 - and re-rank them, this time factoring into the score the links to each site from the other sites within that set of top-ranking pages. That's LocalRank, and from those two NewRank is calculated, which is the ranking we actually get to see.

It is unlikely that brand-new sites will have enough links, if any at all, from others in that relevant set. So let's imagine that when they fail the test they're put in a deep freeze for 3 months, at the end of which the calculation is rerun so they can be assessed on an equal footing with older sites, having had time to acquire the links needed to qualify for inclusion. Some will still fail; hence the sites we hear of that are not out of Google purgatory even after 6 months of waiting.
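
A minimal sketch of the re-ranking described above, assuming the OldRank / LocalRank / NewRank reading: take the top N results by base score, give each a LocalRank counting links from the other members of that same top set, then blend the two. The set size, blend weight and link representation are placeholders, not details from the patent.

# Minimal sketch of a LocalRank-style re-ranking. N, the blend weight and the
# link model are placeholders chosen only to illustrate the idea.

def rerank(results, links, top_n=1000, weight=0.5):
    """
    results: list of (doc, old_rank) pairs sorted by old_rank, best first.
    links:   dict mapping doc -> set of docs it links to.
    """
    top = results[:top_n]
    top_docs = {doc for doc, _ in top}

    reranked = []
    for doc, old_rank in top:
        # LocalRank: votes only from other members of the top set
        local_rank = sum(1 for other in top_docs
                         if other != doc and doc in links.get(other, set()))
        new_rank = (1 - weight) * old_rank + weight * local_rank
        reranked.append((doc, new_rank))

    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked

Under this reading, a brand-new domain with no links from anything else in the top set gets a LocalRank of zero, which is exactly what a "sandboxed" site would look like.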

if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming) and realize that there are already multiple ways of beating the sandbox that all of those spammers know about, it doesn't make sense anymore

Yes darlin' - exactly! In fact, I've seen one of the bad boyz post that he can have a brand spanking new domain ranking well within a few days of first registering it.

Theoretically, there are some who more than likely can generate multiple on-topic, high-PR links for themselves instantaneously from among the necessary set of top-ranking sites for a keyword phrase, grouping or semantically correct family, simply because they've already got their own network of high-PR, high-ranking sites in the top echelons of that space.

Even if this were so and not just conjecture, what would be the reason - except possibly to keep it down to the 20% who are capable of evading the sandbox?

caveman




msg:112214
 5:43 am on Sep 29, 2004 (gmt 0)

OK Jake, my 2 cents...

"When one has ruled out the possible, one must consider the impossible." -- Spock

I don't buy the explanation that it's intended to be a method of stopping spam. Why? One, it's causing too much collateral damage.

Eminently logical. The problem with this line of reasoning (if it is a problem) is that if it's true, then we must rule out any course of action that G took freely, i.e., by choice.

That is, if we assume that they would not cause so much damage by choice, then sandboxing, whatever it is, must have been necessary. Or at best, a devil's choice.

If this line of thinking is correct, then theories like the one suggesting that they've run out of space gain credibility. And indeed, perhaps they're out buying storage right now with their IPO proceeds.

Two, if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming) and realize that there are already multiple ways of beating the sandbox that all of those spammers know about, it doesn't make sense anymore.

This argument I have heard, but I have more trouble with it. They will *never* stop all the spammers, and they know it. But it was getting to the point where any Tom, Dick or Harriet could put up a site and game G. Then the gamers would come here and boast about how they put up a 10,000-page site last week and today they're number one in the SERPs for hundreds of keywords. I've said it before: I cringed every time I read one of those posts. Webmasters ganging up to collectively shoot themselves in the feet. Brilliant.

With that in mind, is it so surprising that G would at least take out some of the spammers that were making them look ... um ... *really* bad?

So, why does the sandbox exist? The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?

Given what we know about:
--the increase of auto-generated sites/pages
--G's dissatisfaction with too many affiliates
--feeds making dup content more prevalent
--etc., etc., etc.,...
yes, it's easy for me to believe that killing new pages, increasingly being dumped onto the Web by a growing swell of new, short-term-oriented webmasters, was a short-term goal to produce a short-term shakeout.

Does John Q. Public know or care? Not.

Even a bunch of smart people inside the same company can make a collectively bad decision. I think that is what is happening here.

Like a caveman knows anything. All I know really well is cave stuff. Ooops, gotta go; cavewoman is calling... :-)

jaina2




msg:112215
 7:07 am on Sep 29, 2004 (gmt 0)

I believe the sandbox is a side effect of topic-sensitive PageRank, and the cure (at least until May) was a listing in the appropriate category in DMOZ.
But this implementation of TSPR, which gave the impression that new sites were not allowed to rank for competitive words for an undetermined period, has caught the fancy of G, and they haven't updated the vectors that would pull a site out of the sandbox.
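
For readers unfamiliar with TSPR: topic-sensitive PageRank keeps one biased PageRank vector per topic and blends them per query. The toy sketch below uses made-up numbers; the only point is that a page missing from stale vectors scores zero no matter how many links it has earned since those vectors were computed, which is what jaina2 is suggesting.

# Toy illustration of the topic-sensitive PageRank idea. Topics, weights and
# scores are invented; this is not Google's actual formula.

def tspr_score(page, query_topic_weights, topic_pagerank):
    """
    query_topic_weights: dict topic -> P(topic | query)
    topic_pagerank:      dict topic -> {page: topic-biased PageRank}
    Pages absent from a (stale) vector simply contribute 0 for that topic.
    """
    return sum(weight * topic_pagerank[topic].get(page, 0.0)
               for topic, weight in query_topic_weights.items())

# A new page missing from stale vectors scores zero regardless of the query.
vectors = {"shopping": {"oldsite.com": 0.04}, "arts": {"oldsite.com": 0.01}}
query = {"shopping": 0.8, "arts": 0.2}
print(tspr_score("oldsite.com", query, vectors))   # 0.034
print(tspr_score("newsite.com", query, vectors))   # 0.0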

Powdork




msg:112216
 7:20 am on Sep 29, 2004 (gmt 0)

Q. Why would they do it?
A. Because then even someone who has sworn on a copy of "The Anatomy of a Search Engine" here on WW that he would not resort to AdWords would indeed resort to AdWords.

Of course, if Overture had managed to start the program within their specified timeframe, I wouldn't have had to. OTOH, I'm really starting to like it. :)

Marcia




msg:112217
 7:26 am on Sep 29, 2004 (gmt 0)

Given what we know about:
--the increase of auto-generated sites/pages
--G's dissatisfaction with too many affiliates
--feeds making dup content more prevalent
--etc., etc., etc.,...
yes, it's easy for me to believe that killing new pages, increasingly being dumped onto the Web by a growing swell of new, short-term-oriented webmasters, was a short-term goal to produce a short-term shakeout.

It had to be, with the index getting filled with swill. I saw one site yesterday with a search box that generates a replica of the pages in the search results and pumps it into the index - mirrored on their site, with every single link on those pages generating additional pages. All part of their site, automatically generated, with long URLs. Over 6K pages and growing, and phony whois info.

I don't know that we're seeing TSPR, but it's got to have something to do with linking; otherwise, there wouldn't be sites that don't come out of it and sites that can get around it.

Bluepixel




msg:112218
 7:30 am on Sep 29, 2004 (gmt 0)

It's to stop people from creating stupid websites just to earn money.

It's simple: the ones who don't create websites for money will still create websites, because they don't create their websites to get Google traffic. You, the spammers, won't. (I consider every SEO technique spamming; you should leave your site as it is and not do any work on it.)

trillianjedi




msg:112219
 11:11 am on Sep 29, 2004 (gmt 0)

So, start thinking like a search engine - what would be the benefit of this?

1. Spam

Well, you've ruled this one out from your POV, Jake, but I'm not sure I agree with your reasoning that there's too much "collateral damage". There may be collateral damage, but there was during Florida too. I don't think the SEs get involved in the smaller picture; they look at the big picture, which is Joe Average Surfer's user experience.

Thinking like an SE, I'd live with a lot of collateral damage if it increased the quality of my brand at the expense of the loss of sites. I'm not citing Florida as an "increase in the quality of a brand" by the way. But I'm sure that's what it was meant to be.

2. Quality of Results

This is not the same as spam. If you take a hypothetical scenario where spam didn't exist, the ranking of sites is still very important to the user-experience and perceived quality.

3. IPO

If I were an SE in this situation, I would play it ultra-safe until I'd sold all my shares. If that meant the freshness of my results was below par, I'd settle for it over a bunch of spam and low-quality SERPs.

This would be for a limited period; then I'd commence what would probably become known as the "big post-IPO update". That's where I'd throw in all my new technology and recommence a program of continued development, rather than "lie low till the shares are sold".

I think the reason why is a combination of the three above. At least, I can't think of anything else.

TJ

Leosghost




msg:112220
 11:33 am on Sep 29, 2004 (gmt 0)

Reason = PFI, AdWords... What it definitely isn't is a lack of storage or processing ability to keep the index going or growing.
Also, as graywolf speculated, "enforced stability around the IPO".

Rosalind




msg:112221
 12:24 pm on Sep 29, 2004 (gmt 0)

I've always assumed that the sandbox was a symptom of spam rather than a deliberate algorithmic choice. Here's my theory:

Sites will not be put in the full index without sufficient quality inbounds. The index has grown, so perhaps the number of these inbound links required has grown. A lot of scraper sites provide inbound links, plentiful yet individually insignificant. It takes a while for these scraper sites to pick up on a new site. New pages will have the benefit of the rest of the site's anchor text, plus any scraper site anchor text linking in, whilst new domains will have neither until they get into the full index. That's your sandbox.
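
A back-of-the-envelope version of this theory, with a purely hypothetical formula and constants: if the inbound-link weight needed for admission to the full index scales with index size, the same modest link profile that cleared the bar a couple of years ago no longer does.

# Hypothetical admission threshold that rises with index size; the log formula
# and the constants are invented purely to illustrate Rosalind's idea.
import math

def admission_threshold(index_size, base=5.0):
    """Required inbound-link weight grows slowly with the size of the index."""
    return base * math.log10(max(index_size, 10))

def admitted(inbound_weight, index_size):
    return inbound_weight >= admission_threshold(index_size)

# A link profile worth 46 clears the bar at 1 billion pages but not at 4 billion.
print(admission_threshold(1_000_000_000))                    # 45.0
print(admission_threshold(4_000_000_000))                    # ~48.0
print(admitted(46, 1_000_000_000), admitted(46, 4_000_000_000))   # True False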

DaveN




msg:112222
 12:34 pm on Sep 29, 2004 (gmt 0)

Jake, gut feeling alone here, but they are buying time. I think that PageRank is almost dead in the water, and they are crawling and rebuilding the **NEW** non-PR index in a new DC, with all of Google's efforts pushed into keeping up with Yahoo but without the problems that Y! have.

So how much time would you spend re-indexing a DB which will soon be obsolete? Take Florida - jeez - and then we had the update rollback (sorry GG, but in my eyes that's what it was). This takes time and effort. There are ways around the sandbox, and not just for the spammers: we got a client site indexed in 72 hours and ranking, and it's still ranking today, which is well past the sandbox kill time.

What I have seen is that G is indexing new sites and ranking them, but if they don't fit the "let me in" criteria then the sites die from the old index. Every so often they run tests on the new DC, and if your site passes the new DB criteria then you live again in the OLD Google. The problem for the masses is that they are not testing enough - if you can't find a hole, go buy a shovel and dig one!

DaveN

Rosalind




msg:112223
 12:51 pm on Sep 29, 2004 (gmt 0)

"Most webmasters don't have a commercial interest. They do it for fun and to help other people, not for money."
This is one of the most infantile and silly posts I've ever seen.

Webmasters get into sites because it IS fun. And challenging. But the truth is, to develop a superior info site, one that ultimately reaches many people and serves a ton of useful information, YOU HAVE TO BE SEEN IN THE SERPS. And that involves hard work, aka SEO. There's nothing wrong with it.

Bluepixel is right, most webmasters don't do it for money. The ones you find at the top of the SERPs are a tiny fraction of the whole and do not represent typical webmasters at all. I imagine that those in charge of .edu and .gov domains will be less concerned with search engines, and many amateurs and hobbyists won't care either. Print, newsletters, word of mouth, and radio are all ways to let people know about a website, and they can work very well. At WebmasterWorld we tend to forget that you can get information in other places than the screen. I'm not just thinking of the blog that someone writes and tells all their friends about, but also the school website that gets mentioned in a newsletter, or the hobby site that is announced at a club meeting, for example.

chrisnrae




msg:112224
 1:26 pm on Sep 29, 2004 (gmt 0)

DaveN - very interesting theory. I think most have realized PR hasn't been "true" in a long time. Your post reminded me of a line from the Boston pubcon where the Teoma rep (I believe, going from memory) responded to a question about whether they would be coming out with their own PageRank by saying something along the lines of: why would they want to duplicate an "old" piece of technology (again, totally from memory)?

Your theory is the best I've seen to date. If I'm interpreting it correctly, G is building a totally new base for the algo, so why spend time and effort "regulating" the old index in the meantime? But they can't do nothing to the old index, or it will turn into total crap.

So just tweak the old index, prevent new domains from entering while still letting new pages from established sites in, and you've achieved an unnatural "stability" in the results. Webmasters are busy chasing PR and trying to figure out ways around the sandbox - leaving Google the time to focus on the new algo/index on a different datacenter, because they certainly don't want a repeat of Florida.

Nice food for thought over my morning soda. Thanks Dave ;).

mfishy




msg:112225
 1:45 pm on Sep 29, 2004 (gmt 0)

I am no longer convinced that the sandbox effect exists to prevent "spam" either, Jake. If that were the case, then essentially they are simply providing users with old "spam" instead of new "spam".

Do those of you who believe spam prevention is the goal here honestly think webmasters have stopped creating sites because of the sandbox? Is this really a long term solution? Wouldn't an intelligent scoring algo solve all of these problems anyhow - or is this simply part of it all now? Most importantly, does anyone here honestly believe the SERPS are somehow better today than they were 6 months ago?

trillianjedi




msg:112226
 1:46 pm on Sep 29, 2004 (gmt 0)

Most importantly, does anyone here honestly believe the SERPS are somehow better today than they were 6 months ago?

Not for SERPS I watch, no.

But they are no worse, either.

If that were the case, essentially, they are simply providing users with old "spam" instead of new "spam".

Hmmm.... yes, that's a good point.

TJ

hooloovoo22




msg:112227
 2:02 pm on Sep 29, 2004 (gmt 0)

I like DaveN's theory, but have a couple questions.

1. Why would they spend time completely redesigning their algo and recreating the index when they arguably already have the best search engine? That's a HUGE risk. True, you should never rest on your laurels... but it wasn't broken, and fleshing out mail, news, portal, browser, etc. would seem to expand their search empire more effectively.

2. It's not as easy to become a mainstream search engine anymore; somebody with a nifty little technology isn't going to slip past the big guys as easily as in the past. If they see something that does it better and is gaining market share, they can pull an MS and buy it.
