Why does the 'Google Lag' exist?

Trying to understand its purpose.

         

bakedjake

1:43 am on Sep 29, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I had some in-depth discussion this weekend with some friends about the sandbox. Every theory on how to beat it kept coming back to one central problem - no one is sure why it exists.

I feel very strongly that until we have a good grasp on why it exists, it will be very hard to beat.

I don't buy the explanation that it's intended to be a method of stopping spam. Why? One, it is doing too much collateral damage. Two, if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming), and you realize that there are already multiple ways of beating the sandbox that all of those spammers are aware of, the explanation doesn't make sense anymore.

So, why does the sandbox exist?

The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?

isitreal

12:54 am on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<<< How will looking at an operating system change my perspective on a search engine?

I assume that if you are the mod for the Linux forums you have a certain amount of scepticism about anything MS says or has said about its Windows products? That's exactly how I would look at anything Google says or has said about what it does, how it does it, or why.

Re the sandbox, lag, penalty etc, yes, that's largely what we're both talking about here: why it exists, and all that. It's very odd behavior. It's also good to see the term more precisely defined, though it's not just commercial terms; from what I see it's a much wider range than that. Unless commercial just means x number of results returned? Hard to say. Is it a capacity problem? Is it a ranking problem? Is ranking being used to deal with a capacity problem? Or is a capacity problem causing a big glitch in ranking, which is being called a 'lag'? Hard to say. But not hard to call it a problem.

Imagine this: MS releases their new Longhorn, but you can't install any new software on it until the software is 6 months old. That's to thwart potential security holes, or whatever. Paint this picture for any tech company other than Google and you can see how absurd the business model is. Google still is getting a free pass though.

Nice to see that at least a few here have been able to hack this latest version, though I'm not positive that all they did was prelink the domain or something.

SlyOldDog

1:18 am on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



[webmasterworld.com...]

Here is where the capacity theory first appeared. Remember the guy joined WebmasterWorld specifically to post the message.

His 1 year timeline matches the lag we see now.

GoogleGuy strenuously denied his comments.

Looks like a leak to me.

BillyS

1:24 am on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Google still is getting a free pass though.

Free pass? Just because they might have a policy on holding back new websites?

I started a new site at the end of May 2004. At least Google has all pages in its index and spiders it daily - that gives me some comfort. Yahoo is still showing pages that are now nearly three months old - I know that because I changed the page structure in late July. And Yahoo has about 1/8th of the website in its database even though it Slurps down pages daily.

I've got exactly 1 page in MSN - and I don't care where they get their database from. Ask has my home page as does Wisenut. So tell me how Google is getting a free pass? They are STILL the best engine out there.

graywolf

1:29 am on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If there is no update soon, and this thread continues much longer, we may actually fill up Google's page capacity despite what everybody says ;-)

rfgdxm1

1:53 am on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Looks like a leak to me.

If that wasn't a leak it was a very odd hoax. What is so odd about that post is the specificity of the details. While he says at the end this is just a guess, his post includes things like "They now considering reconstruction of the data tables which involves expanding ID fields to 5 bytes." That isn't consistent with a guess; he could only know that if he had an inside source. And I'd think a hoaxer would make the problem seem more urgent, rather than saying the problem will take a while to become evident. It's very curious that, if this was a hoax, his theory is consistent with what we are seeing now. Google had to do something temporarily about this problem, and that was to create the sandbox. The new URLs they decided not to index were mostly new sites.

bakedjake

2:05 am on Oct 6, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



>His 1 year timeline matches the lag we see now.

Huh. It sure does.

Marcia

2:05 am on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If they removed even part of the worthless cranked out duplicate swill, the index would probably be no more than 3/4 the size it is now.

bakedjake

2:13 am on Oct 6, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Well, I expect dupe filter 2.0 any day now, Marcia. We're already seeing parts of it with slow death and this past weekend's update.

But I still don't believe the capacity issues, guys. I just think Google is smart enough to see something like that coming.

But maybe I am giving them too much credit.

rfgdxm1

2:24 am on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>If they removed even part of the worthless cranked out duplicate swill, the index would probably be no more than 3/4 the size it is now.

If Google could easily remove that worthless cranked out duplicate swill, no doubt they would, even if they could index 20 times the number of pages that they currently do. And on the theory that they have hit a page indexing limit, Google may have decided to go slow on upgrading once they realized that one consequence of going slow is that the sandbox would help keep down the amount of worthless cranked out duplicate swill. If the searcher is looking to buy a widget, there will still be lots of sites he can find selling widgets even if new sites are sandboxed. And for pure informational searches, how many useful pages are there out there on new sites where similar information can't also be found on old sites? Yeah, if I develop a cure for all forms of cancer and put that on a new site, it won't be findable in Google. However, is this a scenario that happens significantly often on the web? The sandbox probably is the most effective way of dealing with worthless cranked out duplicate swill. If Google is quickly indexing new sites, spammers will crank them out faster than Google can identify them and whack 'em.

The sandbox exists for some reason. If it isn't because Google is limited in the number of pages they can index, then that means Google intentionally, for some other reason, decided to limit the size of the index. There would only be 2 reasons to do this when it wasn't necessary. #1) To fight spam; and/or #2) To give an incentive for new sites to buy Adwords.

rfgdxm1

2:30 am on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>But I still don't believe the capacity issues, guys. I just think Google is smart enough to see something like that coming.

>But maybe I am giving them too much credit.

Or maybe Google did in fact see it coming, and decided to go slow in upgrading because they considered that keeping the size of the index down was in their best interest. Let's assume there really is a capacity issue, and Google has already hit it. Does anyone have data showing that Google's share of the market has been declining since the sandbox effect hit? If not, then Google doesn't have a problem.

Hmm...

©2004 Google - Searching 4,285,199,774 web pages

Pretty close to the index limit size that poster claimed.
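
For what it's worth, here is a quick back-of-the-envelope check of that comparison. It is only a sketch: it assumes the claimed limit is the ceiling of a 4-byte unsigned ID field, which the "expanding ID fields to 5 bytes" quote implies but nobody in this thread has confirmed. The constant and function names are made up for illustration.

```python
# Assumption (not confirmed by the thread): each indexed page gets an
# unsigned integer ID stored in a fixed-width field, currently 4 bytes.
ID_BYTES_CURRENT = 4
ID_BYTES_PROPOSED = 5  # from the quoted "expanding ID fields to 5 bytes"

# Page count shown in Google's footer, as quoted above.
PAGES_REPORTED = 4_285_199_774


def max_ids(num_bytes: int) -> int:
    """Number of distinct values an unsigned field of num_bytes can hold."""
    return 2 ** (8 * num_bytes)


limit_now = max_ids(ID_BYTES_CURRENT)
limit_after = max_ids(ID_BYTES_PROPOSED)
headroom = limit_now - PAGES_REPORTED
fill_ratio = PAGES_REPORTED / limit_now

print(f"4-byte ceiling:   {limit_now:,}")
print(f"Reported pages:   {PAGES_REPORTED:,}")
print(f"Headroom left:    {headroom:,}")
print(f"Share of ceiling: {fill_ratio:.2%}")
print(f"5-byte ceiling:   {limit_after:,}")
```

If the 4-byte assumption holds, the footer count sits within about ten million pages of the ceiling, and a fifth byte would raise the ceiling by a factor of 256. None of this proves the capacity theory; it only shows the numbers are consistent with it.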

This 354 message thread spans 36 pages.