Why does the 'Google Lag' exist?

Trying to understand its purpose.

bakedjake

1:43 am on Sep 29, 2004 (gmt 0)

I had some in-depth discussion this weekend with some friends about the sandbox. Every theory on how to beat it kept coming back to one central problem - no one is sure why it exists.

I feel very strongly that until we have a good grasp on why it exists, it will be very hard to beat.

I don't buy the explanation that it's intended to be a method of stopping spam. Why? One, it does too much collateral damage. Two, if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming), and you realize that there are already multiple ways of beating the sandbox that all of those spammers are aware of, the explanation doesn't make sense anymore.

So, why does the sandbox exist?

The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?

Scarecrow

2:38 am on Oct 6, 2004 (gmt 0)

I love GoogleGuy's reaction to the re5earcher "leak."

His first reaction, the same day (June 7, 2003), was "Did anyone catch the IP address of that masked re5earcher? ;) (just kidding)"

Another reaction, on June 16, was:

"One cautionary word of advice: take everything with a grain of salt, and make choices that are common sense to you and work well for your users. For example, there was recently a thread that suggested Google was running out of "address space" to label our documents. I was talking to another engineer here and he said he almost fell out of his chair laughing when he read that. So there's always a lot of theories floating around all the time about why something is this way or that. My advice is to assume that Google wants the most useful, relevant pages to come up first for searchers. Try to build those useful, relevant pages as well as you can, and we'll do our best to find them and rank them accurately for searches."

I believe that Google got re5earcher's IP address and had a friendly chat with him. On June 14 re5earcher answered a sticky and said,

"hehe, tell them it was a hoax, nothing more :)
and i'm not a google employee :)"

But then, I'm not sure that it was really re5earcher behind the sticky at that point.

mfishy

2:53 am on Oct 6, 2004 (gmt 0)

<<His 1 year timeline matches the lag we see now>>

interesting...

rfgdxm1

2:53 am on Oct 6, 2004 (gmt 0)

>I believe that Google got re5earcher's IP address and had a friendly chat with him.

Or GoogleGuy knew this was a genuine leak, and wanted to plug it and stop even more from leaking out. The problem here is that if this was a genuine leak, GoogleGuy would be expected to say it was just BS. And if this was a hoax, GoogleGuy would also say this was just BS. So we can infer nothing from how GG responded.

I must say I do find it curious that what re5earcher predicted in that post seems to have come true. Of course, coincidences happen all the time.

bears5122

2:58 am on Oct 6, 2004 (gmt 0)

One year lag time? Why even bother using Google then? I'd prefer a search engine that provides the best results on the web, not the best results of a year ago.

graywolf

3:19 am on Oct 6, 2004 (gmt 0)

If you look at all of re5earcher's posts, the language/tone is really inconsistent. The one referenced above is pretty coherent, and has no underscores.

Post #1 here isn't: he misspells "algorithm" as "algirithm" and uses lots of underscores.

[webmasterworld.com...]

msg #3 more underscores

skip down to message #14 no underscores in sight and he is back to being pretty coherent.

#18 the underscores return

Was re5earcher more than one person?

plumsauce

3:33 am on Oct 6, 2004 (gmt 0)

Hey, back in the 80's a lot of programmers thought they were being quite clever by using 2 digits to represent years to save some space. *Someone* would have plenty of time to fix the problem before Y2K.

Now, apply this to the tools Google uses by considering the stable released versions. They are 32 bit tools in native form.

The machines and perforce the OS that Google uses are 32 bit. Therefore, the tools such as compilers and script engines are also 32 bit.

Now, even *if* they use 64 bit routines, consider that every 64 bit access is actually *2* accesses to the data bus. Consider, too, that the CPU idles for *multiple* wait states during data bus accesses on cache misses. You then have a massive slowdown in calculations if you move to > 32 bits, both in data access times and in longer code paths.

So, even if they have designed 64 bit workarounds, it remains a workaround. And, as long as they stay on 32 bit boxes, the backend calculation times cannot help but increase on any attempt to move beyond 32 bits. The pipe is only so wide.

The 32 bit limit is immutable in the hardware.
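
The "two accesses" point above can be pictured with a minimal Python sketch of a 64 bit value carried as two 32 bit words. The helper names (split_u64, join_u64, add_u64_as_words) are hypothetical and only illustrative; nothing here is known to reflect how Google's tools or hardware actually work.

```python
MASK32 = 0xFFFFFFFF  # largest value one 32 bit word can hold


def split_u64(value):
    """Split an unsigned 64 bit value into (high, low) 32 bit words."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value does not fit in 64 bits")
    return (value >> 32) & MASK32, value & MASK32


def join_u64(high, low):
    """Rebuild the 64 bit value from its two 32 bit words."""
    return (high << 32) | low


def add_u64_as_words(a, b):
    """Add two 64 bit values using 32 bit word operations plus a carry."""
    a_hi, a_lo = split_u64(a)
    b_hi, b_lo = split_u64(b)
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                  # 1 if the low words overflowed
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32   # wraps like fixed-width hardware
    return join_u64(hi, lo)


print(split_u64(2 ** 32))
print(add_u64_as_words(0xFFFFFFFF, 1) == 2 ** 32)
```

Every value is handled as two words, and a single addition becomes two additions plus carry handling, which is the kind of longer code path the post describes.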

Is it causing a problem? From here it seems to be a reasonable presumption.

As part of their *heritage*, Google, as a matter of image, will not move away from their chosen operating system and duct-taped white boxes.

The awaited MS search is not hobbled by such considerations. As a matter of fact, it becomes a showcase opportunity.

hmmm ....

nuevojefe

3:40 am on Oct 6, 2004 (gmt 0)

It's funny: if you do a search for re5earcher in G, you can see that his hoax(?) post got a decent bit of notice.

rfgdxm1

3:50 am on Oct 6, 2004 (gmt 0)

>Was re5earcher more than one person?

The posts are similar enough that they could plausibly be from the same author. The only one that seems somewhat inconsistent is the first one, about Google running out of data indexing capacity. My interpretation of this is one of the following:

#1) The first post was written by someone in Google.
#2) The first post was written by someone else who sent it to him, and this person claimed to have a source inside Google.

Note he apparently contradicts himself in the first post. At the end it says "[just a guess but who knows]". Notice the use of brackets there. This would only make sense if he was tacking that on himself to a communication he received from another. If these were all his words, there would be no need to bracket that. Also, he contradicts the claim that this is just a guess by stating above:

"They now considering reconstruction of the data tables which involves expanding ID fields to 5 bytes."

How could he know the expansion would be to 5 bytes, and not say 6?

"This procedure will require 1000 new page index servers and additional storage for temporary tables.

"They are hoping to make this change gradually server by server.

"The completion of the process will take up to one year after that the main URL index will be switched to use 5 bytes ID."

If he just guessed that they were running out of data indexing capacity, how could he know that it would take specifically 1000 new page index servers, that Google was hoping to make this change gradually, and that this would take up to one year to do? He couldn't. Thus the most reasonable explanation is that he wasn't guessing, but instead this was sent to him by someone else and he decided to run it by the people here for analysis.
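
For a sense of scale on the 5 byte versus 6 byte question, here is a small Python sketch of how many distinct IDs each field width allows. It assumes the IDs are plain unsigned integers, which the quoted post does not actually state, and id_capacity is a hypothetical helper name.

```python
def id_capacity(num_bytes):
    """Number of distinct unsigned integer IDs that fit in num_bytes bytes."""
    return 256 ** num_bytes


for n in (4, 5, 6):
    print(f"{n} byte ID: {id_capacity(n):,} distinct values")

# 4 byte ID: 4,294,967,296 distinct values
# 5 byte ID: 1,099,511,627,776 distinct values
# 6 byte ID: 281,474,976,710,656 distinct values
```

Under that assumption, going from 4 to 5 bytes already multiplies the available IDs by 256, so the arithmetic alone does not tell us whether 5 or 6 would be chosen.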

cabbie

3:53 am on Oct 6, 2004 (gmt 0)

>>>While he says at the end this is just a guess, his post includes things like "They now considering reconstruction of the data tables which involves expanding ID fields to 5 bytes." That isn't consistent with a guess; he could only know that if he had an inside source. <<<

Originally he stated that the info was from an inside source, and then later edited it to "this is just a guess".

dazzlindonna

3:54 am on Oct 6, 2004 (gmt 0)

"They are hoping to make this change gradually server by server."

Could explain all the datacenters that have disappeared over time...
