

Why does the 'Google Lag' exist?

Trying to understand its purpose.

         

bakedjake

1:43 am on Sep 29, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I had some in-depth discussion this weekend with some friends about the sandbox. Every theory on how to beat it kept coming back to one central problem - no one is sure why it exists.

I feel very strongly that until we have a good grasp on why it exists, it will be very hard to beat.

I don't buy the explanation that it's intended to be a method of stopping spam. Why? One, it is causing too much collateral damage. Two, if you accept the 80/20 principle (20% of spammers do 80% of the spamming), and you realize that there are already multiple ways of beating the sandbox that all of those spammers know about, the explanation no longer makes sense.

So, why does the sandbox exist?

The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?

DaveAtIFG

3:30 am on Oct 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm starting to think theme recognition was incorporated into the algo with Florida. It's very subtle. Liane suggested this to me last fall and I didn't see it at the time. I believe Google may be requiring inner pages to support the home page before a site gets prominent rankings.

The evidence is scarce and Google has been page oriented for as long as anyone can remember but...

As several of my sites gained links and prominence, they began to generate traffic on related keywords, but not on my primary targeted keywords. The related keywords were usually targeted on an inner page. As the sites gained more prominence, they began to generate traffic on both the related and the primary keywords. It seems to take numerous spidering/indexing cycles for all of this to settle. A sandbox?

Although changes to an established page are spidered and indexed promptly, Google seems to take a month (and often longer) to reflect ranking changes, whether the changes are for the better or worse.

When searching, I routinely set preferences to display 100 results per page and, in my experience, indented results invariably support a site's theme. My experience is that post-Florida, changes to the page displayed as an indented result affect the main page's ranking.

I'm thinking Google added a "site theme" aspect to their algo with Florida. I believe it is a bolt on, after the fact, post spidering/indexing thing, that is generated and/or applied after several months of spidering/indexing. They're taking their sweet time identifying a theme and, until they do, no ranking prominence...

Google has been page oriented for so long that it's difficult to imagine them considering the totality of a site but I think that's what I'm seeing. It's as if they build a score from the bottom up, from the inner pages to the home page, THEN award an "on theme score." And they do this over months of spidering...

OK, like most of you, I'm theorizing about what the "Google Lag" is and not addressing Jake's original question, "Why does the 'Google Lag' exist?" The answer to that question is a simple one. It exists to thwart SEOs and their manipulation of Google's index. :)

BeeDeeDubbleU

7:55 am on Oct 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The answer to that question is a simple one. It exists to thwart SEOs and their manipulation of Google's index.

With respect, I think not. If I were in charge at the 'plex and I asked my people to come up with something to thwart SEOs and this was the result I would sack the person responsible.

Remember that "Google's mission is to organize the world's information and make it universally accessible and useful."

You don't do that by excluding all new sites from the results for a period of eight months or more. I still think that it may be a fault, and its existence should be publicised to force them into a comment. Doesn't anyone have the influence to get it into the press? Brett?

dirkz

9:32 am on Oct 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Sorry Boaz, but new sites do not invariably go into the sandbox.

You've said that more than one time on WebmasterWorld, so you probably know the difference between a site that'll get sandboxed vs. one that doesn't.

"How to avoid it" should lead to the "Why does it exist".

steveb

10:47 am on Oct 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



--"How to avoid it" should lead to the "Why does it exist".--

I don't think so. There isn't a good correlation there. If a way to avoid lag time was: get 1000+ links from different IP/unrelated domains... what would that tell us about why other sites are lagged? It would tell us something, but it wouldn't tell us why sites with 943 unrelated links are lagged, or why 1001 guestbook links would beat the lag but 999 links from the very best domains in the galaxy wouldn't.

I can tell you how to beat a 7'4" whiteboy center to the hoop, but I can't tell you why the 7'4" whiteboy exists.

mfishy

12:22 pm on Oct 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry Boaz, but new sites do not invariably go into the sandbox.

Yes, not all new sites go into the sandbox.

People would do better to think of it as an algo, like Florida on steroids, with tightened dupe filters also.

I tend to agree except it is pretty obvious that age is a significant factor in this algo. I actually have pretty solid proof of this but am not at liberty to share the research here.

From what I see, and we have done extensive research, very few existing sites were affected by the algo change which started in the early spring. The exception is, of course, huge datafeed sites. Also, there was a period in the spring where sites were popping out of the sandbox after a couple of months, as though there was a holding period. So it is quite interesting to see older sites with very similar attributes to newer sites rank on key terms while the newer sites never seem to catch on.

If google is intending for this lag to exist, they really aren't helping their existing index in any way, as much of the same junk that was there in February is still there - it is a case of old vs. new junk I suppose :)

BeeDeeDubbleU

12:45 pm on Oct 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



it is a case of old vs. new junk I suppose

I don't know about this. I have created about six or eight sites since this started. All of them are for clients who offer services as opposed to selling online. None of them sell anything through the sites or carry any adverts, and all of them provide information about the services they provide. Not junk, but still not featuring.

randle

1:00 pm on Oct 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



BeeDeeDubbleU,

I could not agree with you more on this;

“You don't do that by excluding all new sites from the results for a period of eight months or more.”

It is astounding to me that from pre-IPO through post-IPO, this index, for all practical purposes, has not been showing any new sites for coming up on a year now. A year, looked at in relation to the changes that go on in the internet, is an incredibly long time. I am an admitted Google fan, but this is a serious blemish on their record.

I don’t know “why”, but every day it goes on I get closer to thinking it cannot be intentional, and that they are struggling to fix it. Eventually it's going to get more press, and they have hung their hat on freshness, which this index is anything but.

webdude

1:12 pm on Oct 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I see quite a few theories being bandied about on the lag time thing. I personally haven't experienced it, but if we could pool data from sites that have and have not been affected by this, maybe we could nail down exactly what causes it. To do this, we would have to ask some pertinent questions. Please feel free to add questions, because I got tired of trying to come up with them. I am sure there are many more questions which could be added; I am just getting the ball rolling.

But then again, this might just be another of my stupid ideas :-)

1. Have you had a site that has been sandboxed?

2. What do you think the symptoms are for the sandbox effect?

3. How many sites do you have which have been sandboxed?

4. In the last year, how many new sites have you developed?

5. What percentage of these sites were commercial/for profit?

6. What percentage of these sites sandboxed were commercial/for profit?

7. On average, at what rate did you add backlinks (per week) on sandboxed site(s)?

8. Are all your sites on the same IP block?

9. Are all your sites registered with the same registrar?

10. As far as you know, was the domain name of the sandboxed site new?

11. On average, how many hits do you get per month from googlebot on your sandboxed sites?

12. Does the sandboxed site use Adsense?

13. Is the sandboxed site listed in dmoz and yahoo?

14. Give ranking results for the following using keywords you think are unique to your sandboxed site:
a. allinurl:
b. allinanchor:
c. allintext:
d. allintitle:

15. Is the home page of the sandboxed site cached by Google?

16. Number of results for link: on the home page of the sandboxed site?

17. Was a development tool used to create the sandboxed site (i.e. Dreamweaver, FrontPage, etc.)?

18. What is the PR of the home page of the sandboxed site?

19. Average PR of other pages of the sandboxed site?

20. How old is the sandboxed site?

21. Do you buy links for the sandboxed site?

22. Keyword/keyphrase density on the home page of the sandboxed site?

23. Keyword/keyphrase density on average for other pages of the sandboxed site?

24. How many new pages are added on average per week to the sandboxed site?

25. Do you use a database to generate pages on the sandboxed site?

26. Is your sandboxed site an affiliate site?

27. Do you post text from other sites (i.e. newsfeeds, articles, etc.) on your sandboxed site?

28. Currently, how many pages are on your sandboxed site?

29. Have you had a site that was taken out of the sandbox?

30. If so, how many days was the site in the sandbox?

31. What is the average PR of backlinks for the sandboxed site?

32. Was the site that was sandboxed new?

dirkz

1:59 pm on Oct 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> If a way to avoid lag time was: get 1000+ links from different IP/unrelated domains... what would that tell us about why other sites are lagged?

This would certainly disprove that sandbox has something to do with age of site and was a filter instead, which leads to a different "why" altogether.

renee

3:08 pm on Oct 5, 2004 (gmt 0)

10+ Year Member



>>Don't want to take this off-topic again, but just to correct a statement above, supplemental listings can have pagerank. Usually they don't, but some do.

Yes, you are right. I went back and found several of my supplemental pages with PageRank using the Google toolbar. Does this make sense? Note that PageRank is a relative weight among pages, and it makes sense only if it is calculated from a matrix of interconnected backlinks. So if supplemental pages have true PR, they have to be included in Google's PageRank calculation. Why would Google do this if the supplementals are accessed only when there are not enough results in the search against the main index? Also, supplemental pages never get updated. Google must be smarter than this, and it would seem to go against the purpose of the supplemental index.

So what is the explanation? It looks like when Google transferred the page to the supplemental index, it transferred the page record lock, stock and barrel, including whatever PR value was stored at the time. I'll monitor this and see whether the PageRanks of the supplemental pages get updated when Google does a PR update.

I have no tangible evidence whatsoever, just pure logic!
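The point that PageRank is a relative weight computed over the whole link graph can be illustrated with the textbook power-iteration formulation from the original PageRank paper. This is a minimal sketch, not Google's production system: the link graph, the page names, the damping factor of 0.85 and the rule of spreading a dangling page's rank evenly over all pages are all assumptions for illustration. Because every score is a share of a fixed total, adding one page changes the scores of pages it never links to, which is consistent with the idea above that a PR value carried along with a page moved out of the main graph would just be a frozen snapshot.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration (illustrative sketch only).

    links maps each page to the list of pages it links to. Scores always
    sum to 1, so each one is a share relative to every other page.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the same baseline from the "random jump" term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outbound links): spread its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical three-page site.
site = {"home": ["about", "products"], "about": ["home"], "products": ["home"]}
before = pagerank(site)

# Add one more hypothetical page that links only to the home page.
after = pagerank({**site, "old": ["home"]})

for page in sorted(after):
    print(page, round(before.get(page, 0.0), 4), round(after[page], 4))
```

Running it shows that "about" and "products" lose score once "old" joins the graph, even though "old" does not link to them: the total is fixed, so every page's value depends on the rest of the graph at the time it was computed.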

This 354-message thread spans 36 pages.