|Why does the 'Google Lag' exist?|
Trying to understand its purpose.
I had some in-depth discussion this weekend with some friends about the sandbox. Every theory on how to beat it kept coming back to one central problem - no one is sure why it exists.
I feel very strongly that until we have a good grasp on why it exists, it will be very hard to beat.
I don't buy the explanation that it's intended to be a method of stopping spam. Why? One, it is doing too much collateral damage. Two, if you accept the 80/20 principle (20% of spammers are doing 80% of the spamming), and you realize that there are already multiple ways of beating the sandbox that all of those spammers are aware of, the spam explanation doesn't make sense anymore.
So, why does the sandbox exist?
The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?
>>1. How come I get a new site A to rank above old site B for some searches, but it's the other way round for other searches?
there are 2 possibilities:
- if new site A is truly in the sandbox index, then the only way both old and new sites appear in the same serps is that the query is non-competitive. i have seen queries yielding 25000+ results which are a mixture of supplemental and non-supplemental pages. when this happens, the ordering/ranking of the serps is not pr based (it can't be since supplementals do not have pr!). in the example above with 25000+ results, the number one spot went to a supplemental page. this would explain what you see if the sandbox is a separate index like the supplemental.
- the other possibility is that the new site is not in the sandbox index. if it has pr and show up in backlinks, then it is not in the sandbox.
>>2. Why do new sites appear at the top of serps for the allin commands?
again it depends on how many results you get with the query. some of my allin searches show supplemental results, which is an indication that google is not using the main index solely for the specific serps. i keep using the supplemental index because we know for sure that the page is not in the main index. i'm using the behaviour of supplemental pages (i.e. separate index) as a model for the behaviour of the sandbox index.
>>3. Why did my PR get updated in April for a sandboxed site?
if it has pr, it is not in the sandbox. the problems you have not ranking in the serps are due to other reasons - penalties, filters, you are out-seoed, etc.
>>4. Why do sites in the sandbox index appear in the link:www.oldsite.com from the main index?
if a page appears as a backlink, then it is not in the sandbox. as in #3, look for other reasons.
<<if it has pr, it is not in the sandbox.>>
You are simply wrong.
Actually, your entire theory is based on a premise of a supplemental index that you have 0 proof exists. Also, if you understood how people are getting around the sandbox, you would understand the phenomenon a lot more.
<<if it has pr, it is not in the sandbox. the problems you have not ranking in the serps are due to other reasons - penalties, filters, you are out-seoed, etc. >>
No. Folks who run a lot of sites have the advantage of knowing the exact formula each one has used. New sites with PR are not ranking where older sites with very similar techniques are. Out of curiosity, renee, do you have any sites that rank on competitive terms? You seem a bit out of the loop with your research here?!
[edited by: mfishy at 12:01 am (utc) on Oct. 5, 2004]
let me summarize a few points -
- a sandboxed page (or in the sandbox index) does not have pr and does not appear in backlinks;
- the reverse is not necessarily true: a page that has no pr or does not appear in backlinks is not necessarily a sandboxed page;
- a page that has pr or appears in backlinks is definitely not in the sandbox.
google does not explicitly indicate if a page is in the sandbox, the way it does with the supplemental tag. so the only way we can decide is to use the above rules. it seems the only thing we can conclusively deduce is that a page is not in the sandbox, in which case we can apply seo techniques to achieve rankings.
if the reason for the sandbox is truly out-of-capacity problems then maybe there's nothing we can do but wait until google solves this out-of-capacity problem. or as bdw says, let's go to the press and put the pressure on google to accelerate their solution.
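The three rules above can be sketched as a small function. This only illustrates the rule as proposed in this post (other posters in the thread dispute it); the function name and the return strings are made up for the example.

```python
def sandbox_status(has_pr: bool, in_backlinks: bool) -> str:
    """Apply the proposed rule of thumb; not a confirmed Google behaviour."""
    if has_pr or in_backlinks:
        # Rule 3: pr or a backlink listing rules the sandbox out.
        return "not sandboxed"
    # Rule 2: no pr and no backlinks proves nothing either way.
    return "undetermined"


print(sandbox_status(has_pr=True, in_backlinks=False))
print(sandbox_status(has_pr=False, in_backlinks=False))
```

Note that under these rules the function can never return "sandboxed": the only conclusive answer is the negative one.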
>>Actually, your entire theory is based on a premise of a supplemental index that you have 0 proof exists.
No proof is needed. Google has acknowledged the supplemental index. where have you been?
errr...but it has nothing to do with this
<<a sandboxed page (or in the sandbox index) does not have pr and does not appear in backlinks; >>
Let me summarize - you are wrong.
<<if it has pr, it is not in the sandbox.>>
I second mfishy, this statement is plain wrong.
IMO quite probably the Google sandbox/lag is an initially unintended side effect of something else that Google implemented. Whether it is unwanted by Google is another question though...
>>Out of curiosity, renee, do you have any sites that rank on competitive terms? You seem a bit out of the loop with your research here?!
i run more than 50 sites and most of them rank in the top 5 of the appropriate serps. and i make very significant revenue (4 digits a month) from adsense and websearch. how about you?
>>i run more than 50 sites and most of them rank in the top 5 of the appropriate serps. and i make very significant revenue (4 digits a month) from adsense and websearch. how about you?
let me correct. i meant to say 5 digits.
>>let me correct. i meant to say 5 digits.
Good for you renee (I mean it, I am not being sarcastic). I am sure mfishy does very well as well. That in itself does not mean any of you is more right in this argument than the other. :)
Renee, IMO a better question would be - have you launched any/many new sites in the last 6 months? (new as in new domain name, not subdomains or new pages in existing sites).
>>Good for you renee (I mean it, I am not being sarcastic). I am sure mfishy does very well as well. That in itself does not mean any of you is more right in this argument than the other. :)
thank you Boa. you're so kind.
>>I am sure mfishy does very well as well. That in itself does not mean any of you is more right in this argument than the other. :)
mfishy asked for my qualifications since he seems to think i'm just an amateur. now why does mfishy need you to defend his qualifications?
>>Renee, IMO a better question would be - have you launched any/many new sites in the last 6 months? (new as in new domain name, not subdomains or new pages in existing sites).
yes i have launched new sites. and like most of you i am in agony over the sandbox issue. i have about 10 new sites waiting to rank.
I was not asking for your qualifications and certainly not your income, renee. It's just that you are seeing something very different from what many, many webmasters are seeing, and I have noticed that in less competitive areas things are working a bit differently - that's all.
>> now why does mfishy need you to defend his qualifications?
He doesn't, it just looked like the discussion was going to get off the tracks and I wanted to put it back on track.
>> I have about 10 sites waiting to rank.
Now that's where we probably differ. If by "waiting to rank" you mean waiting to get PR, I have some sites I consider to still be in the sandbox, yet they already have (visible) PR. And I have a few other sites which I consider to be in the sandbox, with no visible PR yet (created after June 23, the last visible PR update), yet they already show backlinks. I also know from threads here that many other webmasters are in a similar situation, and not because they are bad SEOs (since their old sites, optimized similarly to their new ones and with similar links, do very well..).
This thread is being so intolerably hijacked that it's become almost impossible to read.
Supplemental Index = Reality
1. It appears in the search results labeled as such = tangible evidence.
2. It has been verified by Google as being a separate index = tangible evidence.
3. All those who have seen the tangible evidence agree that it exists.
Separate "Sandbox Index" = One Person's Theory
1. There is nothing appearing in the index or elsewhere as such = no evidence.
2. No such thing has been verified by Google or anyone else = no evidence.
3. No one has seen any tangible evidence, nor does anyone else agree that it exists, or have any reason to.
How about if the hijacking into debate about erroneous assumptions stops, and we have the courtesy to get back to the original poster's intention and topic:
Why does the Google Lag Exist?
That way we can continue the discussion without a mod or admin needing to close the whole thing down because it got hijacked WAY off topic. And also so that people can benefit instead of having empty arguments about something which not ONE person agrees exists.
Right, back to WHY theories
New website owners tell Google EXACTLY the words they need to rank on to make money, laying them out very clearly in title tags, H1 tags, etc. (there is a whole industry devoted to this, called SEO).
Google says thanks for that info (like taking candy off a baby), so now we know not only what you want, but what not to let you have .... unless of course you pay (click that handy link to the right of the search results you are never ever going to get).
Yes, I know we won your hearts and lots of you webmasters put up links to us for free, but things have changed and we have sorta got a monopoly on search and you need us a whole lot more than we need you.
Solution, don't tell Google what you want and everything will be OK.....
Still working on finer details of that work around ...
Why does the Google lag exist?
Just a theory, but suppose for some reason (too many new links, not enough time?) it now takes Google a looong time (as in months) to get around to a link to calculate accurately how much PR it passes?
So for a new link Google knows the link is there (hence it may show up in a new site's backlinks), it may guess how much PR it passes (hence the new site may eventually get PR), but since it is a guess (for a new link) the link has a mark by it that identifies it as a guess and so the link doesn't really count for competitive searches. Once Google gets around to check the link thoroughly and the PR it passes is no longer an initial guess, it starts counting the link as any other link and you may rank for more competitive searches.
New sites of course only have new links to them, hence they can be found only/mostly for non competitive searches.
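To make the theory concrete, here is a toy sketch of it. Everything in it is hypothetical: the `Link` class, the `verified` flag and the scoring rule are just a way of writing down the guess above, not anything Google is known to do.

```python
from dataclasses import dataclass


@dataclass
class Link:
    pr_passed: float  # PR the link is believed to pass
    verified: bool    # False while the value is still an initial guess


def link_score(links, competitive):
    """Sum link value; for competitive searches, count only verified links."""
    if competitive:
        return sum(link.pr_passed for link in links if link.verified)
    return sum(link.pr_passed for link in links)


# Made-up numbers: two sites with identical links, differing only in link age.
new_site = [Link(2.0, verified=False), Link(1.5, verified=False)]
old_site = [Link(2.0, verified=True), Link(1.5, verified=True)]

print(link_score(new_site, competitive=False))
print(link_score(new_site, competitive=True))
print(link_score(old_site, competitive=True))
```

Under this sketch the new site scores the same as the old one on non-competitive searches and zero on competitive ones, which is the pattern described above.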
So why do new internal pages rank (as in placement, not PR) well? Maybe in these cases Google considers its guess not to be a guess...
What corroboration do I have for this theory? Very little, except that I have seen a lag effect happen in the other direction as well - it took Google three months to realize that a site had lost some of its more important/influential links, and so the site continued for 3 months to rank extremely well in the SERPs when it shouldn't have any longer...
>>><<a sandboxed page (or in the sandbox index) does not have pr and does not appear in backlinks; >>
I hope my encouragement hasn't made you think that the theory is fact Renee.:)
As Mfishy and others have said, PR and backlinks are no indication of a site being in or out of the sandbox.
Does this mean it's back to the drawing board?
Google is full. They have reached their limit. Notice how fewer of your pages are being visited by the bot? Also notice how the count of pages in the database has been constant for over a year now? They are currently reconfiguring the algo and when done the index will be effectively doubled. The way it is written, this can be doubled, with effort, as needed.
Read some of Chris Sherman's (Clickz) comments on this subject.
Sorry Boaz, but new sites do not invariably go into the sandbox.
People would do better to think of it as an algo, like Florida on steroids, with tightened dupe filters also. :-)
|People would do better to think of it as an algo, like Florida on steroids, with tightened dupe filters also |
I've got a couple of sites that took a Florida hit that exhibit the exact same symptoms as sandboxed sites. No duplicates* involved, but I've believed all along that it's in the number and types of links that were and are still lacking for the sites.
Even aside from those, which is purely anecdotal, there seem to be some kind of connections and correlations between this phenomenon and the Florida phenomenon.
* While there were no duplications, there *was* a problem with the site: it was built in Dreamweaver from a template, and although the pages were properly differentiated below the very top, the page tops were all identical, including the header graphic linked to the homepage with the same anchor text throughout.
I've seen this same phenomenon many times since then, with many sites. This is not a sandbox issue, but it is a very real issue with many templated sites or pages with headers or page-tops all identical.
Don't want to take this off-topic again, but just to correct a statement above, supplemental listings can have pagerank. Usually they don't, but some do.
I'm starting to think theme recognition was incorporated into the algo with Florida. It's very subtle. Liane suggested this to me last fall and I didn't see it at the time. I believe Google may be requiring inner pages to support the home page before a site gets prominent rankings.
The evidence is scarce and Google has been page oriented for as long as anyone can remember but...
As several of my sites gained links and prominence, they began to generate traffic on related keywords, but not my primary targeted keywords. The related keywords were usually targeted on an inner page. As the sites gained more prominence, they began to generate traffic on both the related and the primary keywords. It seems to take numerous spidering/indexing cycles for all of this to settle. A sandbox?
Although changes to an established page are spidered and indexed promptly, Google seems to take a month (and often longer) to reflect ranking changes, whether the changes are for the better or worse.
When searching, I routinely set preferences to display 100 results per page and, in my experience, indented results invariably support a site's theme. My experience is that post-Florida, changes to the page displayed as an indented result affect the main page's ranking.
I'm thinking Google added a "site theme" aspect to their algo with Florida. I believe it is a bolt on, after the fact, post spidering/indexing thing, that is generated and/or applied after several months of spidering/indexing. They're taking their sweet time identifying a theme and, until they do, no ranking prominence...
Google has been page oriented for so long that it's difficult to imagine them considering the totality of a site but I think that's what I'm seeing. It's as if they build a score from the bottom up, from the inner pages to the home page, THEN award an "on theme score." And they do this over months of spidering...
OK, like most of you, I'm theorizing about what the "Google Lag" is and not addressing Jake's original question, "Why does the 'Google Lag' exist?" The answer to that question is a simple one. It exists to thwart SEOs and their manipulation of Google's index. :)
|The answer to that question is a simple one. It exists to thwart SEOs and their manipulation of Google's index. |
With respect, I think not. If I were in charge at the 'plex and I asked my people to come up with something to thwart SEOs and this was the result I would sack the person responsible.
Remember that "Google's mission is to organize the world's information and make it universally accessible and useful."
You don't do that by excluding all new sites from the results for a period of eight months or more. I still think that it may be a fault, and its existence should be publicised to force them into a comment. Doesn't anyone have the influence to get it into the press? Brett?
> Sorry Boaz, but new sites do not invariably go into the sandbox.
You've said that more than one time on WebmasterWorld, so you probably know the difference between a site that'll get sandboxed vs. one that doesn't.
"How to avoid it" should lead to the "Why does it exist".
--"How to avoid it" should lead to the "Why does it exist".--
I don't think so. There isn't a good correlation there. If a way to avoid lag time was: get 1000+ links from different IP/unrelated domains... what would that tell us about why other sites are lagged? It would tell us something, but it wouldn't tell us why sites with 943 unrelated links are lagged; or why 1001 guestbook links would beat the lag but 999 links from the very best domains in the galaxy wouldn't.
I can tell you how to beat a 7'4" whiteboy center to the hoop, but I can't tell you why the 7'4" whiteboy exists.
|Sorry Boaz, but new sites do not invariably go into the sandbox. |
Yes, not all new sites go into the sandbox.
|People would do better to think of it as an algo, like Florida on steroids, with tightened dupe filters also |
I tend to agree except it is pretty obvious that age is a significant factor in this algo. I actually have pretty solid proof of this but am not at liberty to share the research here.
From what I see, and we have done extensive research, very few existing sites were affected by the algo change which started in the early spring. The exception is, of course, huge datafeed sites. Also, there was a period in the spring where sites were popping out of the sandbox after a couple of months as though there was a holding period. So, it is quite interesting to see older sites with very similar attributes to newer sites rank on key terms while the newer sites seem to never really catch on.
If google is intending for this lag to exist, they really aren't helping their existing index in any way, as much of the same junk that was there in February is still there - it is a case of old vs. new junk I suppose :)
|it is a case of old vs. new junk I suppose |
I don't know about this. I have created about six or eight sites since this started. All of them are for clients who offer services as opposed to selling on line. None of them sell anything through the sites or carry any adverts and all of them provide information about the services they provide. Not junk - but still not featuring.
I could not agree with you more on this;
“You don't do that by excluding all new sites from the results for a period of eight months or more.”
It is astounding to me that from pre-IPO through post-IPO, this index, for all practical purposes, has not been showing any new sites for coming up on a year now. A year, looked at in relation to the changes that go on in the internet, is an incredibly long time. I am an admitted Google fan but this is a serious blemish on their record.
I don’t know “why”, but every day it goes on I get closer to thinking it cannot be intentional, and they are struggling to fix it. Because eventually it's going to get more play, and they have hung their hat on freshness, which this index is anything but.
I see quite a few theories being bandied about on the lag time thing. I personally haven't experienced it but, if we could pool data from sites that have and have not been affected by this, maybe we could nail down exactly what causes it. In order to do this, we would have to ask some pertinent questions. And please feel free to add questions because I got tired of trying to come up with them. I am sure there are many, many questions which could be added. I am just getting the ball rolling.
But then again, this might just be another of my stupid ideas :-)
1. Have you had a site that has been sandboxed?
2. What do you think the symptoms are for the sandbox effect?
3. How many sites do you have which have been sandboxed?
4. In the last year, how many new sites have you developed?
5. What percentage of these sites were commercial/for profit?
6. What percentage of these sites sandboxed were commercial/for profit?
7. On average, at what rate did you add backlinks (per week) on sandboxed site(s)?
8. Are all your sites on the same IP block?
9. Are all your sites registered with the same registrar?
10. As far as you know, was the domain name of the sandboxed site new?
11. On average, how many hits do you get per month from googlebot on your sandboxed sites?
12. Does the sandboxed site use Adsense?
13. Is the sandboxed site listed in dmoz and yahoo?
14. Give ranking results for the following using keywords you think are unique to your sandboxed site:
15. Is home page of the sandboxed site cached by Google?
16. Number of results for link: on homepage of sandboxed site?
17. Was a development tool used to create the sandboxed site (ie. dreamweaver, frontpage, etc)?
18. What is the PR of home page of the sandboxed site?
19. Average PR of other pages of the sandboxed site?
20. How old is the sandboxed site?
21. Do you buy links for the sandboxed site?
22. Keyword/keyphrase density on home page of sandboxed site?
23. Keyword/keyphrase density on average for other pages of the sandboxed site?
24. How many new pages are added on average per week to the sandboxed site?
25. Do you use a database to generate pages on the sandboxed site?
26. Is your sandboxed site an affiliate site?
27. Do you post text from other sites (ie. newsfeeds, articles etc.) on your sandboxed site?
28. Currently, on average, how many pages are on your sandboxed site?
29. Have you had a site that was taken out of the sandbox?
30. If so, how many days was the site in the sandbox?
31. What is the average PR of backlinks for the sandboxed site?
32. Was the site that was sandboxed new?
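If answers to these questions were actually pooled, a first pass could be as simple as comparing the sandbox rate across each attribute. The sketch below assumes each answer sheet is stored as a dict; the field names (`sandboxed`, `new_domain`, `uses_adsense`) and the four sample rows are invented purely for illustration and are not real survey results.

```python
from collections import defaultdict


def sandbox_rate_by(responses, attribute):
    """Return the fraction of sandboxed sites for each value of one attribute."""
    counts = defaultdict(lambda: [0, 0])  # value -> [sandboxed, total]
    for response in responses:
        bucket = counts[response[attribute]]
        if response["sandboxed"]:
            bucket[0] += 1
        bucket[1] += 1
    return {value: hit / total for value, (hit, total) in counts.items()}


# Invented placeholder rows, one per site.
responses = [
    {"sandboxed": True, "new_domain": True, "uses_adsense": True},
    {"sandboxed": True, "new_domain": True, "uses_adsense": False},
    {"sandboxed": False, "new_domain": False, "uses_adsense": True},
    {"sandboxed": False, "new_domain": True, "uses_adsense": False},
]

print(sandbox_rate_by(responses, "new_domain"))
print(sandbox_rate_by(responses, "uses_adsense"))
```

An attribute whose values show very different rates would be a candidate cause worth a closer look; one with near-equal rates could probably be dropped from the list. With only a handful of responses any difference could easily be chance.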
> If a way to avoid lag time was: get 1000+ links from different IP/unrelated domains... what would that tell us about why other sites are lagged?
This would certainly disprove the idea that the sandbox has something to do with the age of the site, and would suggest it is a filter instead, which leads to a different "why" altogether.
>>Don't want to take this off-topic again, but just to correct a statement above, supplemental listings can have pagerank. Usually they don't, but some do.
yes you are right. I did go back and found several of my supplemental pages with pagerank using the google toolbar. Does this make sense? Note that pagerank is a relative weight among pages and it makes sense only if the pagerank is calculated from a matrix of interconnected backlinks. so if supplemental pages have true pr, then they have to be included in google's pagerank calculation. why would google do this if the supplementals are accessed only when there are not enough results in the search against the main index? also supplemental pages never get updated. google must be smarter than this. and it would seem to be against the purpose of the supplemental index.
So what is the explanation? looks like when google transferred the page to the supplemental it transferred the page record lock, stock and barrel. this includes whatever pr value was stored at the time. i'll monitor this and see if the pageranks of the supplemental pages get updated when google does a pr update.
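For anyone who wants to see why pagerank only makes sense over the whole link graph, here is a minimal sketch of the simplified, published form of the calculation (power iteration with the conventional 0.85 damping factor). It is not Google's production code, and the four-page graph is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to;
    every link target must also appear as a key."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its rank evenly over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Made-up graph: page "d" links out but nothing links to it.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page in sorted(ranks):
    print(page, round(ranks[page], 4))
```

Each page's value depends on every other page's value in the same pass, so a page left out of the calculation cannot have a current rank; at best it can carry a number stored from an earlier run.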
i have no tangible evidence whatsoever, just pure logic!
renee, given that nobody else here has any tangible evidence, I'll go with the pure logic.
<<<<looks like when google transferred the page to the supplemental it transferred the page record lock-stock-and barrel.
And your logic actually explains the things I'm seeing, like in this case, where I rebranded a site, it entered the sandbox, and lost all its pagerank. But then one day I was rechecking the page rank, when a single lone file suddenly showed up with its old pagerank. That vanished later. This file was buried in the site, and was the only file with page rank on the site. So your logic perfectly explains this phenomenon. [given that creating an algo is pretty much a purely logical process, obviously one of the best tools to decipher/reverse engineer it is logic, that's sort of a no brainer, or should be... but my college logic teacher told me he'd seen a drastic decline in his students' abilities to perform logic with the onslaught of the tv generations....]
[edited by: isitreal at 3:21 pm (utc) on Oct. 5, 2004]