I feel very strongly that until we have a good grasp of why the sandbox exists, it will be very hard to beat.
I don't buy the explanation that it's intended as a way of stopping spam. Why? One, it is doing too much collateral damage. Two, if you accept the 80/20 principle (20% of spammers do 80% of the spamming), and you realize that there are already several ways of beating the sandbox that all of those spammers know about, the explanation no longer makes sense.
So, why does the sandbox exist?
The most obvious effect of the sandbox is that it prevents new domains (not pages) from ranking for any relatively competitive term. So, start thinking like a search engine - what would be the benefit of this?
The evidence is scarce and Google has been page-oriented for as long as anyone can remember, but...
As several of my sites gained links and prominence, they began to generate traffic on related keywords, but not on my primary target keywords. The related keywords were usually targeted on an inner page. As the sites gained more prominence, they began to generate traffic on both the related and the primary keywords. It seems to take numerous spidering/indexing cycles for all of this to settle. A sandbox?
Although changes to an established page are spidered and indexed promptly, Google seems to take a month (and often longer) to reflect ranking changes, whether the changes are for the better or worse.
When searching, I routinely set preferences to display 100 results per page and, in my experience, indented results invariably support a site's theme. Post-Florida, I have also found that changes to the page displayed as an indented result affect the main page's ranking.
I'm thinking Google added a "site theme" aspect to their algo with Florida. I believe it is a bolt-on: something generated and/or applied after the fact, once several months of spidering/indexing have accumulated. They're taking their sweet time identifying a theme and, until they do, there is no ranking prominence...
Google has been page-oriented for so long that it's difficult to imagine them considering the totality of a site, but I think that's what I'm seeing. It's as if they build a score from the bottom up, from the inner pages to the home page, THEN award an "on theme score." And they do this over months of spidering...
OK, like most of you, I'm theorizing about what the "Google Lag" is and not addressing Jake's original question, "Why does the 'Google Lag' exist?" The answer to that question is a simple one. It exists to thwart SEOs and their manipulation of Google's index. :)
With respect, I think not. If I were in charge at the 'plex and I asked my people to come up with something to thwart SEOs and this was the result I would sack the person responsible.
Remember that "Google's mission is to organize the world's information and make it universally accessible and useful."
You don't do that by excluding all new sites from the results for eight months or more. I still think it may be a fault, and its existence should be publicised to force them to comment. Doesn't anyone have the influence to get it into the press? Brett?
I don't think so. There isn't a good correlation there. Suppose a way to avoid the lag were to get 1000+ links from unrelated domains on different IPs: what would that tell us about why other sites are lagged? It would tell us something, but it wouldn't tell us why sites with 943 unrelated links are lagged, or why 1001 guestbook links would beat the lag but 999 links from the very best domains in the galaxy wouldn't.
I can tell you how to beat a 7'4" whiteboy center to the hoop, but I can't tell you why the 7'4" whiteboy exists.
Sorry Boaz, but new sites do not invariably go into the sandbox.
Yes, not all new sites go into the sandbox.
People would do better to think of it as an algo, like Florida on steroids, with tightened dupe filters as well.
I tend to agree except it is pretty obvious that age is a significant factor in this algo. I actually have pretty solid proof of this but am not at liberty to share the research here.
From what I see, and we've done extensive research, very few existing sites were affected by the algo change that started in early spring. The exception, of course, is huge datafeed sites. There was also a period in the spring when sites were popping out of the sandbox after a couple of months, as though there were a holding period. So it is quite interesting to see older sites with very similar attributes to newer sites rank on key terms while the newer sites never really seem to catch on.
If Google intends this lag to exist, they really aren't helping their existing index in any way, as much of the same junk that was there in February is still there. It is a case of old vs. new junk, I suppose :)
I don't know about this. I have created about six or eight sites since this started. All of them are for clients who offer services as opposed to selling on line. None of them sell anything through the sites or carry any adverts and all of them provide information about the services they provide. Not junk - but still not featuring.
I could not agree with you more on this:
“You don't do that by excluding all new sites from the results for a period of eight months or more.”
It is astounding to me that from pre-IPO through post-IPO, this index has, for all practical purposes, shown no new sites for coming up on a year now. A year, measured against the pace of change on the internet, is an incredibly long time. I am an admitted Google fan, but this is a serious issue on their record.
I don’t know “why”, but every day it goes on I get closer to thinking it cannot be intentional and that they are struggling to fix it. Eventually it's going to get more play, and they have hung their hat on freshness, which this index is anything but.
But then again, this might just be another of my stupid ideas :-)
1. Have you had a site that has been sandboxed?
2. What do you think the symptoms are for the sandbox effect?
3. How many sites do you have which have been sandboxed?
4. In the last year, how many new sites have you developed?
5. What percentage of these sites were commercial/for profit?
6. What percentage of these sites sandboxed were commercial/for profit?
7. On average, at what rate did you add backlinks (per week) on sandboxed site(s)?
8. Are all your sites on the same IP block?
9. Are all your sites registered with the same registrar?
10. As far as you know, was the domain name of the sandboxed site new?
11. On average, how many hits do you get per month from googlebot on your sandboxed sites?
12. Does the sandboxed site use Adsense?
13. Is the sandboxed site listed in dmoz and yahoo?
14. Give ranking results for the following using keywords you think are unique to your sandboxed site:
a. allinurl:
b. allinanchor:
c. allintext:
d. allintitle:
15. Is the home page of the sandboxed site cached by Google?
16. How many results does a link: search return for the home page of the sandboxed site?
17. Was a development tool used to create the sandboxed site (e.g. Dreamweaver, FrontPage)?
18. What is the PR of the home page of the sandboxed site?
19. Average PR of other pages of the sandboxed site?
20. How old is the sandboxed site?
21. Do you buy links for the sandboxed site?
22. Keyword/keyphrase density on home page of sandboxed site?
23. Keyword/keyphrase density on average for other pages of the sandboxed site?
24. How many new pages are added on average per week to the sandboxed site?
25. Do you use a database to generate pages on the sandboxed site?
26. Is your sandboxed site an affiliate site?
27. Do you post text from other sites (e.g. news feeds, articles) on your sandboxed site?
28. Currently, how many pages are on your sandboxed site(s), on average?
29. Have you had a site that was taken out of the sandbox?
30. If so, how many days was the site in the sandbox?
31. What is the average PR of backlinks for the sandboxed site?
32. Was the site that was sandboxed new?
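Questions 22 and 23 ask for keyword density without defining it, and tools of the day computed it in different ways. So that answers are comparable, here is a minimal sketch under one assumed convention: (occurrences of the phrase × words in the phrase) / total words × 100. The function name `keyword_density` and the simple word-splitting rule are illustrative assumptions, not a Google-defined metric.

```python
import re

# Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
WORD = re.compile(r"[a-z0-9']+")


def keyword_density(text, phrase):
    """Percentage of words in `text` taken up by occurrences of `phrase`.

    Assumed convention: (occurrences x words in phrase) / total words x 100.
    Overlapping occurrences are each counted.
    """
    words = WORD.findall(text.lower())
    kw = WORD.findall(phrase.lower())
    if not words or not kw:
        return 0.0
    k = len(kw)
    # Slide a k-word window over the text and count exact phrase matches.
    hits = sum(1 for i in range(len(words) - k + 1) if words[i:i + k] == kw)
    return 100.0 * hits * k / len(words)


sample = "Blue widgets are the best widgets. Buy blue widgets."
print(round(keyword_density(sample, "blue widgets"), 1))  # two-word phrase
print(round(keyword_density(sample, "widgets"), 1))       # single word
```

With 9 words in the sample, the two-word phrase scores about 44.4% and the single word about 33.3%. Other tools divide by total words without multiplying by phrase length, so it's worth stating which convention you used.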
Yes, you are right. I went back and found several of my supplemental pages showing PageRank in the Google toolbar. Does this make sense? Note that PageRank is a relative weight among pages, and it makes sense only if it is calculated from a matrix of interconnected backlinks. So if supplemental pages have true PR, they have to be included in Google's PageRank calculation. Why would Google do this if the supplementals are accessed only when there are not enough results from the main index? Also, supplemental pages never get updated. Google must be smarter than this, and it would seem to defeat the purpose of the supplemental index.
So what is the explanation? It looks like when Google transferred the page to the supplemental index, it transferred the page record lock, stock and barrel, including whatever PR value was stored at the time. I'll monitor this and see whether the PageRank of the supplemental pages gets updated when Google does a PR update.
I have no tangible evidence whatsoever, just pure logic!
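To illustrate the "relative weight" point, here is a minimal sketch of classic PageRank computed by power iteration, with the 0.85 damping factor from the original published formula. It is not Google's production algorithm, which isn't public. The function name, the four-page site, and the use of "old" to stand in for a page that later drops out of the main link graph are all hypothetical. The sketch shows that scores always sum to 1 and depend on the whole graph. Removing one page changes the home page's score, so a PR value frozen at the moment a page moved to another index would no longer match a fresh calculation.

```python
def pagerank(links, damping=0.85, iterations=200, tol=1e-12):
    """Classic PageRank by power iteration (illustrative sketch).

    links maps each page to the list of pages it links to. Pages that
    appear only as link targets are added automatically.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page gets the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new[p] += damping * dangling / n
        # Each page splits its rank equally among the pages it links to.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical site, with and without the page "old" in the link graph.
with_old = pagerank({"home": ["a", "b"], "a": ["home"],
                     "b": ["home", "old"], "old": ["home"]})
without_old = pagerank({"home": ["a", "b"], "a": ["home"], "b": ["home"]})
print(round(with_old["home"], 3), round(without_old["home"], 3))
```

In this toy graph, the home page scores roughly 0.43 with "old" included and roughly 0.49 without it, even though nothing on the home page itself changed.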