Please, no more sandbox crap talk. New sites actually get a boost in Google's new ranking algo, but if they are not up to it, they fall deep down in the SERPs.
May I refer to the following response from GoogleGuy, in a Bourbon update thread, to a very well-considered question from mblair:
"Are there any rules of thumb, other than Google's webmaster guidelines, that can help a new website get balanced consideration by Google for good rankings in the SERPs or is it prudent for the webmaster to plan for a necessary passage of time before a website has a potential to rank well?"
"mblair, it usually does take time for a site to build up its reputation. There are always going to be a few sites that are so good or so viral that they don't really need search engines at all. In the old days, the hamster dance swept through the known internet world."
A site I am working on came out of the "sandbox" with the Bourbon update and pages started ranking for competitive keyphrases. What I now notice is that there appears to be a "sandbox" hangover. New pages added to the site since the Bourbon update are ranking far better relative to pages that were "sandboxed" for 6 months.
Some people get good rankings from the start and then laugh at the sandbox theory, but actually they don't understand the temporary boost for new sites and aren't aware of where they are heading in the next couple of months. The longest temporary boost for a new site that I happened to notice lasted about three months, which was quite unusual.
Your situation is very common, IMO. Google appears to require some historical data before a site starts to rank (distinct from the initial but temporary boost). You may want to call that a "sandbox" theory.
It defies logic to me. For the top 50 positions on any competitive search there will be thousands of new pages and hundreds of new sites being made every day. Unless you get into the top 50, I doubt you will get any significant traffic. It is impossible for all these new pages/sites to get a decent ranking at the same time!
What is the definition of "sandbox" and "sandboxed" to you?
I don’t think you will ever get a consensus on that. However: if your site gets a cache date that is never more than 10 days old (regularly crawled); if the title, snippet and URL appear just as you intended when you search for your URL; if you run searches using the allinanchor:, allintext: and allintitle: commands and your site comes up within the first few pages; BUT when you search for the main keyword the site was designed for you’re not within the first 1,000 places, then you could throw some sort of label on that.
p.s. the politically correct term these days is “filter” but I tend to think of it more along the lines of I just haven’t knocked on the right door yet.
If there were a "sandbox filter", it would not "[catch] up ... later"; such things are not spiders that run on their own on the web! If there were one, it would apply at the exact time the site enters the index, because the date of retrieval is stored in the index with the document itself. So, if such a penalty existed, it would apply instantly at the time the site is indexed, and no "catching up" is logical in the first place.
If you're trying to find a reason behind your observation that new sites rank initially and then drop soon after, then think of something else.
It doesn't take a rocket scientist to figure that out! It's nothing beyond the basics of how search engines work. (A spider crawls web links; page content and link info is retrieved from spidered pages; that info is stored in a way that facilitates easy retrieval, together with some calculated metrics; when a search is done, links to pages are retrieved from the index based on a matching algo that takes into account the search query as well as the ranking of the pages retrieved.) That's how all search engines work, nothing secret about it! It's all over the net; just read any primer on how search engines function.
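The crawl/index/retrieve loop described above can be sketched in a few lines. This is a toy illustration only; the scoring metric (raw term frequency) and all names are my own placeholders, not anything a real search engine uses:

```python
# Toy sketch of the pipeline described above: crawl, index, then
# retrieve-and-rank. Purely illustrative, not any real engine's design.

def crawl(seed, pages, links):
    """Follow links from a seed page; return every URL discovered."""
    seen, queue = set(), [seed]
    while queue:
        url = queue.pop()
        if url in seen or url not in pages:
            continue
        seen.add(url)
        queue.extend(links.get(url, []))
    return seen

def build_index(urls, pages):
    """Inverted index: term -> {url: term frequency}."""
    index = {}
    for url in urls:
        for word in pages[url].lower().split():
            index.setdefault(word, {}).setdefault(url, 0)
            index[word][url] += 1
    return index

def search(query, index):
    """Rank pages by summed term frequency across the query words."""
    scores = {}
    for word in query.lower().split():
        for url, tf in index.get(word, {}).items():
            scores[url] = scores.get(url, 0) + tf
    return sorted(scores, key=scores.get, reverse=True)

# Tiny demo corpus
pages = {
    "a.html": "google sandbox theory sandbox",
    "b.html": "new site ranking theory",
}
links = {"a.html": ["b.html"], "b.html": []}

found = crawl("a.html", pages, links)
index = build_index(found, pages)
print(search("sandbox theory", index))  # a.html ranks first (higher tf)
```

The point is only that ranking happens at query time, from metrics stored at index time, which is why "the date of retrieval is stored in the index with the document itself" matters to the argument above.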
We all know atomic bombs are made of uranium; that does not mean we know exactly how one is built. Same with search engines: it's common knowledge how they generally work, but the detailed specifics are kept hidden and are a secret.
As you see, the spider's job is to crawl the links on the web; this is the part of the system that can hit or miss your pages, and no other part of the system discovers your pages. As for the metrics, they are calculated and stored alongside the content of your web page.
There is a difference between the common way search engines work (just as all cars use wheels, but each has its own mechanism) and the specifics that differentiate one search engine from another. What I was talking about was the general things that all search engines use.
Site C is 1 month old, has only a single page (the index page), and is indexed by G.
It seems that one can find experiential evidence both for and against the "sandbox" theory. Perhaps there are some factors that we are unaware of.
An interesting side note: Site A has a PR3 (1 year old, with some dozens of IBLs), site B has a PR2 (6 months old, with just a single IBL), and site C has a PR3 (with just a single index page, and a single IBL from a page with a PR3).
[edited by: selomelo at 9:00 pm (utc) on Aug. 4, 2005]
then it applies at the exact time the site enters the index
I wonder how a heuristic ruleset works into all of this?
I wonder if batch processing when adding to an 8bil page index makes more sense than individual page processing?
I wonder if it takes time to process the new URL for inbound link information, associate it with a domain, compare it for duplication within the domain, compare it to existing pages from other sites with a similar footprint, compile historical data, compile information regarding outbound links, begin tracking clicks and other user behavior patterns, on and on; and if, while this is happening, Google gives you the benefit of the doubt until the page is returned with no pattern or history?
I wonder if patterns play a key role in the whole sandbox theory?
I wonder if maybe you have oversimplified the storage, retrieval, processing and application of a 5,000,000 variable ruleset just a little?
I wonder if the fact that G uses techniques to get the information *out* to the end user faster has any bearing on how fast they can process information on the way in - and if there might be a lag from the time of indexing to the final initial ranking because of this?
Don't get me wrong, what you are stating is plausible - I just have a few questions that need to be dealt with before I could buy into it... Maybe you can help me out?
I agree with you that patterns are the way to go and what is used for search engines, and can be used by clever SEOs too, but that's another story for another day (or thread).
i've intentionally "oversimplified" the workings of search engines, to make it clear for some that such broad methods are common in search engines. as for the details, they are search-engine specific. so the oversimplification was clearly intended, and I made no attempt to say I was covering anything but the "basics", as I mentioned in my last post.
i like your language, so i'll borrow it
I wonder whether Google makes pages it has seen available to searchers BEFORE it has calculated their metrics. Come on, this one is a no-brainer: do you expect Google (or just about any other search engine) to return to searchers links to pages whose metrics it has not yet calculated?
No 'wonder' [put_any_conclusions_here]
But do not take my word for it. Even though it sounds logical to use different indexing frequencies for different web pages (and it would be too dumb for a search engine to waste its resources not doing so), try this as empirical proof:
Go to some popular, continually updated page (such as a news site or an active community site/forum) and check its cached page. See the last date and time it was crawled, and even compare the contents of the cached version with the live version.
Now do the same for a different, less popular and much less frequently updated site; you will notice that the cache date is much older. So now you have empirical evidence to back up the assumption, easily concluded in the first place, that search engines index pages on the web at different frequencies, not a constant one.
How Google guesses how often each document is updated is another story: perhaps by its historical update pattern, or maybe partly using that. Anyway, Google's launch of the Google Sitemaps initiative aims at making life easier for its spiders, and is an experiment in enabling it to use its resources more efficiently when crawling the web. The sitemap should include the addresses of all web pages on the site that need to be indexed, and Google has also designed it to include an optional field in which a webmaster can specify how often each document is updated (weekly, daily, hourly ...). So, do you still think Google indexes pages on the web at a constant frequency that does not change from one page to another?
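For reference, the optional update-frequency field mentioned above is the `changefreq` element of the Sitemaps XML format. A minimal sitemap might look like this (example.com and the dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-08-04</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.example.com/archive.html</loc>
    <changefreq>monthly</changefreq>
  </url>
</urlset>
```

`changefreq` accepts values like always, hourly, daily, weekly, monthly, yearly and never; it's a hint to the spider, not a command.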
As for MSN, I've noticed it also brings me more traffic than Google does for my 9-month-old site. I don't know the reasons behind that, but it dawned on me while reading this thread that this could be a result of MSN not having as much historical data about web pages as Google does. Having started late, MSN is not really able to track the history of pages and sites and their development over time as well as Google, which started much earlier (and covers MUCH more of the web).
Can anyone show, from their logs, an example of a new site getting traffic from Google and then trailing off in the first week or so after launch?
I suspect what people are seeing is a site in the sandbox getting a little traffic as normal and then being hit with a different penalty.... perhaps too many links in too quickly :) This may give the illusion of not initially being sandboxed, but the reality is that you were sandboxed and things just got worse!
Google indexes a new page/site, determines its content, and stamps it with a logarithmic value. This step adds the page/site to the index.
Google uses techniques to get information to the end user faster, rather than to give them the #1 answer - a heuristic ruleset.
The page/site shows in the index because it has not yet been fully compared with other pages/sites carrying a similar logarithmic stamp, and no filtering stamps have been applied to its value. Again, their goal is getting information to the end user; in the absence of a filtering stamp, all pages/sites rank on their on-page/site factors until other factors are added to the information associated with them.
Google batch-processes all new pages/sites against previously indexed pages/sites that have been assigned a similar logarithmic value.
The page/site ranks for a period of time because of on-page/site values and the lack of a full 'filter' stamp, which is applied during the batch processing of similar pages/sites.
The page/site is then moved to the appropriate place in the index, based on the stamps associated with its value during the comparison stage of the indexing process.
Pages/sites rank until they are fully compared, but after the full comparison many cease to rank, because they do not fit the appropriate profile once the full filtering information is applied.
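The two-phase theory above can be sketched as code. To be clear, this mirrors the poster's speculation only; the function names, the filter predicate, and the idea of a boolean "filter stamp" are all my own illustrative inventions, not Google's actual process:

```python
# Speculative sketch of the two-phase theory: pages rank on on-page
# factors as soon as they are indexed, then a later batch pass compares
# them with similar pages and may apply a filter stamp that demotes them.

def index_page(index, url, on_page_score):
    """Phase 1: add the page immediately; no filter stamp yet."""
    index[url] = {"score": on_page_score, "filtered": False}

def batch_process(index, fits_suspicious_profile):
    """Phase 2: compare pages against a profile, stamp the bad fits."""
    for url, info in index.items():
        if fits_suspicious_profile(url, info):
            info["filtered"] = True

def rank(index):
    """Unfiltered pages rank by score; stamped pages sink to the bottom."""
    return sorted(index, key=lambda u: (index[u]["filtered"], -index[u]["score"]))

index = {}
index_page(index, "new-site.html", on_page_score=9)  # ranks well at first
index_page(index, "old-site.html", on_page_score=5)
print(rank(index))   # new-site.html first, on raw on-page score

# The later batch pass flags pages fitting a suspicious profile
batch_process(index, lambda url, info: url == "new-site.html")
print(rank(index))   # new-site.html now sinks below old-site.html
```

The point of the sketch is the gap between the two phases: a page is visible (and rankable) from the moment of phase 1, but its final position only settles after phase 2 runs, which would account for the "initial boost, then drop" pattern reported earlier in the thread.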
I generally learn more when I look openly for answers to questions rather than seeking facts to back-up a specific preconceived idea.
Wikipedia has a nice definition of heuristic (GG spells it huristic, but I believe since the application is the same, the definition is applicable.)
This should be more or less closer to the right definition:
"Sandboxed" is a delay Google actively puts on any new link pointing to a particular external page; the link needs more discovery before Google gives the destination page a ranking and/or position.
and thus it will only do well on non-competitive searches. It takes more than on-page factors to get anywhere near the top 100 on a competitive search.
Even if Google takes a few weeks to put a new site into the sandbox, it will not rank for serious sector searches and is as good as sandboxed.
This makes sense to me. We all know that weird search terms can generate a significant amount of easy traffic and are easy to rank for. A new site may pick these up before entering the sandbox. It could be the case that a new site goes into the main index and then gets moved into the 'supplemental' index, which would mean that, unless there are very few qualifying sites in the main index, it will never rank well.
Is the sandbox in fact a name for sites in the Supplemental index?
I like the theory that there are 3 indexes. The final port of call for a search term is the 3rd index, which also carries 'supplemental' pages. A site in the 3rd index will never appear above a site in the other indexes for a relevant search term. This index is subject to very stringent filters and algos, making it very difficult to get out of. This would allow Google to apply these special filters to only a part of its overall database, thus minimising processing time. If a site has already qualified for the 1st or 2nd index, there is no need to keep examining it for 'natural link growth'. In fact, because of scraper directories etc., sites in the main index will always have strange links appearing to them, so it would be counterproductive to assume this is suspicious. However, a new site acquiring loads of instant links is worth spotting.
Any page/site could be downgraded into the 3rd index, but all new sites automatically go in. Google is in no hurry to promote sites from this index and the filters applied are very demanding.