Sandbox Myth

Forum Moderators: Robert Charlton & goodroi

Reverse sandbox effect

shafaki

3:34 pm on Aug 2, 2005 (gmt 0)

People are still talking about the 'sandbox' even today! There is no such thing as a sandbox any more. In the past, a search engine used to 'penalize' a site for being new. This kind of 'discrimination' against new sites was the norm, and only the older, more established sites got high recognition from Google and others. It has been a while now since Google realized this was hurting its searches and introduced new tweaks to its ranking algo to counteract this 'discrimination'. Now, new sites actually get a kind of boost once they are first indexed. After an initial test, Google sees whether that boost is well deserved or not. If, according to search patterns by searchers, Google finds that the site indeed deserves good ranking, it preserves the ranking and may even increase it with time. If, however, the site turns out to be a failure as determined by search patterns, then Google dumps it low in its search results after having given it its initial chance.

No more sandbox crap talk again please. New sites actually get a boost in the new ranking algo of Google, but if they are not up to it, they fall deep down in SERPs.

wordy

7:45 pm on Aug 3, 2005 (gmt 0)

Hmmm...another "sandbox" thread.

Can I refer to the following response from Googleguy in a Bourbon update thread to a very well considered question from mblair:

mblair:
"Are there any rules of thumb, other than Google's webmaster guidelines, that can help a new website get balanced consideration by Google for good rankings in the SERPs or is it prudent for the webmaster to plan for a necessary passage of time before a website has a potential to rank well?"

GG reply:
"mblair, it usually does take time for a site to build up its reputation. There are always going to be a few sites that are so good or so viral that they don't really need search engines at all. In the old days, the hamster dance swept through the known internet world."

A site I am working on came out of the "sandbox" with the Bourbon update and pages started ranking for competitive keyphrases. What I now notice is that there appears to be a "sandbox" hangover. New pages added to the site since the Bourbon update are ranking far better relative to pages that were "sandboxed" for 6 months.

shafaki

10:16 pm on Aug 3, 2005 (gmt 0)

"'sandbox' hangover." lol, i love that, i love your wording wordy ..

sit2510

6:36 am on Aug 4, 2005 (gmt 0)

>>> If you are achieving good rankings from the start, well done you.... you are doing it right and winning the seo game.

Some people get good rankings from the start and then laugh about the sandbox theory, but actually they don't understand the temporary boost for new sites and aren't aware of where they are heading in the next couple of months. The longest temporary boost for a new site that I happened to notice lasted about three months, which was quite unusual.

sit2510

6:45 am on Aug 4, 2005 (gmt 0)

>>> I launched a site 3/4 months ago and while it is now ranking rather well on other well known search engines it is nowhere to be seen on G.
Is that evidence to back up a sandbox theory?

Your situation is very common, IMO. Google appears to require some historical data before the sites appear in ranking (different from initial but temporary boost). You may want to call it a "sandbox" theory.

MHes

9:03 am on Aug 4, 2005 (gmt 0)

I don't see this 'new site boost'. This may sometimes happen with new pages on an established site (that previously ranked well anyway) which after a few days seem to drop a bit on competitive searches, but I have had no experience of new sites getting a boost.

It defies logic to me. For the top 50 positions on any competitive search there will be thousands of new pages being made and hundreds of new sites every day. Unless you get into the top 50, I doubt you will get any significant traffic. It is impossible for all these new pages/sites to get a decent ranking at the same time!

BeeDeeDubbleU

9:50 am on Aug 4, 2005 (gmt 0)

I have experienced this temporary ranking phenomenon (boost is probably the wrong word). I think what may have been happening is that new sites were indexed and ranked within a few days as normal. They took their natural place in the rankings until the sandbox filter caught up with them a few days later and they were then dropped to oblivion for a few months before gradually coming back. A site that ranked for three months before dropping was probably not SB'd.

sit2510

10:41 am on Aug 4, 2005 (gmt 0)

>>> (boost is probably the wrong word).

I think you are right - "boost" is the wrong word, as it conveys the wrong message that Google would give every new site that favor, which is not the case. Next time I will use the term "temporary ranking". Thanks!

sit2510

10:45 am on Aug 4, 2005 (gmt 0)

>>> 'new site boost'.

It is "new site temporary ranking". :)

>>> This may sometimes happen with new pages on an established site (that previously ranked well anyway)

Not necessarily new pages on an established site, but new pages on a new site, including the homepage.

Tapolyai

4:23 pm on Aug 4, 2005 (gmt 0)

Ok... Maybe we are just playing with words.

What is the definition of "sandbox" and "sandboxed" to you?

From what I understood from reading here, "sandboxed" is a delay Google actively puts on new sites that need more discovery prior to giving them a ranking and/or positioning.

randle

5:18 pm on Aug 4, 2005 (gmt 0)

What is the definition of "sandbox" and "sandboxed" to you?

I don't think you will ever get a consensus on that. However, if your site gets a cache date that is never more than 10 days old (regularly crawled), and the title, snippet and url appear just as you intended when you search by your url, and you run a search using the commands allinanchor:, allintext:, allintitle: and your site comes up within the first few pages, BUT when you search for the main keyword the site was designed for you're not within the first 1,000 places, you could throw some sort of label on that.

p.s. the politically correct term these days is "filter" but I tend to think of it more along the lines of I just haven't knocked on the right door yet.
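
The rank comparison in randle's check can be written down as a small helper. This is only a minimal sketch in Python: the function names are made up for illustration, "first few pages" is assumed to mean the top 30 results, and nothing here fetches results from Google. You run the searches by hand and type in the positions you see. The cache-date and snippet checks are left out.

```python
def diagnostic_queries(keyphrase):
    """Build the operator searches from the post, plus the plain search."""
    operators = ["allinanchor:", "allintext:", "allintitle:"]
    queries = [op + keyphrase for op in operators]
    # The plain keyphrase is the unrestricted search to compare against.
    queries.append(keyphrase)
    return queries


def looks_filtered(operator_ranks, keyword_rank, first_pages=30, cutoff=1000):
    """Return True when the site shows the pattern described in the post.

    operator_ranks: positions seen for the three operator searches
                    (None means the site was not found).
    keyword_rank:   position for the plain keyword search, or None.
    first_pages:    assumed meaning of "first few pages" (hypothetical).
    cutoff:         the "first 1,000 places" from the post.
    """
    operators_ok = all(
        rank is not None and rank <= first_pages for rank in operator_ranks
    )
    keyword_missing = keyword_rank is None or keyword_rank > cutoff
    return operators_ok and keyword_missing
```

For example, `looks_filtered([5, 12, 20], None)` reports the pattern, while a site that also shows up at position 8 for the plain keyword does not.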

BeeDeeDubbleU

5:43 pm on Aug 4, 2005 (gmt 0)

Well defined Randle ;)

shafaki

6:30 pm on Aug 4, 2005 (gmt 0)

<quote>... the sandbox filter caught up with them a few days later and they were then dropped to oblivion for a few months before gradually coming back</quote>

If there were a "sandbox filter" then it will not "[catch] up ... later" as such things are not spiders that run on their own on the web! If there was one, then it applies at the exact time the site enters the index, because the date of retrieval is stored in the index with the document itself. So, if such a penalty existed, it would apply instantly at the time the site is indexed, and no "catching up" is logical in the first place.

If you're trying to find a reason behind your observations of initial ranking for new sites which drops soon, then think of something else.

BeeDeeDubbleU

7:46 pm on Aug 4, 2005 (gmt 0)

If there was one, then it applies at the exact time the site enters the index, because the date of retrieval is stored in the index with the document itself.

Shafaki you seem so positive. How do you know all this? Where do you get your inside information? ;)

shafaki

8:37 pm on Aug 4, 2005 (gmt 0)

It does not take a rocket scientist to figure that out! It's nothing beyond the basics of how search engines work. (A spider crawls web links; page content and info are retrieved from the links in spidered pages; that info is stored, together with some calculated metrics, in a way that makes retrieval easy; when a search is done, links to pages are retrieved from the index based on a matching algo that takes into account the search query as well as the ranking of the pages retrieved.) That's how all search engines work, nothing secret about it! It's all over the net, just read any primer about how search engines function.

We all know atomic bombs are made of uranium, that does not mean we do know how one exactly is created. Same with search engines, it's common knowledge how they generally work, but it's only the detailed specifics which are kept hidden and are a secret.

As you see, the spider's work is to crawl the links on the web, this is the part of the system that can hit or miss your pages, no other part of the system discovers your pages other than the spider. As for the metrics that are stored with the content of your web page, they are calculated and stored with the content.

There is a difference between the common way search engines work (such as all cars use wheels, but each has its own mechanism) and the specifics that differentiate one search engine from another. What I was talking about was the general things that all search engines use.
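
The generic pipeline shafaki describes (store page content with a metric at indexing time, then match and rank at query time) can be illustrated with a toy inverted index. This is a sketch under loud assumptions: the function names are hypothetical, the "metric" is just a word count standing in for whatever a real engine computes, and no actual search engine is claimed to work this way in detail.

```python
from collections import defaultdict


def build_index(pages):
    """pages: dict of url -> page text, as a spider might hand it over.

    Returns (inverted index, per-page metric).
    """
    index = defaultdict(set)
    metrics = {}
    for url, text in pages.items():
        words = text.lower().split()
        for word in words:
            index[word].add(url)
        # Stand-in metric, calculated and stored at indexing time.
        metrics[url] = len(words)
    return index, metrics


def search(query, index, metrics):
    """Return urls containing every query term, best metric first."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    # Ties are broken alphabetically so the order is stable.
    return sorted(matches, key=lambda url: (-metrics[url], url))
```

In this toy version a page cannot be returned before its metric exists, because both are written in the same indexing step. Whether a real engine computes all of its signals in one step is exactly what the rest of the thread argues about.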

selomelo

8:57 pm on Aug 4, 2005 (gmt 0)

I have 3 sites, A, B and C, all left for aging and future development.
Site A is 1 year old now. It has a few pages (10 or so), some relevant content, and never disappeared from the G SERPs.
Site B is 6 months old, has a few pages (10 or so), and some relevant content. Initially, all pages were indexed by G and performed well in the G SERPs for a while, but all of a sudden the site disappeared altogether after Bourbon. Now, only the index page shows up.

Site C is 1 month old, has only a single page (the index page), and is indexed by G.

It seems that one can find experiential evidence both for and against the "sandbox" theory. Perhaps there are some factors that we are unaware of.

An interesting side note: Site A has a PR3 (1 year old, with some dozens of IBLs), site B has a PR2 (6 months old, with just a single IBL), and site C has a PR3 (with just a single index page, and a single IBL from a page with a PR3).

[edited by: selomelo at 9:00 pm (utc) on Aug. 4, 2005]

jd01

8:58 pm on Aug 4, 2005 (gmt 0)

then it applies at the exact time the site enters the index

I wonder how a heuristic ruleset works into all of this?

I wonder if batch processing when adding to an 8bil page index makes more sense than individual page processing?

I wonder if it takes time to process the new URL for inbound link information, associate it with a domain, compare it for duplication within the domain, compare it to other existing pages from other sites with a similar footprint, compile historical data, compile information regarding links out, begin tracking click and other user behavior patterns, and on and on, and if, while this is happening, Google gives you the benefit of the doubt until the page is returned with no pattern or history?

I wonder if patterns play a key role in the whole sandbox theory?

I wonder if maybe you have oversimplified the storage, retrieval, processing and application of a 5,000,000 variable ruleset just a little?

I wonder if the fact that G uses techniques to get the information *out* to the end user faster has any bearing on how fast they can process information on the way in - and if there might be a lag from the time of indexing to the final initial ranking because of this?

Just wondering...

Don't get me wrong, what you are stating is plausible - I just have a few questions that need to be dealt with before I could buy into it... Maybe you can help me out?

Justin

texasville

9:08 pm on Aug 4, 2005 (gmt 0)

Justin-
I think you about have it there. It takes about a week to ten days for google to index pages after they have been crawled (sometimes faster, but I think this is the norm). If google then takes time to crawl thru the web and index everything and then compare and then crawl back to the new site and filter it, this could take some time. That's why you aren't sandboxed right away but it takes some time and voila!...the filter is applied or tripped.

zeus

9:42 pm on Aug 4, 2005 (gmt 0)

I do believe there is some kind of filter for new sites. I've got a site which has ranked very well all over MSN from the day it was indexed, but from google I receive 7-8 unique visits a day out of 4000 uniques a day. Trust me, I know what I'm doing, and there is some kind of filter. Whether it's just for new sites or what they go for I don't know, but I do know that if there was not that filter, google would give double the 4000 uniques.

shafaki

11:10 pm on Aug 4, 2005 (gmt 0)

dear I_wonder, or i mean jd01 ..

I agree with you that patterns are the way to go and what is used for search engines, and can be used by clever SEOs too, but that's another story for another day (or thread).

i've intentionally "oversimplified" the workings of search engines, to make it clear for some that such broad methods are common in search engines. as for the details, they are search engine specific. so the oversimplification was clearly intended and I made no attempt to say I was covering anything but the "basics" as I've mentioned in my last post.

i like your language, so i'll borrow it

I wonder if Google does not make pages it has seen available to searchers BEFORE it has calculated their metrics. Come on, this one is a no brainer. Do you expect Google (or just about any other search engine) to return to searchers links to pages whose metrics it has not calculated yet?

No 'wonder' [put_any_conclusions_here]

shafaki

11:27 pm on Aug 4, 2005 (gmt 0)

Google does not index different web pages on a fixed frequency (this would be a very dumb thing to do and a waste of resources). Google sets a different frequency to index each page on the web. It tries to better manage its resources to strike a balance between maintaining a fresh index and covering a wider portion of the web (aka more web pages). It's not logical that a news site would be indexed at the same frequency as a reference site, for instance. Check out Google News, the service by Google that automatically clusters news bits from news sources across the web and is updated every 15 minutes or so.

But do not take my word for it. Even though it sounds so logical to use different indexing frequencies for different web pages (and sounds too dumb for a search engine to waste its resources not doing that), try this as empirical proof:

Go to some popular, frequently updated page (such as a popular news site or an active community site/forum) and check the cached page of this forum or site. See the last date and time it was indexed, and even check the contents of the cached version to compare them with the new live version.

Now do the same thing for a different, less popular and much less frequently updated site; you will notice that the date of indexing is much older. So now you have empirical evidence to back up the assumption, which is easily concluded in the first place, that search engines index pages on the web using different frequencies and not a constant one.
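
One simple way a crawler could arrive at per-page frequencies is to shorten the revisit interval when a page has changed since the last visit and lengthen it when it has not. The sketch below is an illustration of that idea only: the halve/double rule, the bounds, and the function name are assumptions, not anything Google has documented.

```python
def next_interval(current_hours, changed, min_hours=0.25, max_hours=24 * 30):
    """Pick the next revisit interval for one page.

    current_hours: interval used for the last visit.
    changed:       True if the content differed from the stored copy.
    min_hours / max_hours: assumed bounds (15 minutes to 30 days).
    """
    if changed:
        proposed = current_hours / 2  # page is active, come back sooner
    else:
        proposed = current_hours * 2  # page is static, save resources
    return max(min_hours, min(max_hours, proposed))
```

Under this rule a busy forum drifts toward the minimum interval and a static reference page drifts toward the maximum, which matches the difference in cache dates described above.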

How Google guesses how often each document is updated is another story, perhaps by its historic update pattern, or maybe partly using that. Anyway, Google's launch of the Google Sitemaps initiative aims at making life easier for its spiders and is an experiment in enabling it to use its resources more efficiently when crawling the web. The sitemap should include the addresses of all web pages on the site that need to be indexed, and Google has also designed it to include an optional field in which a webmaster can specify how often each document is updated (weekly, daily, hourly ...). So, still think Google indexes pages on the web using a constant indexing frequency that does not change from one web page to another?
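
For anyone who has not seen one, a sitemap with the optional update-frequency field can be generated in a few lines. This is a hedged sketch: the helper name and the example.com URLs are made up, and the namespace shown is the one from the later sitemaps.org protocol, which is an assumption on my part rather than the exact schema Google accepted when this thread was written.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org protocol (assumed; see the note above).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """entries: list of (url, changefreq) pairs; changefreq may be None.

    Returns the sitemap as an XML string.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if changefreq:
            # Optional hint about how often the page changes.
            ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    ("http://www.example.com/", "daily"),
    ("http://www.example.com/about.html", None),
])
```

The changefreq value is a hint from the webmaster, so the crawler is still free to decide its own schedule.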

As for MSN, I've noticed it also brings me more traffic than Google for my 9-month old site. I don't know the reasons behind that, but it dawned on me while reading this thread that this could be a result of MSN not having as much historical data about web pages as Google does. MSN started late, so it cannot calculate the history of web pages and sites and their development over time as well as Google can, since Google started much earlier (and covers MUCH more web pages).

MHes

12:04 am on Aug 5, 2005 (gmt 0)

If a new site ranks normally for a few days before sandbox kicks in, on what merit does it get those rankings? There will be few links in showing so does this mean ranking is not very dependent on links in?

Can anyone show from their logs an example of a new site getting traffic from google and then trailing off, in the first week or so of its launch?

I suspect what people are seeing is a site in the sandbox getting a little traffic as normal and then being hit with a different penalty.... perhaps too many links in too quickly :) This may give the illusion of not initially being sandboxed, but the reality is that you were sandboxed and things just got worse!

jd01

12:50 am on Aug 5, 2005 (gmt 0)

I am surprised you only picked on one of my points and then went off on a tangent... Interesting.

Another theory:

Google indexes a new page/site, determines the content of a page/site, and stamps it with a logarithmic value. This step adds the page/site to the index.

Google uses techniques to get information to the end user faster, rather than to give them the #1 answer - heuristic ruleset.

The page/site shows in the index, because the page/site has not been fully compared with other pages/sites with a similar logarithmic stamp and does not have any filtering stamps applied to its value. Again their goal - information to the end user - in the absence of a filtering stamp, all pages/sites rank based on their on page/site factors, until other factors are added to the information associated with the page/site.

Google batch processes all new pages/sites against previously indexed pages/sites, which have a similar logarithmic value assigned.

The page/site ranks for a period of time because of on page/site values and the lack of a full 'filter' stamp that is applied during the batch processing of similar pages/sites.

The page/site is then moved to the appropriate place in the index, based on the stamps associated with its value during the comparison stage of the indexing process.

Pages/sites rank until they are fully compared, but after the full comparison is made many cease to rank, because they do not fit the appropriate profile once the full filtering information is applied.

Justin

I generally learn more when I look openly for answers to questions rather than seeking facts to back-up a specific preconceived idea.

Wikipedia has a nice definition of heuristic (GG spells it huristic, but I believe since the application is the same, the definition is applicable.)
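
The two-stage theory in this post can be made concrete with a toy model: a page first ranks on on-page factors alone, and a later batch pass applies a filter to pages that do not fit the profile. Everything in the sketch is hypothetical, including the field names, the penalty factor, and the idea that the filter is a simple multiplier. It only shows how a page could rank for a while and then drop without any "boost" being involved.

```python
def provisional_ranking(pages):
    """Stage 1: order pages by on-page score only (no filter stamp yet)."""
    return sorted(pages, key=lambda url: -pages[url]["on_page"])


def batch_ranking(pages, filter_penalty=0.1):
    """Stage 2: after the batch comparison, penalize pages that failed it.

    filter_penalty is an assumed multiplier, chosen only for illustration.
    """
    scores = {}
    for url, page in pages.items():
        score = page["on_page"]
        if not page["passes_profile"]:
            score *= filter_penalty
        scores[url] = score
    return sorted(scores, key=lambda url: -scores[url])


pages = {
    "new-site": {"on_page": 0.8, "passes_profile": False},
    "old-site": {"on_page": 0.6, "passes_profile": True},
}
before = provisional_ranking(pages)
after = batch_ranking(pages)
```

Here `before` lists new-site first and `after` lists old-site first, which is the rank-then-drop pattern several posters report.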

shafaki

2:01 am on Aug 5, 2005 (gmt 0)

jd01

[www-db.stanford.edu...]

and now:

[appft1.uspto.gov...]

jd01

4:25 am on Aug 5, 2005 (gmt 0)

I've had them bookmarked for months - taken extensive notes on both multiple times.

Still good information for all who would like to join in our discussion.

Justin

Isn't speculation fun =)

sit2510

4:46 am on Aug 5, 2005 (gmt 0)

>>> What is the definition of "sandbox" and "sandboxed" to you? From what I understood from reading here, "sandboxed" is a delay Google actively puts on new sites that need more discovery prior to giving them a ranking and/or positioning.

This should be closer to the right definition:

"sandboxed" is a delay Google actively puts on any new link pointing to a particular external page, a link that needs more discovery prior to giving the destination page a ranking and/or positioning.

sit2510

4:58 am on Aug 5, 2005 (gmt 0)

>>> If a new site ranks normally for a few days before sandbox kicks in, on what merit does it get those rankings?

Because the merit of inbound links has not been calculated, it is likely to be the on-page factor.

MHes

9:09 am on Aug 5, 2005 (gmt 0)

> it is likely to be the on-page factor.

and thus will only do well on non competitive searches. It takes more than on page factors to get anywhere near the top 100 on a competitive search.

Even if Google takes a few weeks to put a new site into the sandbox, it will not rank for serious sector searches and is as good as sandboxed.

sit2510

9:30 am on Aug 5, 2005 (gmt 0)

LOL, you got it ;)

MHes

12:02 pm on Aug 5, 2005 (gmt 0)

This delay in applying the sandbox, if it exists, is therefore not an intentional boost for a new site but merely a site getting non competitive traffic for a few days/weeks.

This makes sense to me. We all know that weird search terms can generate a significant amount of easy traffic and be easy to rank for. A new site may pick these up before entering the sandbox. It could be the case that a new site goes into the main index and then gets moved into the 'supplemental' index. This will mean that unless there are very few qualifying sites in the main and primary index, they will never rank well.

Question:

Is the sandbox in fact a name for sites in the Supplemental index?

I like the theory that there are 3 indexes. The final port of call for a search term is the 3rd index which also carries 'supplemental pages'. A site in the 3rd index will never appear above a site in the other indexes for a relevant search term. This index is subject to very stringent filters and algos, making it very difficult to get out of. This would allow Google to allocate these special filters to only a part of its overall database, thus minimising processing time. If a site has already qualified for the 1st or 2nd index, there is no need to keep examining it for 'natural link growth'. In fact, because of scraper directories etc., sites in the main index will always have strange links appearing to them so it would be counter productive to assume this is suspicious. However, a new site acquiring loads of instant links is worth spotting.

Any page/site could be downgraded into the 3rd index, but all new sites automatically go in. Google is in no hurry to promote sites from this index and the filters applied are very demanding.
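
The ordering rule in this three-index theory (a page in a lower tier never outranks a matching page in a higher tier) is easy to state in code. The sketch below is speculative in the same way the theory is: the function name, the tier layout, and the scores are invented for illustration and say nothing about how Google actually stores its index.

```python
def tiered_search(matching_urls, tiers):
    """Return matching urls, tier by tier.

    matching_urls: set of urls that are relevant to the query.
    tiers:         list of dicts (url -> score), highest-priority tier first.
    """
    results = []
    for tier in tiers:
        hits = [(url, score) for url, score in tier.items() if url in matching_urls]
        # Within a tier, better score first; alphabetical on ties.
        hits.sort(key=lambda hit: (-hit[1], hit[0]))
        results.extend(url for url, _ in hits)
    return results


tiers = [
    {"established.example": 0.2},                           # 1st index
    {"solid.example": 0.9},                                 # 2nd index
    {"brand-new.example": 1.0, "scraper.example": 0.5},     # 3rd index
]
order = tiered_search(
    {"established.example", "solid.example", "brand-new.example", "scraper.example"},
    tiers,
)
```

Note that brand-new.example has the highest score of all but still lands third, because its tier is consulted last.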

zeus

1:26 pm on Aug 5, 2005 (gmt 0)

Is the sandbox in fact a name for sites in the Supplemental index? NO

This 96 message thread spans 4 pages: 96
