Google's New Caffeine Engine - the dev version is now public - Google Search and SEO forum at WebmasterWorld

Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Google's New Caffeine Engine - the dev version is now public

tedster

12:48 am on Aug 11, 2009 (gmt 0)

Secretly, they’ve been working on a new project: the next generation of Google Search. This isn't just some minor upgrade, but an entire new infrastructure for the world’s largest search engine...Google's now confident enough in the new version of its search engine that it has released the development version for public consumption. While you won't see too many differences immediately, let us assure you: it's a completely upgraded Google search.
Article at Mashable [mashable.com]

The url for the development version is [www2.sandbox.google.com...]

cangoou

8:00 am on Aug 13, 2009 (gmt 0)

Are there any hints when Caffein will come out of the playground and go online on google.com? Are we talking about weeks or month?

tedster

8:03 am on Aug 13, 2009 (gmt 0)

From what Matt says on the video, I'd expect it to be a few months.

cangoou

8:23 am on Aug 13, 2009 (gmt 0)

Ups, still working on it:

Please try your search again in a few hours
We are upgrading elements of our data center. The Caffeine sandbox should be available for searching again in a few hours.

So summing-up, the video says that caffein is most a change in how Google is indexing things, and according to Matt the SERPs should (almost) stay the same.

Hissingsid

8:56 am on Aug 13, 2009 (gmt 0)

While they are rebooting the system I'd like to know what Johannes Henkel said about site architecture.

Cheers

Sid

steveb

9:02 am on Aug 13, 2009 (gmt 0)

"Please try your search again in a few hours"

Getting that Caffeine warning on regular Google now too, as well as a wonky toolbar...

sem4u

9:10 am on Aug 13, 2009 (gmt 0)

I am getting the same message when I try and access Caffeine.

HuskyPup

10:39 am on Aug 13, 2009 (gmt 0)

Caffeine?...Really, they should've called it espresso:-)

Donna

12:02 pm on Aug 13, 2009 (gmt 0)

The question is , do they have all filters and penalties applied already at the sandbox.

Shaddows

12:50 pm on Aug 13, 2009 (gmt 0)

I'll stick my neck out, and say "yes".

I think they have reworked their algo-penalites. Some collateral damage has been negated, some borderline cases reviewed and some weightings reworked.

I say this, because the SERPs are NOT full of SPAM, nor full of sites that are not normally there. From this, I deduce that most quality control mechanisms are in place.

The fact that some penalties and filters have been removed suggests to me that the process behind them has been modified such that some instances no longer match the criteria.

Indeed, this is as I would expect for a Release Candidate for a major infrastructural overhaul.

eiilers

1:11 pm on Aug 13, 2009 (gmt 0)

I would agree with Shaddows on that one. I think penalities are still in place, but have been reworked.

Hissingsid

1:27 pm on Aug 13, 2009 (gmt 0)

It is very difficult to differentiate between considered/balanced evaluation and wishfull self interest.

In my niche there are a number of sites that used to be ranked near or at the top who had a penalty applied which was IMHO a deserved reaction to their activity. Those sites had returned the last time Caffeine was up. Now it is a matter of opinion as to whether they deserve to benefit from infrastructure changes but I would not be at all surprised to see them dropped when Caffeine comes back up, perhaps some of the reports they have received have made them realise they have missed something in the recipe and they are doing a data merge as we speak.

Only time will tell.

Cheers

Sid

whitenight

1:55 pm on Aug 13, 2009 (gmt 0)

It is very difficult to differentiate between considered/balanced evaluation and wishfull self interest.

My turn to speak on Shaddows behalf. ;)

If Shaddows is making an observation, it based on ALGO-WIDE analysis and not on personal "wishful thinking" of personal sites affected or not affected by that theory.

-------------

As I stated in the post to tedster, I would be wary of expecting another "big event" for Caffeine to be seen.

Typical MC FUD ... or classic magician's trick,
"Look over here at the right hand to distract you, while the left hand holds the bunny"

This infrastructure is ALREADY LIVE!

It may not be fully operational, but it is already live.
(see martinibuster's and my posts)

I'm already seeing "caffeinated datasets" slowly being worked into live DCs.

What do you think the "Long National Nightmare" Update over the past few month was working on?

You can "wait" for MC to point out a single "official caffeine DC" if you want, but you'll be missing the real show going on right now.

Shaddows

2:10 pm on Aug 13, 2009 (gmt 0)

Thanks whitenight. Implication I often speak for you duly noted

Indeed, I find it impossible to make any meaningful inferences from my neck of the woods- the main protagonists are heavily plageristic in terms of structure, content and technique.

On alighter note, if indeed Caffeine is being blended into normal SERPs, could the process be thought of as Percolation [en.wikipedia.org]?

Hissingsid

2:24 pm on Aug 13, 2009 (gmt 0)

Perhaps there's no geo location filter on the Caffeine results.

Just because some folks shout a little louder doesn't make them more right.

Cheers

Sid

whitenight

2:32 pm on Aug 13, 2009 (gmt 0)

On alighter note, if indeed Caffeine is being blended into normal SERPs, could the process be thought of as Percolation?

"It's time for the perculator, it's time for the percolator"
_{(lol, anyone who gets this reference spent too much time clubbing in college instead of studying)}

Yea, by the time the "official caffeine DC" is pointed out, that will be the "finished product"
(which would then quickly roll out to the other DCs)

Just because some folks shout a little louder doesn't make them more right.

Your choice to believe/test or not believe/test, Makes no difference to me. :)

signor_john

2:59 pm on Aug 13, 2009 (gmt 0)

Short answer, not quite, but close, however, the spidering thing will be a thing of the past except for authentication / authentification of ownership, site response (and a few other site pings). it's the webmasters who will do the data dumping on search engines servers, the other way around that is.

I'm skeptical, simply because the search engines want to index all useful content, not merely content that has been supplied by people who want a presence in the search engines. How many authors of academic papers, government documents, sites about local history or genealogy, or hobby sites have ever heard of a "Google sitemap" or a Yahoo urllist? Probably not many, compared to the number of commercial site owners and SEOs who use such tools and have a strong dollars-and-cents motive to get traffic from Google, Yahoo, or Bing.

Shaddows

3:18 pm on Aug 13, 2009 (gmt 0)

I'm skeptical, simply because the search engines want to index all useful content

Could not agree more. I would be stunned if Google reneged on its core philosophy of organising the worlds' information (or however that quote goes).

For my money, completeness beats marginal speed anytime.

Sid, I'm not trying to convince anyone of anything. But look at this thread- EVERYONE is saying the same thing; " SERPs are highly comparable". Do a bunch of side-by-sides on any subject that takes your fancy, its very similar.

All I'm saying is that is not a result that is consistant with QC measures being relaxed. The odd exception is consistant with architectural variation in the underlying process. Doubtless, the buttons will be pushed and knobs tweaked, and your competitor might disappear. But, to my eyes, QC (including penalties and filters) are in place.

freejung

8:35 pm on Aug 13, 2009 (gmt 0)

"How can a page with "no content at all" rank in the top 10?"

I'm sorry, I should have been more clear. I meant no _substantive_ content.

There are a couple of short paragraphs talking about what sort of content there is going to be in the future. Along the lines of, "Are you looking for blue widgets? Is there even any reason for there to be another blue widgets site on the web? This is the website that answers those questions. We will soon have lots of blue widgets for your enjoyment. We hope you will enjoy these blue widgets, which will be the coolest blue widgets anywhere."

tedster

10:41 pm on Aug 13, 2009 (gmt 0)

The odd exception is consistant with architectural variation in the underlying process...to my eyes, QC (including penalties and filters) are in place.

Agree completely. The main reason for this dev release is to get feedback from the webmaster community. Google is looking for those areas where something DOES fall into any new archetectural cracks, and their intention is for a seamless change-over.

[Here's a half-formed impression] I think I see evidence of greater flexibility - that is, the various result types and taxonomies can be blended in more ways. Certain SERP positions have been used for specific types of results - and on those queries, I now see certain types of results in surprising new positions.

For example, the use of position #4 doesn't look as restricted to me as it used to be. Nor does #6 - and possibly even #11. Time will tell if I am just hallucinating or if there is really something to it.

[edited by: tedster at 10:50 pm (utc) on Aug. 13, 2009]

willybfriendly

10:49 pm on Aug 13, 2009 (gmt 0)

I'm sorry, I should have been more clear. I meant no _substantive_ content.

Been going on for years. I am aware of a page with 39 words and 4 links that has ranked top 3 across several search terms for almost a decade now. This entirely on the strength of a huge marketing budget that included TV advertising, and the multitude of (blog) links so generated. And, in spite of the fact that it is well past its prime and offers little value anymore (if it ever did).

I see it as a great example of the weakness in G's algo.

tedster

10:53 pm on Aug 13, 2009 (gmt 0)

It's a weakness for any kind of machine-based "intelligence" and a sign that ANY kind of semantic processing is still in its infancy. It still takes a human mind to see "value" and not just the right variation of vocabulary... for now.

steveb

12:39 am on Aug 14, 2009 (gmt 0)

"do they have all filters and penalties applied already at the sandbox"

No chance. They don't even have all their negative score elements in place, so no doubt some of their penalties/filters are also not 100%.

"From this, I deduce that most quality control mechanisms are in place."

"Most" is not "all".

dusky

2:12 am on Aug 14, 2009 (gmt 0)

I'm skeptical, simply because the search engines want to index all useful content, not merely content that has been supplied by people who want a presence in the search engines

signor_john, I am no history lecturer, I just remember things. The thing is, all will be required to upload their data to central and regional "super servers" whether you have a site or not, and in time that will be the norm!

Nowadays, you upload any data to your own site, hosted on your own server or a webhost if you wanted people to access it through a browser, we did not have to do that before as the internet as it is now did not exist, we just used command line tools and communicated / showed or sent each other's data the old way, no browser, no DNS and IP to domain setup, no commercial or personal sites....but we were still "online" actually the true way. Someone comes along with the www idea, another window dressed PC DOS and released it as Windows, based on the latter's re-branded version (MSDOS) as they liked Steve Jobs et al Mac idea, others made the Unix like OS open source and called it Linux and of course others worked hard on commercial and open source SQL and webserver software etc. we don't do it the old way any more and we bought into the www idea!

Some of us resisted it, some, the very minority still resist it I might add.

Until you see in details (pre-alpha release at least) how the information is gathered, stored, analyzed and presented, you can't judge. I like innovators and innovation, without that, the world would stand still.

Searching for information is a serious thing now, and the sought information providers, mainly search engines have to do it right, just in time, accurate, free from spam, safe, secure...and all the rest of it. You enter a library, you go straight to the Science / Physics section, quick glance and you find the Stephen hawking book you wanted, that's basically how the Internet is going to look like. Yes, wolfram and the rest of his gang are part of the building block, it's not their search engine that's going to be the pioneer on that (sorry Stephen), they will nonetheless inspire others and may well be part of some of it, their search engine is no more than a scientific calculator fed with trillions of data lines and programmed to return answers according to their algorithmic calcs, the guys at ASK are doing better than that and their new version will see G* in even bigger panic, however, all are still sticking FOR NOW to the traditional methods!

What made G* is the sheer spin and free publicity, not their relevancy or accuracy, if ASK had just half of that spin and free pub, they would've been the largest SE right now as they HAD the most accurate answers of the 4 SEs (others being G*. Y! and Bing) before they started messing with the original algo.

Another spinning attempt to maintain monopoly by G*, that's what all this is about!

dusky

2:15 am on Aug 14, 2009 (gmt 0)

It's a weakness for any kind of machine-based "intelligence" and a sign that ANY kind of semantic processing is still in its infancy. It still takes a human mind to see "value" and not just the right variation of vocabulary... for now.

tedster, I coulden't agree more and that's what I meant on the wolfram idea for their search engine!

AG4Life

3:57 am on Aug 14, 2009 (gmt 0)

The sandbox is still showing some of the quirks that have been showing up in the main Google search engine, for example where site:forums.domain.com is showing up with more results than site:domain.com.

And there are at least one page for my site that seem to have a -950 penalties, whereas the page is perfectly fine (and have always been, at least throughout the recent major updates) on the main engine. Other than that, the SERPs are much more steady than on the main public engine and generally a bit better.

tedster

4:09 am on Aug 14, 2009 (gmt 0)

What made G* is the sheer spin and free publicity, not their relevancy or accuracy

I differ with you on that point. At the beginning, the big buzz about Google was generated virally by early adopters -- because Google's results WERE head and shoulders above the competition. There was also the obvious speed and uncluttered nature of their pages.

So I wouldn't want to rewrite that part of history, or belittle the fundamental insight that was PageRank in the beginning.

I remember all too well the pain of the old 90s engines, where 8 out of 10 search results could be completely irrelevant in a way we never see today from anyone. And the competition was very slow to respond back then, or to see the full implications. Remember, Yahoo's original repsonse was to license Google's data for Yahoo's own search engine!

They eventually smartened up, but the gap was wide open by then. Google had already grabbed mindshare and become almost a generic term for "search".

I understand having some misgivings about this giant in our lives, but they did get where they are today through a lot more than spin. And the signs that Google is still very savvy and innovative are with us right now. I'm not selling Caffeine short at all. It's a new infrastructure, and right now they barely have it in first gear.

whitenight

5:39 am on Aug 14, 2009 (gmt 0)

The thing is, all will be required to upload their data to central and regional "super servers" whether you have a site or not, and in time that will be the norm!

You said the magic words that render everything else you wrote disingenuous.

"all will be required"

I've heard this kind of nonsense couched as "fighting evil monopolies" before.

It's a Big Brother agenda that has existed since the internet was first formed.
(heck, since humans first created religions)

Sorry dusky, not gonna happen...
Whether it's Goog, Bing, the government, whoever.

Centralized superservers?!

We went off that particular time/space timeline last year and there's little probability of returning back to it. _{thanks goodness}
(but like you say, some people continue to resist this fact no matter what)

"safe and secure"?

God, really?! Does that FUD propagandist rhetoric even hold sway anymore?

Unfortunately it does with a large portion of the population' still.
But again, not strongly enough to convince the masses "all will be required..." to be "...safe and secure.."

FUD is FUD, dusky.

Whether it's coming from Goog or anyone else.

cangoou

2:10 pm on Aug 14, 2009 (gmt 0)

Hm, still offline - did we aim too close here? ;-)

srik79

3:03 pm on Aug 14, 2009 (gmt 0)

it was up for a few hours today, then gone again

signor_john

3:15 pm on Aug 14, 2009 (gmt 0)

Centralized superservers

Maybe they could be an option: Submit your Google sitemap, and you get a message that asks: "Want to superserve that?" :-)

Seriously: The Internet and the World Wide Web are what they are: a decentralized network of networks. I doubt very much whether Caffeine represents anything more than what Google says it is: a new infrastructure for Google Search.

This 201 message thread spans 7 pages: 201