Google's New Caffeine Search Engine - part 2 - Google Search and SEO forum at WebmasterWorld

Forum Moderators: Robert Charlton & goodroi

Hissingsid

8:56 am on Sep 5, 2009 (gmt 0)

< continued from [webmasterworld.com...] >

For my top target term there are still some differences between the Caffeine sandbox and .com results. It does not affect me, and looking at them dispassionately the Caffeine results (subjectively) serve the user better. More of my significant real-world competitors appear and fewer of the buy-their-way-to-the-top jokers.

Cheers

Sid

[edited by: tedster at 5:52 am (utc) on Sep. 7, 2009]

aristotle

1:09 pm on Sep 5, 2009 (gmt 0)

I'm still confused about what Caffeine represents. Does it simply refer to an enlargement and faster updating of the database that underlies the algo? Does it include changes in the algo itself?

tedster

7:22 pm on Sep 5, 2009 (gmt 0)

It's a new infrastructure that includes a rewrite of the underlying data storage technology - the Google File System (GFS). This will give Google more flexibility and speed in how they construct the SERPs in the future, but for now it seems they would like to make the final switchover almost seamless to end users - hence Google's request to webmasters for reports on ranking differences.

cangoou

7:25 pm on Sep 5, 2009 (gmt 0)

hence Google's request to webmasters for reports on ranking differences.

... which is a bit odd: If someone really knows all the differences, who would this be most likely?!?! ;-)

aristotle

8:18 pm on Sep 5, 2009 (gmt 0)

Well, I'm trying to understand how a change in the infrastructure and/or filing system would cause changes in the rankings. Unless perhaps it includes an expansion in the amount of data collected and analyzed for inputting into the algo, and also possibly an increase in the number of pages in the main index.

tedster

8:32 pm on Sep 5, 2009 (gmt 0)

It's the complexity effect - the GFS isn't like a mysql database. Data is sharded into various kinds of pieces and then stored across a huge server farm. When a query comes in, that data (now in smithereens) gets accessed and re-assembled into SERPs -- with all kinds of odd anomalies along the way.

Remember the last major infrastructure change - Big Daddy? That didn't even go down as deep as changing the file structure itself.
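For anyone trying to picture the scatter-and-reassemble idea above, here is a toy sketch in Python. It is only an illustration under loud assumptions, not how GFS or Google's serving stack actually works: the shard count, the hash routing, the per-term scores, and the names `shard_for`, `build_shards` and `search` are all made up for the example.

```python
import hashlib
import heapq

NUM_SHARDS = 4  # assumption: a tiny stand-in for a huge server farm


def shard_for(url):
    """Pick a shard from a stable hash of the URL (illustrative routing only)."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def build_shards(docs):
    """Scatter documents (url -> {term: score}) across the shards."""
    shards = [{} for _ in range(NUM_SHARDS)]
    for url, term_scores in docs.items():
        shards[shard_for(url)][url] = term_scores
    return shards


def search(shards, term, top_n=3, unavailable=()):
    """Ask every reachable shard for matches, then merge into one ranked list."""
    partial = []
    for index, shard in enumerate(shards):
        if index in unavailable:
            continue  # a stale or unreachable shard silently drops its pages
        for url, term_scores in shard.items():
            if term in term_scores:
                partial.append((term_scores[term], url))
    # Highest score first; ties fall back to comparing the URL strings.
    return heapq.nlargest(top_n, partial)
```

Passing a shard index in `unavailable` makes the pages stored there vanish from the merged list even though nothing about the scoring changed, which is one simple way the "odd anomalies" could arise in a system like this.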

cyclinder

9:01 pm on Sep 5, 2009 (gmt 0)

I see a rollback to the positions of 2 months ago for my site on google.com (same for sandbox).

It's like the new algo is already active but over the old data.

My number of indexed pages also decreased.

CainIV

1:10 am on Sep 6, 2009 (gmt 0)

I see lots of differences between Caffeine and DC of the day, when querying different DC's comparing various keywords for various genres.
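A comparison like this can be made a little more systematic. The sketch below is a hypothetical helper, `rank_changes`, that takes two result lists pasted in by hand (it does not query Google or any datacenter) and reports which URLs moved, dropped out, or newly appeared.

```python
def rank_changes(old_results, new_results):
    """Map each URL to (old_rank, new_rank); None means absent from that list."""
    old_rank = {url: i for i, url in enumerate(old_results, start=1)}
    new_rank = {url: i for i, url in enumerate(new_results, start=1)}
    changes = {}
    # Walk the old list first, then any URLs that only appear in the new list.
    for url in list(old_rank) + [u for u in new_rank if u not in old_rank]:
        before, after = old_rank.get(url), new_rank.get(url)
        if before != after:
            changes[url] = (before, after)
    return changes


# Example with made-up domains: one list per result set, best result first.
current = ["a.example", "b.example", "c.example"]
caffeine = ["b.example", "a.example", "d.example"]
print(rank_changes(current, caffeine))
```

URLs that hold the same position in both lists are left out, so an empty result means the two sets match for that keyword.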

MLHmptn

6:50 am on Sep 6, 2009 (gmt 0)

I see lots of differences between Caffeine and DC of the day, when querying different DC's comparing various keywords for various genres.

I see big differences as well. Contrary to what some here are stating, there is no way the Caffeine results are live on Google.com (unless filters are just not applied on Caffeine).

kevsta

7:29 am on Sep 6, 2009 (gmt 0)

Yes, I've still got noticeable differences. Presumably with something as big as this it'll be on or off?

i.e. they won't be able to blend the old and new SERPs to give a soft landing on arrival?

spadilla

11:51 pm on Sep 6, 2009 (gmt 0)

It seems to me as if the Caffeine results are lagging behind the current SERPs. I just overhauled a website for a client who holds the #1 place for a keyphrase. He had purchased the actual keyphrase domain some time ago and wanted to 301 his other (#1 in SERP) domain so it would show the keyword domain instead. This was done and went live last week, and I've been checking each day this week for the update. Just today I see the current SERPs showing only the keyword domain as #1, while in the Caffeine results I am still seeing the old domain, as it has shown all along up until now.

cyclinder

12:06 am on Sep 7, 2009 (gmt 0)

Yes, same impression. I hope this will be fixed in the actual 'release'.

CainIV

12:53 am on Sep 7, 2009 (gmt 0)

It's really tough to say. I am not certain myself that there needs to be a perfect match between the two engines. Perhaps we will wake up Tuesday and Caffeine results will have propagated all SERP's. I think it is difficult to know how this will roll out.

Vimes

5:39 am on Sep 7, 2009 (gmt 0)

Yeah, I'd say the results are older than the current DCs I'm looking at. As an example, a site I know went through a huge URL restructuring a couple of months ago; Caffeine is still showing the old URL structure of the site when using the site: operator, and the SERPs seem to match the older data for this particular website.

Vimes.

Hissingsid

7:46 am on Sep 7, 2009 (gmt 0)

Caffeine results will have propagated all SERP's

Sorry, but that just doesn't make sense to me at all. If the Caffeine infrastructure is already being used, as others have said in this thread, then we are already seeing Caffeine results now on Google.com. If it is not, then the results cannot "propagate"; what needs to be done is for the new software infrastructure to be installed.

I'm still not convinced that everyone contributing to this thread, including some senior members, has really got it. By "it" I mean what Caffeine is all about. IMHO it is about how the data is stored and indexed and how results are extracted. It is about when and how, in the message path, the algorithm is applied.

Having said that, the results on the Caffeine sandbox do have some penalties included, as I've seen "penalised" sites move out and then back in, but I still don't think that they have applied all of those penalties. I loosely hypothesise that they are trying to figure out at which stage in the extraction process they should apply penalties.

Just my 2c.

Sid

Shaddows

9:12 am on Sep 7, 2009 (gmt 0)

If it is not, then the results cannot "propagate"; what needs to be done is for the new software infrastructure to be installed.

Agreed. But iff (if and only if) Caffeinated DCs are not accessed from main SERPs due to load balancing, and iff redundancy batch processing hasn't sourced data from a GFS2/Caffeinated DC. In other words, Caffeine should not "roll out" if G has kept the data properly isolated, until each DC has had the new infrastructure installed.

IMHO It is about how the data is stored, indexed and results extracted.

Agreed. I think that's precisely what it's about, and the complexity that arises from it is what causes different results.

I loosely hypothesise that they are trying to figure out at which stage in the extraction process they should apply penalties.

Strongly disagree.

The pages that are no longer suffering "penalties" should not have been penalised. It's not that penalties have not been applied, it's that some pages are no longer "penalised". As such, I do not believe it has anything to do with any intentional penalty process.

Now on to some wild theorising...

Contrary to my previous post (or perhaps in addition), my working theory is that these are pages that slipped through the cracks of the previous infrastructure. Possibly, in an environment that suffers from resource competition, some data must be discarded. Either you can discard data randomly, or you can select data for discard.

Let's assume G selects. This will likely be low-level data (OBLs on PR<0.01 pages, for example), which you would expect to have negligible impact. However, the Butterfly Effect of adding this data back in does have a noticeable impact.

As a purely speculative addendum to this theory, random "waves" of "penalties" could occur when the data loss gets scaled up, to PR<0.011 for example.

Hissingsid

9:40 am on Sep 7, 2009 (gmt 0)

The pages that are no longer suffering "penalties" should not have been penalised.

The sites/pages I'm talking about were penalised for very understandable reasons. I liked them being penalised because my understanding of why they were helped me to plan my activities. Now I don't know if they are permanently back or if they will eventually have the penalty applied again. I'd like to know this as it would help me to plan.

As an aside I've noticed in my niche that virtually every site in the top 20 or 30 has bought links, some more than others. The current results for the most valuable terms are ranked by how good the site owner or their SEO is at buying links. If Google penalised everyone who had bought links the top of SERPS would be full of lame sites and that would have everyone running over to Bing. I wonder if Google's attitude will be forced to change in view of this.

Cheers

Sid

aristotle

12:27 pm on Sep 7, 2009 (gmt 0)

tedster wrote:
It's the complexity effect - the GFS isn't like a mysql database. Data is sharded into various kinds of pieces and then stored across a huge server farm. When a query comes in, that data (now in smithereens) gets accessed and re-assembled into SERPs -- with all kinds of odd anomalies along the way.

Thanks for the reply, tedster. So with all this "complexity" and "odd anomalies", is it possible that the Google employees themselves don't know exactly how the SERPs will be affected when Caffeine is implemented?

Shaddows

1:00 pm on Sep 7, 2009 (gmt 0)

is it possible that the Google employees themselves don't know exactly how the SERPs will be affected

Nailed on certainty, more like.

That's why every major update has a series of aftershocks, as unforeseen problems are fixed, patched and bodged.

And also why Caffeine specifically asks for feedback.

It's like asking, "don't meteorologists know the impact on weather if you rearrange the ocean currents?" There's no way to accurately predict results given current ocean currents (or SE algos), let alone what will happen if a major component changes.

Badcol

2:12 pm on Sep 7, 2009 (gmt 0)

Certainly a huge ripple effect after every alteration. It's always a bit like lion taming with G. They get the system in a big box and then poke it with a stick until it does what it's told ;-)

I wonder if this time they have ironed out the Christmas lead drought that seems to happen every year around Columbus Day and finally abates in the first week of January?

All the best

Hissingsid

4:01 pm on Sep 7, 2009 (gmt 0)

They know that if a butterfly flaps its wings in China a member of the lepidoptera family may have fluttered somewhere in east Asia!

It's like chaos, only not so well organised.

Cheers

Sid

CainIV

7:44 pm on Sep 7, 2009 (gmt 0)

If the Caffeine infrastructure is already being used as others have said in this thread

Right, and of course they [we] all know about as much about the timing of this release, whether it has [or has not] been already incorporated at least in some segmentation to the current SERP's, which is why I mentioned 'perhaps'.

I, for one, cannot see how the Caffeine results currently 'match' to the current SERP's in any way more than providing another 'flavor' of what is currently there. But I do subscribe to the summer "we busted it and have the fix now" theory.

Badcol

8:00 pm on Sep 7, 2009 (gmt 0)

Hi Sid,

Good to hear your voice again ... been a long time !

However, there's probably an algo written to prevent a butterfly from flapping its wings in China these days ;-)

Col :-)

cangoou

8:27 pm on Sep 7, 2009 (gmt 0)

Yeah, if the butterfly flaps as is its nature, it gets banned 50 feet back... Sorry, it had to be said ;-)

Hm, nothing big changed today, so what's the next big holiday we can wait for?

barretire

10:44 pm on Sep 7, 2009 (gmt 0)

The day's not over yet. I am still curious to see if anything changes when I wake up in the morning.

steveb

11:54 pm on Sep 7, 2009 (gmt 0)

The results are completely different still. Google likes holidays but this is not the only one.

They only asked for feedback a little while ago. They aren't in some big hurry.

brinked

6:44 am on Sep 8, 2009 (gmt 0)

I would look for some changes to Google on Tuesday morning. I always see major changes the day following a weekend... I always remember waking up in the morning for work and checking my SERPs. I would not be surprised.

Hissingsid

8:22 am on Sep 8, 2009 (gmt 0)

I agree with steveb.

If you knew how well he does in the most competitive market outside of $orno and pills you would agree with him too!

Cheers

Sid

Badcol

8:57 am on Sep 8, 2009 (gmt 0)

I'm not seeing much of a change to the results since last Thursday, but I am seeing a massive URL update in the sandbox. A site I work on is now showing a full 2600 pages on sandbox, but only 232 on regular serps.

Cheers

moftary

8:18 pm on Sep 8, 2009 (gmt 0)

It's the complexity effect - the GFS isn't like a mysql database. Data is sharded into various kinds of pieces and then stored across a huge server farm.

To me it's something like DRBD.
I see very outdated results in Caffeine, so it's as if they pushed all the data back to a state from years ago and are now reapplying algorithms, filters, etc.

But of course, it's not that simple :)

This 238 message thread spans 8 pages.