I am seeing the same thing. We have lost pages since yesterday. The counts are the same on Google.com and the sandbox DC.
Could these big number differences be somehow related to Google trying to put Caffeine into google.com and the other googles?
Maybe Caffeine is rolling out with Wave tomorrow?
daft question time - how are you guys telling the difference between seeing Caffeine SERPs and the good old fashioned youtube / wiki SERPs?
Not a daft question at all. There's been a lot of difference of opinion on this, especially since Caffeine is not an algorithm change, per se, but a change in the underlying infrastructure for the data.
In the original Caffeine announcement, Google said their hope was that people would barely see a difference in the search results when the Caffeine infrastructure was finally rolled out... and they were in no hurry to do that, either.
My own opinion is that Caffeine is not yet in production, not visible from google.com. Here's what I think is happening. The legacy production system is seeing its data massaged, and some of the export/import processes between data centers are being changed. These shifts are happening on the legacy system to make the eventual changeover smoother.
Those are some great points Ted. I was hoping to ask a question.
|The legacy production system is seeing its data massaged, and some of the export/import processes between data centers are being changed. These shifts are happening on the legacy system to make the eventual changeover smoother. |
Is this what the Google Caffeine search engine is being used for? Is Google Caffeine the legacy production system? The changes being made are being done here first and then what you see there you will see in Google.com?
Caffeine is not the legacy system - it's the NEW system. Caffeine involves changes in the way the database structure itself is coded, more than changes to the actual data it holds, or the ranking algorithms that create SERPs based on that data.
What parts of the infrastructure are changing is an open question, but it seems very likely that it includes a new kind of physical server, as well as changes in the GFS (Google File System).
Many webmasters I read online are thinking about Caffeine as an algorithm change and they're watching for rankings shifts -- but that's not the point of it. The goal is more speed and better efficiency for updates to the data, and not really changes to the SERPs.
Caffeine is a change to the index itself. It's a bit like the Big Daddy change - it will affect rankings, but only as a by-product of how they index and what they index.
P.S. when you do comparisons between Caffeine and the existing engine, look at the number of results returned. If your rankings have changed in the new index it's probably because they've included/excluded pages that had links to you.
[edited by: AlyssaS at 9:11 pm (utc) on Sep. 29, 2009]
Thanks for the information guys. That explains the questions that I had.
|Caffeine is a change to the index itself. |
It depends what you mean by "the index".
HissingSid - We know Google only index about 25% of the web thereabouts. And they've said they are looking at "infrastructure" and new servers for Caffeine to help them cope with more pages.
And if you check, there are significant differences in the numbers of pages returned.
E.g. a query on [major keyword] in the old engine returns 127,000,000 results, but in Caffeine it returns 74,500,000 results.
But if you do a search for [another keyword] the old engine shows 7,920,000 results and Caffeine shows 8,720,000 results.
So they appear to be changing what is included in the index itself. Which has ranking implications as a by-product. If some pages that have links to your site are suddenly included in the index, you should benefit. If pages with links to you disappear, you should hurt. Even if they'd not changed the algorithm.
[edited by: tedster at 9:42 pm (utc) on Sep. 29, 2009]
[edit reason] no specific search terms, thanks [/edit]
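To put the two result counts quoted above on the same scale, here is a rough sketch that works out the percentage change between the old engine and Caffeine. The function name `percent_change` is just an illustration, and the reported result counts are estimates shown by the search page, so treat the output as a ballpark figure only.

```python
def percent_change(old_count, new_count):
    """Relative change from the old engine's count to the Caffeine count, in percent."""
    return (new_count - old_count) / old_count * 100.0


# Result counts quoted in the post above (old engine, Caffeine).
comparisons = {
    "[major keyword]": (127_000_000, 74_500_000),
    "[another keyword]": (7_920_000, 8_720_000),
}

for query, (old, new) in comparisons.items():
    print(f"{query}: {percent_change(old, new):+.1f}%")
```

So the two queries move in opposite directions, roughly a 41% drop for one and a 10% gain for the other, which is why the counts alone don't say whether the index as a whole is getting bigger or smaller.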
|help them cope with more pages |
Where did they say that?
Assuming that you are right about the 25%. I would say that at least 90% of what they currently index is *rap and what they don't index is absolute *rap so why would they want to index more?
I think we will find that Caffeine is more about "quality" (whatever that is) and currency (recent in time) than about quantity.
I may well be wrong but I'm comfortable with the degree of inaccuracy that I'm happy with ;-)
[edited by: Hissingsid at 11:54 am (utc) on Sep. 30, 2009]
|Assuming that you are right about the 25%. I would say that at least 90% of what they currently index is *rap and what they don't index is absolute *rap so why would they want to index more? |
Why would they want to index more? Perhaps because the web is growing exponentially? At least a million new pages a day, and not all of it is crap, and not all of it is on existing subjects, completely fresh content is being put online all the time.
So even if they just index 25% of it, if the web grows by a factor of 10 in the next decade, the amount they index will grow by a factor of ten too. And they need to be able to cope, and clearly they feel the existing infrastructure won't be able to.
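The arithmetic behind that can be sketched in a few lines. The 25% share is the figure quoted in this thread, and the starting web size used here is a made-up placeholder, not a real measurement.

```python
def indexed_pages(web_size, indexed_fraction=0.25):
    """Pages in the index if a fixed fraction of the web is indexed."""
    return web_size * indexed_fraction


# Hypothetical starting size, purely for illustration.
web_today = 1_000_000_000_000
web_in_a_decade = web_today * 10  # "grows by a factor of 10"

growth = indexed_pages(web_in_a_decade) / indexed_pages(web_today)
print(f"Index grows by a factor of {growth:.0f} at a constant 25% share")
```

The share stays the same while the absolute number of pages to store goes up tenfold, so the infrastructure load grows even if the percentage indexed never changes.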
That's a different matter altogether. You were suggesting that Google wants to index a larger percentage of what is out there and that is why they are making the infrastructure change.
I don't think that this is about indexing a larger percentage. I think it's about indexing better, in a more timely manner. Indexing a larger percentage will not produce better results or more income, and Google knows it.
For the last 60 days I had been ranking on the first page for a keyword, and now it has suddenly dropped to the second page. My website is four years old.
I work regularly on off-page factors, and the site is well optimized on-page.
[edited by: tedster at 1:25 pm (utc) on Sep. 30, 2009]
[edit reason] moved from another location [/edit]
I had one of our main terms come back from oblivion on Monday. It had been gone for about 6 months, and then 6 months before that. It's back at its number one spot now.
Uh, just to throw a little semantic confusion and clarification into the mix, since the word 'index' is being thrown around very loosely:
Index, as referred to by Google, is NOT the pages (URLs) stored in the database (GFS). Index, as referred to by G is what they show the end user when they conduct a search... A 'noindex' tag does not prevent your page from being spidered and stored. It prevents it from being shown in the results.
Just a little semantic FYI this AM.
I just decided to throw it in here, because of this post:
|It depends what you mean by "the index". |
Because, Sid's right, most of the time when people refer to 'the index' they have no clue 'the index' to G is what you see, the results, not the entirety of the GFS. Ultimately I think the proper way to describe this change is they want to be able to spider, parse and store more data faster, so they can ultimately have a more inclusive index, which is what they show their visitors.
I'm feelin' a bit 'posty' so I'll clear something else up:
A robots.txt exclusion says:
Do Not Visit This Page.
Do not store, spider or save it.
A 'noindex' tag says:
Spider this page all you want.
Store and use it for all the calculations you feel like.
Do Not Show It To Your Visitors.
They are not the same & are really not even effectively the same thing, which I believe is a common misconception among webmasters.
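For anyone who wants to see the two mechanisms side by side, here is a minimal sketch using only the Python standard library. The robots.txt rules, the page HTML, the `ExampleBot` user agent and the example.com URLs are all made up for illustration. It only shows where each signal lives (robots.txt is checked before a fetch, the 'noindex' tag can only be read after the page has been fetched); how any particular search engine acts on them is up to that engine.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and page, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

PAGE_HTML = """\
<html><head>
<meta name="robots" content="noindex, follow">
</head><body>Hello</body></html>
"""


def may_crawl(robots_txt, user_agent, url):
    """Check robots.txt rules: is the crawler allowed to fetch this URL at all?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "robots":
            content = attr_map.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html):
    """The page must already have been fetched before this tag can be seen."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(may_crawl(ROBOTS_TXT, "ExampleBot", "https://example.com/private/page.html"))
print(may_crawl(ROBOTS_TXT, "ExampleBot", "https://example.com/public/page.html"))
print(has_noindex(PAGE_HTML))
```

Note the practical consequence: if a page is blocked in robots.txt, a well-behaved crawler never fetches it and so never gets to read a 'noindex' tag on it.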
< continued here: [webmasterworld.com...] >
[edited by: tedster at 4:04 am (utc) on Oct. 1, 2009]