|Is Google Expanding its Index?|
| 4:42 pm on Jun 11, 2010 (gmt 0)|
As discussed in other threads, on Jun 8 Google announced that the Caffeine rollout is finished. The announcement was posted at this link: [googleblog.blogspot.com]
Here are two quotes from the announcement:
"Caffeine lets us index web pages on an enormous scale"
"it's the largest collection of web content we've offered"
But over the last few months there have been reports here that many large websites are steadily losing pages from the SERPs index.
So what is the explanation for the apparent discrepancy? A couple of possibilities have occurred to me.
1. Perhaps Google temporarily shrank its SERPs index, but will start expanding it again until eventually it's larger than ever.
2. What will actually expand is the underlying set of data that is fed into the algorithm, not the index itself.
Does anyone have any thoughts about this?
| 5:11 pm on Jun 11, 2010 (gmt 0)|
Something like your #2 idea, I think. The total amount of indexed information is the largest they've ever assembled - that's the "index" that includes all their raw data. The amount that gets exposed to public search is what gets limited.
| 7:37 pm on Jun 11, 2010 (gmt 0)|
I don't think they shrunk the index, I think they optimized how they store and retrieve data which impacts longtail terms more.
- This keyword has been mentioned 17 times in this 2000 word article = excellent
- This keyword has been mentioned in title, URI and meta description = excellent
- This keyword has been mentioned 1 time in total on other pages of this 5000 page site = site is not about keyword, next.
Do I have proof? No. Somewhere around the midway point of the newly released hour-long 2010 I/O website-review YouTube video, Matt Cutts mentions that if you're on "the hairy edge" you can use the keyword one more time to go back to #1 from #3. I suspect this is more of a sitewide thing: GWT counts total instances of each keyword at a sitewide level, and Matt confirms there is a threshold on the number of times a keyword is used.
In other words, webmasters who SEO their sites to the point of covering each keyword combination with a separate article are out of luck on long-tail keyword phrases they mention in only one article.
Sites like WW, which cover them all fairly frequently, have the sitewide issue of being huge, which drops the percentage of longtail keyword use even though the keyword is mentioned many times. The subject is covered, yes, but the site is not about that subject (in Google's eyes) and preference is given to a site that is.
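If that theory holds, the sitewide ratio being described could be sketched roughly like this. This is a minimal illustration only: the `min_site_ratio` threshold and the scoring shape are invented, and nothing here reflects Google's actual, unpublished signals.

```python
# Hypothetical sketch of the sitewide keyword-frequency theory above.
# Thresholds are made up for illustration.

def keyword_stats(pages, keyword):
    """pages: list of dicts, each with a 'body' text field."""
    kw = keyword.lower()
    total_mentions = 0
    pages_mentioning = 0
    for page in pages:
        hits = page["body"].lower().count(kw)
        total_mentions += hits
        if hits:
            pages_mentioning += 1
    return total_mentions, pages_mentioning

def looks_on_topic(pages, keyword, min_site_ratio=0.02):
    """Guess whether the *site* is 'about' the keyword: the fraction of
    pages mentioning it must clear a (made-up) threshold. On a huge site,
    one article mentioning a longtail phrase dilutes to near zero."""
    _, mentioning = keyword_stats(pages, keyword)
    return mentioning / max(len(pages), 1) >= min_site_ratio
```

On this toy model, a 100-page site that mentions a phrase in a single article falls below the ratio, while a small focused site clears it - which is exactly the "huge site covers the subject but isn't about it" effect described above.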
Spam sites stuff related keywords all over the place, which may be why they increased in rank a little (temporarily - I'm seeing them fall off again now).
What can you do? Well, if I'm right, a huge market has just opened up for sites about more longtail keyword subjects. The fact that spam sites are taking up those positions now and top-quality sites don't outrank them suggests I'm right. Find the spam and create solid content on the subject.
[edited by: Sgt_Kickaxe at 7:53 pm (utc) on Jun 11, 2010]
| 7:51 pm on Jun 11, 2010 (gmt 0)|
Sounds like an 'English version' of the phrase-based spam detection patent application, which uses the predictability of the level and number of expected phrases and related phrases present in a document or resource on a given topic...
It's not really a 'set count' AFAIK, and they use 'related terms' too, so changing from attorney to lawyer might well count as 'the same phrase' (keyword) in this setting; but phrase use and overuse is definitely part of the ranking and spam detection system.
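A toy version of that phrase-based idea, folding 'attorney' and 'lawyer' into one phrase family as suggested above. The related-phrase table and the `expected_max` cutoff are invented for illustration; the actual patent application derives expected phrase counts statistically from the corpus rather than from a fixed number.

```python
# Toy illustration of phrase-based spam detection: a document is
# suspicious when related phrases co-occur far more often than expected
# for its topic. The table and cutoff below are invented.

RELATED = {
    "lawyer": {"attorney", "lawyer", "law firm", "legal advice"},
}

def related_phrase_count(text, topic):
    """Count occurrences of every phrase in the topic's related-phrase family."""
    text = text.lower()
    return sum(text.count(phrase) for phrase in RELATED.get(topic, ()))

def is_spammy(text, topic, expected_max=5):
    """Flag gross over-use of a phrase family relative to an expected range."""
    return related_phrase_count(text, topic) > expected_max
```

So a page stuffed with 'attorney attorney attorney lawyer lawyer...' trips the flag, while a page using one or two phrases from the family does not - consistent with over-use being a spam signal rather than a simple keyword count.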
| 7:55 pm on Jun 11, 2010 (gmt 0)|
It does, but IMO preference wasn't passed along as strongly before the change. Outgoing links to sites that are authorities on the keyword subject are likely rolled into this too: heavily SEO'd sites don't link out often, but spam sites spam links to give source credit everywhere.
| 7:58 pm on Jun 11, 2010 (gmt 0)|
Interesting... Haven't made it by the video yet, so I'll check it out one of these days when I'm not busy posting or working... LOL.
| 8:46 pm on Jun 11, 2010 (gmt 0)|
|What can you do? Well, if I'm right, a huge market has just opened up for sites about more longtail keyword subjects. The fact spam sites are taking up those positions now and top quality sites don't outrank them suggests I'm right. Find the spam and create solid content on the subject. |
Interesting thought, and possibly you're right, but I'd hold off betting the farm on it for a while. Your comments are partially consistent with a thought I posted in the June Update thread to explain some obvious spam being returned...
|...algos tend to grasp blindly when there's very little that matches the criteria they're searching for... |
The current algo is not returning sites that it thinks are weak, and in some cases is substituting what are clearly unsatisfactory matches. The context of my comment in the Update discussion, though, is whether the algo is cutting too deeply. I think this current transition is definitely a time to watch for these possible openings, but I do feel that Google will be making some adjustments before it decides that what it's filtered out should stay filtered out. Yes, some current results are really pretty bad.
It may well be that Google is going to have to adjust for niche... that many niches simply are not going to have enough sites or documents to satisfy certain kinds of queries. The worst results I'm seeing, for example, are in very specialized areas.
So, I'd wait before starting to build new sites or junk old ones, but I'd definitely start taking notes and taking steps to improve current content.
| 9:11 pm on Jun 11, 2010 (gmt 0)|
What I have noticed is that the list of keywords and their relevance, as shown in Webmaster Tools, has been turned on its head in the last week or so - after being stable for months - on a site with very few content changes in recent months.