Is Google Expanding its Index?

4:42 pm on Jun 11, 2010 (gmt 0)

Senior Member aristotle

joined: Aug 4, 2008
posts: 2679
votes: 94


As discussed in other threads, on Jun 8 Google announced that the Caffeine rollout was finished. The announcement was posted at this link: [googleblog.blogspot.com ]

Here are two quotes from the announcement:

"Caffeine lets us index web pages on an enormous scale"

"it's the largest collection of web content we've offered"


But over the last few months there have been reports here that many large websites are steadily losing pages from the SERPs index.

So what is the explanation for this apparent discrepancy? A couple of possibilities have occurred to me:

1. Perhaps Google temporarily shrank its SERPs index, but will start expanding it again until it's eventually larger than ever.

2. Perhaps what will actually expand is the underlying set of data fed into the algorithm, not the public index itself.

Does anyone have any thoughts about this?
5:11 pm on June 11, 2010 (gmt 0)

Senior Member tedster

joined: May 26, 2000
posts: 37301
votes: 0


Something like your #2 idea, I think. The total amount of indexed information is the largest they've ever been able to accomplish - that's the "index" that includes all their raw data. What gets limited is the amount exposed to public search.
7:37 pm on June 11, 2010 (gmt 0)

Senior Member sgt_kickaxe

joined: Apr 14, 2010
posts: 3169
votes: 0


I don't think they shrank the index; I think they optimized how they store and retrieve data, which affects longtail terms more.

- This keyword has been mentioned 17 times in this 2000-word article = excellent
- This keyword has been mentioned in the title, URL and meta description = excellent
- This keyword has been mentioned 1 time in total on the other pages of this 5000-page site = site is not about the keyword, next.
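The three bullets above amount to comparing on-page keyword frequency against sitewide frequency. A minimal sketch of that comparison, assuming plain whitespace tokenization (the function name and returned fields are invented for illustration; this is not Google's actual scoring):

```python
def keyword_signal(keyword, page_text, site_pages):
    """Toy sketch of the heuristic above: a keyword used heavily on one
    page but almost nowhere else on a large site yields a weak sitewide
    signal. Hypothetical illustration only."""
    kw = keyword.lower()
    on_page = page_text.lower().split().count(kw)
    sitewide = sum(p.lower().split().count(kw) for p in site_pages)
    return {
        "on_page": on_page,        # mentions in this one article
        "sitewide": sitewide,      # mentions across the whole site
        # one mention total on a 5000-page site -> ratio near zero
        "sitewide_ratio": sitewide / max(len(site_pages), 1),
    }
```

Under this reading, a huge site that mentions a phrase in only one article gets a near-zero sitewide ratio no matter how well that single article is optimized.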

Do I have proof? No. Somewhere around the midway point of the newly released hour-long 2010 Google I/O website review YouTube video, Matt Cutts mentions that if you're on "the hairy edge," using the keyword one more time could take you back to #1 from #3. I suspect this is more of a sitewide thing: GWT counts total instances of each keyword at a sitewide level, and Matt confirms there is a threshold on the number of times a keyword is used.

In other words, webmasters who SEO their sites to the point of covering each keyword combination with a separate article are out of luck on long-tail keyword phrases they mention in only one article.

Sites like WebmasterWorld that cover them all fairly frequently have a sitewide issue of simply being huge, which drives down the percentage of longtail keyword use even when a phrase is mentioned many times. The subject is covered, yes, but the site is not about that subject (in Google's eyes), and preference is given to a site that is.

Spam sites stuff related keywords all over the place, which may be why they increased in rank a little (temporarily; I'm seeing them fall off again now).

What can you do? Well, if I'm right, a huge market has just opened up for sites about more longtail keyword subjects. The fact that spam sites are occupying those positions now and top-quality sites don't outrank them suggests I'm right. Find the spam and create solid content on the subject.

[edited by: Sgt_Kickaxe at 7:53 pm (utc) on Jun 11, 2010]

7:51 pm on June 11, 2010 (gmt 0)

Senior Member themadscientist (US)

joined: Apr 14, 2008
posts: 2910
votes: 62


Sounds like an 'English version' of the phrase-based spam detection patent application, which uses the predictability of the level and number of expected phrases and related phrases present in a document or resource on a given topic...

It's not really a 'set count' AFAIK, and they use 'related terms' too, so changing from attorney to lawyer might well count as 'the same phrase' (keyword) in this setting. But phrase use and overuse is definitely part of the ranking and spam detection system.
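A very loose sketch of the idea described above: count how many of a topic's related phrases appear in a document and flag it when the count far exceeds what a natural document would contain. The function, phrase list, and threshold are invented for illustration; the actual patent application describes much more elaborate co-occurrence statistics:

```python
def looks_stuffed(text, related_phrases, expected_max=8):
    """Loose sketch of phrase-based spam detection. A genuine document
    on a topic tends to contain a predictable number of related phrases;
    far more than expected suggests stuffing. Threshold is invented."""
    text_l = text.lower()
    hits = sum(1 for p in related_phrases if p.lower() in text_l)
    return hits > expected_max  # suspiciously many related phrases
```

This also illustrates the 'related terms' point: swapping attorney for lawyer doesn't help if both are in the related-phrase set being counted.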
7:55 pm on June 11, 2010 (gmt 0)

Senior Member sgt_kickaxe

joined: Apr 14, 2010
posts: 3169
votes: 0


It does, but preference wasn't passed along as strongly before the change, imo. Outgoing links to sites that are authorities on the keyword's subject are likely rolled into this too; heavily SEO'd sites don't link out often, while spam sites spam links everywhere to give source credit.
7:58 pm on June 11, 2010 (gmt 0)

Senior Member themadscientist (US)

joined: Apr 14, 2008
posts: 2910
votes: 62


Interesting... Haven't made it by the video yet, so I'll check it out one of these days when I'm not busy posting or working... LOL.
8:46 pm on June 11, 2010 (gmt 0)

Moderator This Forum from US

Administrator robert_charlton

joined: Nov 11, 2000
posts: 11309
votes: 163


What can you do? Well, if I'm right, a huge market has just opened up for sites about more longtail keyword subjects. The fact that spam sites are occupying those positions now and top-quality sites don't outrank them suggests I'm right. Find the spam and create solid content on the subject.

Interesting thought, and possibly you're right, but I'd hold off betting the farm on it for a while. Your comments are partially consistent with a thought I posted in the June Update thread to explain some obvious spam being returned...

...algos tend to grasp blindly when there's very little that matches the criteria they're searching for...

The current algo is not returning sites that it thinks are weak, and in some cases is substituting what are clearly unsatisfactory matches. The context of my comment in the Update discussion, though, is whether the algo is cutting too deeply. I think this current transition is definitely a time to watch for these possible openings, but I do feel that Google will be making some adjustments before it decides that what it's filtered out should stay filtered out. Yes, some current results are really pretty bad.

It may well be that Google is going to have to adjust for niche... many niches simply are not going to have enough sites or documents to satisfy certain kinds of queries. The worst results I'm seeing, e.g., are in very specialized areas.

So, I'd wait before starting to build new sites or junk old ones, but I'd definitely start taking notes and taking steps to improve current content.
9:11 pm on June 11, 2010 (gmt 0)

Senior Member g1smd

joined: July 3, 2002
posts: 18903
votes: 0


What I have noticed is that the list of keywords and their relevance, as shown in Webmaster Tools, has been turned on its head in the last week or so, after being stable for months, on a site with very few content changes in recent months.