| This 138 message thread spans 5 pages: < < 138 ( 1 2 3 4  ) || |
|Signs of Fundamental Change at Google Search|
In the 950 penalty thread [webmasterworld.com], randle recently posted that "...there is a broader picture here, fundamental changes are afoot."
I agree, and I'd like to collect people's observations about what "signs of fundamental change" they may have observed in recent months.
One big one for me is that no matter how common the search term -- it could generate billions of results -- you always seem to bang into an "omitted results" link before you reach #1,000. In fact, I just checked out a search on the word "the", which Google says generates 5,300,000,000 results. And even on this monster, #928 is the "omitted results" link. Hmmmm....
Now 5,300,000,000 also seems like a low number to me - unless it does not include any Supplemental Results. So my current assumption is that by fattening up the Supplemental Index, Google has pared down the main index to somewhere in the vicinity of 5-6 billion urls.
A related sign of fundamental change, I feel, is the trouble Google currently has generating understandable results for the site: operator or the Webmaster Tools reports. It looks to me like the total web data they've collected is now broken up into far-flung areas of their huge server farm -- making it very difficult to pull comprehensive site-wide information together again.
I read that Google recently released its corpus of words to the public.
I don't know if this adds to the relevancy of this post, but it seems to me that if I had lots of computational power and a lot of data, I would use context measurement methods to determine the relevancy of inbound links.
If a link said "link to red widgets", I would check: is the linking page a text, is it a text about red widgets, and if not, how close does it come to texts about red widgets or similar subjects? This is just ordinary text recognition stuff which is easily stored and indexed.
The conclusion, I guess, is that unless you have spam or bad texts on your web sites, variation or non-variation in inbound links shouldn't make a difference.
Of course everything is relative. If a lot of the text about red widgets is spam, then the measurer might start to believe this is what texts about red widgets look like, but then again so might you if you hadn't been on the other side of the wall.
|And here's a wild conjecture: That could have a knock-on effect, where pages that had a lot of backlinks from supplementals lose their backlinks, and therefore themselves become supplemental- causing pages that they link to to lose links and so on. If that were true, you would see massive upheaval in some parts, and in other parts blue skies, a light breeze, and plain sailing. |
That's not wild conjecture Callivert; what you describe is exactly what I'm seeing. Months and months of slow boiling. Google has been cooking the "unimportant" pages out of their main index into the supplemental index.
This is proving very effective in taking out entire categories and neighborhoods of links Google doesn't want influencing their results: huge scraper sites, forum and blog comment spam, reciprocal links, crappy directories, etc. Almost all have been boiled off into the supplemental index. The result is a fundamental shift in Google's distribution of link popularity across the web.
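To make the knock-on idea concrete, here is a minimal sketch of such a cascade. Everything in it is an assumption for illustration only: the toy link graph, the `min_inlinks` threshold, and the rule that a page goes supplemental once too few of its inbound links come from main-index pages. Nobody outside Google knows the real rule.

```python
def cascade(links, min_inlinks, initially_supplemental):
    """Simulate pages dropping out of a main index round by round.

    links: dict mapping each page to the set of pages it links to.
    A page goes supplemental (hypothetical rule) when fewer than
    min_inlinks of its inbound links come from main-index pages.
    Returns a list of sets: the pages that went supplemental each round.
    """
    supplemental = set(initially_supplemental)
    rounds = [set(supplemental)]
    while True:
        # Count inbound links, ignoring links from supplemental pages.
        counts = {page: 0 for page in links}
        for source, targets in links.items():
            if source in supplemental:
                continue
            for target in targets:
                counts[target] += 1
        newly = {page for page, count in counts.items()
                 if count < min_inlinks and page not in supplemental}
        if not newly:
            return rounds
        supplemental |= newly
        rounds.append(newly)


# Toy graph: one chain hanging off "hub", plus an unrelated pair x <-> y.
toy_links = {
    "hub": {"a", "b"},
    "a": {"c"},
    "b": {"c"},
    "c": {"d"},
    "d": set(),
    "x": {"y"},
    "y": {"x"},
}
for number, pages in enumerate(cascade(toy_links, 1, {"hub"})):
    print(number, sorted(pages))
```

In this toy graph the chain below "hub" falls one layer per round, while the x/y pair never notices: upheaval in one neighborhood, plain sailing in another.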
Unfortunately this has also sent many important pages for obscure queries supplemental. Users can no longer count on Google consistently returning good results for obscure queries.
A friend of mine was extremely frustrated when she couldn't find a page a second time for a query: "STATENAME fingerprint card supply hours". It gave her the exact page with the hours of operation as the number one result the first time she tried. Magic! Two weeks later she needed another card after her prints for her bar application were smudged. The page was nowhere to be found and she was very frustrated. I did a bit of investigation and the result she was looking for was on page 3 and had gone supplemental. Yahoo and MSN couldn't find it either, but it was a step back for Google.
I ABSOLUTELY agree with this theory! In fact I have described it previously in threads.
I referred to it as the "Tsunami effect". This would explain why month after month "new sites" are 'hit' with the same symptoms as the ones that were 'hit' in x (pick any month) serp changes...
This is spreading like a huge wave. Next month - another batch of 'new sites' will be signing on expressing the same list of woes - as did the sites last Sept, Oct, Nov, etc.
THE GOOD NEWS IS... IF this is truly the cause of sites going MIA - then a good solid link campaign should help bring your site back to some stability in the serps.
Here is my post on New Years Eve...
|Collateral Damage: just a thought... |
I have read nothing here that is different from posts from those struck in previous months!
As we all know, no site (if it is indexed in Google) is an island. We are all dependent on links from other sites. Good solid linking plus content ensures our placement in the serps.
THINK ABOUT IT...
If my site gets hit this month and my pages go supplemental, what effect do you think it will have on the sites that I have linked out to, next month?
What if when I come out of "supplemental hell" my site does not really return to the primary index? By this I mean I do not regain my previous placement in the serps? Will the links out to other sites maintain their original strength? I don't think so. Sites who I link out to will not really know that my links to them have lost potency.
It seems that throughout 2006 (since BD started) there has been a domino effect happening. I am starting to think that at least some of the problems are symptoms resulting from the weakening of the linking structure.
The main problem with Google is - it is difficult to track those who link to you AND to know the realtime PR value of those links.
My point is that no matter how old or 'trusted' your site is - it is still dependent on the sites linking to it, and if and when those sites have 'problems' it can't help but affect those they link out to.
I think Big Daddy has caused a virtual 'tsunami' that will eventually affect, in one way or another, virtually every site Google has indexed.
Just a thought...
Happy New Year!
Hmm, you're basically saying the right thing -- but for the wrong reason ;-).
Yes, links from supplemental pages do not carry as much weight as links from non-supplemental pages. But - big but - the reason is not that the pages are supplemental, it's that the pages are supplemental because they are naturally lower PR pages.
(At least that's according to Matt Cutts's explanation of the supplemental index.)
|Yes, links from supplemental pages do not carry as much weight as links from non-supplemental pages. But - big but - the reason is not that the pages are supplemental, it's that the pages are supplemental because they are naturally lower PR pages. |
jimbeetle, what if Google decided that supplemental pages no longer pass any link weight, rather than passing the little PR they do have? That's the fundamental change I believe happened.
I think Google installed a new knob as part of BD. That knob decides what percentage of link weight gets passed by supplemental pages. If that coefficient isn't currently zero it's darn near it from what I see... all wild speculation, of course.
This has caused Carl's Tsunami for months now. The domino effect has resulted in a fundamental re-distribution of link popularity across the web. Call it the slow boiling, domino effect tsunami update. That will stick :-)
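As a rough illustration of what such a knob could look like, here is a simplified PageRank-style iteration in which supplemental pages pass only a fraction of their link weight. The `knob` parameter, the four-page graph, and the page names are all hypothetical, and this is not a claim about how Google actually computes anything. The sketch also does not redistribute weight from pages with no outbound links.

```python
def pagerank_with_knob(links, supplemental, knob, damping=0.85, iterations=50):
    """Simplified PageRank where supplemental pages pass `knob` times
    their normal link weight (knob=1.0 is ordinary behavior, 0.0 is none)."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the baseline share.
        new_rank = {page: (1 - damping) / n for page in pages}
        for source, targets in links.items():
            if not targets:
                continue
            weight = knob if source in supplemental else 1.0
            share = damping * weight * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# "victim" is linked only from a supplemental page.
toy_links = {
    "main": {"target"},
    "target": {"main"},
    "supp": {"victim"},
    "victim": set(),
}
normal = pagerank_with_knob(toy_links, {"supp"}, knob=1.0)
turned_down = pagerank_with_knob(toy_links, {"supp"}, knob=0.0)
print(normal["victim"], turned_down["victim"])
```

With the knob at zero, "victim" is left with only the baseline score every page gets, even though its one inbound link still exists.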
I haven't seen any evidence that Google is discounting links from supplemental pages. Heck, I've even kick started sites with links from supplemental pages and PR from those links has been passed.
Keeping in mind that the majority of web pages carry (relatively) little PR, and many of those pages might at varying times be just above or just below the supplemental threshold, if G did discount those links that would mean links from a very vast swath of the web would be discounted. I think that would play havoc with PR calculations and we would have seen many more drastic symptoms in the SERPs than we have.
Any thoughts on the post from Randle about :
"Age, as a signal of quality, seems to be dramatically downgraded in this sorting process. This may be causing a great deal of the present confusion because it used to be such a powerful factor you got thinking your site was better than it really is. Nothing is more intoxicating than sitting at the top for a very long time to distort your reality. For a long time aged sites were like a winning lottery ticket. Get a high ranking site, with some good back links, and just put it on auto pilot; no more."
I feel like this is shaking up things a good little bit, especially in competitive markets. Can anyone else comment on this?
|Heck, I've even kick started sites with links from supplemental pages and PR from those links has been passed. |
Great data point. Maybe the knob isn't all the way down to zero, but I think it's close based on my observations. I'm seeing a very binary effect in a page being supplemental or not, beyond just its ranking. While it's hard to point to anything but circumstantial evidence, Google has both means and motive. Discounting the link weight from billions of supplemental pages could solve a lot of difficult problems for Google.
|if G did discount those links that would mean links from a very vast swath of the web would be discounted. I think that would play havoc with PR calculations and we would have seen many more drastic symptoms in the SERPs than we have. |
But what if G discounted 1% of the pages on the web each week for a year? A year later (today) links from roughly half the pages on the web wouldn't count. Would anybody notice the slowly increasing water temperature over the course of the year? There have been pretty drastic changes in the serps over the last year, just a little at a time. Seems there are plenty of well cooked frogs around WebmasterWorld these days.
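The arithmetic behind that "half the pages" figure depends on what the 1% is taken from. A quick sketch of both readings (the function names are just for illustration):

```python
def discounted_share_linear(rate, weeks):
    """Each week, discount `rate` of the ORIGINAL total number of pages."""
    return min(1.0, rate * weeks)


def discounted_share_compound(rate, weeks):
    """Each week, discount `rate` of the pages that STILL count."""
    return 1 - (1 - rate) ** weeks


linear = discounted_share_linear(0.01, 52)
compound = discounted_share_compound(0.01, 52)
print(f"linear: {linear:.1%}, compound: {compound:.1%}")
```

Taking 1% of the original total each week gives 52% after a year, which matches "half". Taking 1% of whatever still counts gives closer to 41%. Either way it is a large share, removed slowly enough that no single week stands out.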
No matter the root cause, the solution for webmasters is the same: more good links.
[edited by: rekitty at 6:43 pm (utc) on April 12, 2007]
|"Age, as a signal of quality, seems to be dramatically downgraded in this sorting process. |
What logic would lead Google to do this? What old sites if any has it determined were bogus or tremendously overrated?
I think known, solid factors like age would be figured algorithmically at the back end. Would seem to be an unnecessary burden to do that at the point of search.
I'm with Tedster that this is a reranking based on some (for want of a better word) "transient" factor that can only be determined at the time of the search, something similar to the on-the-fly reranking Ask rolled out back in 2002 (2003?) that looked at linking patterns among candidate results.
The problem here is determining the factor or factors involved. Query type? Community clustering? A semantic reranking based on query type? LocalRank? Any other buzzwords?
"What logic would lead Google to do this? What old sites if any has it determined were bogus or tremendously overrated?"
Perhaps so the same site doesn't sit at the top of the SERP for all of eternity?
I agree Marcia.
Personally, I think if more people worried less about "inbound link text" and "inbound link count", and put some time into the quality of their website (readability, usability, navigation), and the depth (breadth) of information they present they would be better off.
I'm always skeptical when people lose top rankings (even after years) when Google updates, because, during one of the updates I had the opportunity to have a look at one of the "best" websites which ranked #1 for years, then got dropped, even though there was nothing wrong with it. (The reason I'm skeptical is I thought the 2px black on black <h2> tag with 22 (+/-) linked keywords at the bottom of the home page made it a little "spammy". (I know "right on the edge of 'black hat'" is not always the case for sites losing rankings, but sometimes… ))
Signs of fundamental change at Google:
Some of the people who think SEO is "hiding text" don't rank anymore?
(I know some still do.)
Also, when I read the first 300 posts of the "950 Penalty" thread the other day there were only a couple of ideas for the drop I saw posted (up to Thread 2, Page 10) which were not 'fairly well' to 'soundly' refuted:
|From what I can see "scraped and infringed on" can sometimes make nasty trouble, but it does not generate a 950 penalty. We need to locate the real "tell" here. I am currently attracted to the overuse of keywords in on-site anchor text, but I've got nothing definitive to share. |
Here's what I keep coming back to: Why would the "widget" search get such a serious and heavy-duty penalty, but not "widgets"? And by what mechanism could that happen?
|How are the singular and plural used in phrases, and what are the possible common phrases that would use one or the other? How are the concepts that relate to the two different? |
--how to fix a broken widget, could be
--how to fix broken widgets - as compared to
--buy widgets online
Two different concepts, but even though there can be an overlap between singular and plural usage of the words, the words that accompany the word in phrases draw upon different concepts - and would be approached differently.
>>And by what mechanism could that happen?
Does query time filtering by using a "pre-processed" list make any sense at all?
I thought I would try to illustrate Marcia's previous example a little differently, because I think the change is a fairly important concept.
(Sorry if this is a repeat, other than an "expansion" on Marcia's.)
(I used oil rather than widget because I think oil illustrates better.)
"Oil" = "what oil is" = informational
A. definition of oil
B. types of oils
C. location of oil
D. how to oil
(Oil is, To get oil, Where to find oil)
"Oils" = "to buy oil" = sales
A. oil prices
B. oil manufacturers
C. oil retailers
D. oil store locations
(Price of oil 0-9.(dot); Oil store hours 0-9 am pm; Oil store, manufacturer, retailer + location, address, phone numbers; Brands of oil(s))
"Oiling" = "to know how to oil" = informational
A. how to oil
B. the process of oiling
C. oiling methods
D. oiling locations
(Prior to oiling, When oiling, During oiling, While oiling)
Just as an example of how a change in rankings is possible using phrases:
"How to oil" could be 4th in importance for "oil", but 1st in importance for "oiling", and not important at all for "oils".
"Oil store locations" could be 4th in importance for "oils", but not at all important for "oiling"; "Oiling locations" might be important instead.
In a boolean system "oil"="oil".
("Oil", probably actually equaled some form of "oil" 'stemmed'.)
In a phrase based system "oil" (might) = "definition of oil", "types of oils", "location of oil", "how to oil", "Oil is", "To get oil", "Where to find oil".
("Oils", and "Oiling" could each have their own set of "unique phrases" or "more important phrases" associated with them.)
By associating phrases you can (for example) determine "what oil is" from "oil", and "to buy oil" from "oils". Two totally different sets of "result phrases" with the change of a single letter.
I'm not stating the above is an exact example, or even how the system is actually implemented, but the ability to associate phrases is the difference between a boolean system and a phrase-based system. I think it also shows how rankings could dramatically change for widget vs. widgets in a phrase-based system.
(Obviously some sort of "click data" is probably required to make the associations initially. It's entirely possible I'm missing something, but it seems there could be a large difference in "phrase=phrase" association compared to "word=word" association to determine rankings for similar terms.)
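Here is a toy sketch of the oil / oils / oiling idea, using the phrases listed above. The scoring rule (count how many related phrases appear in a page) and the three sample documents are assumptions made up for illustration. The patents describe something far more involved.

```python
# Related phrases per query form, taken from the lists in this post.
RELATED_PHRASES = {
    "oil": ["definition of oil", "types of oils", "location of oil", "how to oil"],
    "oils": ["oil prices", "oil manufacturers", "oil retailers", "oil store locations"],
    "oiling": ["how to oil", "the process of oiling", "oiling methods", "oiling locations"],
}

# Hypothetical pages.
DOCUMENTS = {
    "encyclopedia": "Definition of oil, types of oils, and the location of oil reserves.",
    "shop": "Compare oil prices from oil manufacturers and oil retailers, with oil store locations.",
    "guide": "How to oil a hinge: the process of oiling and common oiling methods.",
}


def phrase_score(query, text):
    """Count how many phrases related to this exact query form appear in the text."""
    text = text.lower()
    return sum(1 for phrase in RELATED_PHRASES.get(query, []) if phrase in text)


def rank(query, documents):
    """Order document names by phrase score, best first."""
    return sorted(documents,
                  key=lambda name: phrase_score(query, documents[name]),
                  reverse=True)


for query in ("oil", "oils", "oiling"):
    print(query, "->", rank(query, DOCUMENTS))
```

A one-letter change in the query swaps the whole set of expected phrases, so the top result flips between the informational page, the sales page, and the how-to page. A plain word=word match would treat all three pages about the same.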
BTW Marcia, you're one of the first people whose posts I read when I'm skimming through a thread trying to 'catch up', because yours are always informative and share some great information. Thanks! (It's how I knew I started with your posts the other day. :) )
Edited for clarity, readability, understandability.
[edited by: jd01 at 8:31 pm (utc) on April 12, 2007]
Good stuff, jd01. Query type is definitely one of the factors I believe is involved, though from the small handful of -950 results I've been able to suss out it appears that there might be other things at work at the same time.
I think it's interesting, since once you get past the original "definition" of each, you can determine related phrases ("non-keyword phrases") associated with each. IOW "addresses and address format" might be more important than "keyword count" for "oils".
EG You could make associations where "Oils" (might) = some of the above AND "Brand 1", "Brand 2", "Brand 3", "Type 1", "Type 2", "Type 3", "Price" because those are the phrases / types of phrases you would expect to find on a sales site.
(Edited related phrases to "Price".)
The phrase-based indexing patents [webmasterworld.com] are not light reading, and they certainly don't submit to simple "do this, don't do that" rules. Nevertheless, the appearance of 5 patents on the topic in the past year or so certainly piques my curiosity.
I noticed that these patents talk about the infrastructure used to carry out these steps - and about installing the logic they represent in firmware or even in hardware. So aren't we seeing signs of some infrastructure changes? The "breaking" of the site: operator, the limiting of various search results to under 1,000 with the earlier appearance of "omitted results" links, and possibly even the growth of the Supplemental Index?
I'll throw one more out there.
When I read "clustering", "historical data", "phrase based", "run time", "individualized", "pre-filtered", "document age", "click through data", "links and associated data", etc., and wonder what they all mean, I come up with something similar to the following:
(In keeping with the same example from above, and at the risk of using too many "specific words".)
"oil" (might) = "drill", "platform", "well", "pipeline", "refine", "crude", "ocean", "under ground".
"oil" (might) = "squeeze bottle", "hinge", "door", "squeak", "lightly coat", "manual", "instructions".
By clustering (grouping, pre-filtering) and using historical click through data, relative document age, and phrases (on page and pointing to), an individualized determination on the document which is most relevant to a "specific searcher's" (browser's) query can be accomplished at runtime?
(Don't know, but it seems possible.)
Never mind factoring in PageRank, TrustRank, etc.
Just based on click data, phrases, groups and relative age it's possible to make determinations.
If, through historical data, it is known queries for "hammer", "nails", "door" were shortly succeeded by an "oil" (or "oils" ) query, and queries for "oil" ended when the document containing the second set of phrases was clicked, it is reasonable to determine the "second" result group for "oil" should be returned for the query "oil" after the query "hammer" OR "nails" OR "door".
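A minimal sketch of that session logic, assuming a made-up lookup table of historical data. The `HISTORY` mapping, the cluster names, and the simple majority vote are all hypothetical and only meant to show the shape of the idea.

```python
from collections import Counter

# The two phrase groups for "oil" (subsets of the lists above).
CLUSTERS = {
    "extraction": {"drill", "platform", "well", "pipeline", "refine", "crude"},
    "lubrication": {"squeeze bottle", "hinge", "door", "squeak", "lightly coat"},
}

# Hypothetical click history: an earlier query in the session mapped to the
# cluster whose documents ended the searcher's later "oil" queries.
HISTORY = {
    "hammer": "lubrication",
    "nails": "lubrication",
    "door": "lubrication",
    "offshore drilling": "extraction",
    "tanker": "extraction",
    "alaska pipeline": "extraction",
}


def pick_cluster(previous_queries):
    """Vote on which "oil" cluster to return, based on earlier session queries.

    Returns None when the history says nothing about this session.
    """
    votes = Counter(HISTORY[query.lower()]
                    for query in previous_queries
                    if query.lower() in HISTORY)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


print(pick_cluster(["hammer", "door"]))        # lubrication
print(pick_cluster(["Alaska Pipeline"]))       # extraction
print(pick_cluster(["weather"]))               # None
```

The same query, "oil", returns a different result group depending only on what the searcher looked for just before it.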
Document age could certainly affect "filtering" of the results, because the process of applying oil to a hinge might not have changed in the last 50 years. Meaning, an older ("more stale") document might contain the correct answer.
(Unless a larger number of older ("more stale") documents were updated to include a new phrase ("spray"), which would indicate a document not updated to include the new phrase ("spray") might have become too old ("too stale") when not updated in line with other "similar documents", meaning a newer ("less stale"), but probably not brand new ("most fresh") document should be selected, unless, of course, the "group of stale document updates" had just occurred, then newest ("most fresh") might be correct.)
The process surrounding extracting oil from the ground might have been in a state of change during the same time period, with new techniques, locations, equipment, etc. being developed, so a "newer" ("more fresh", "most often updated") document might be more important for the "first" group of "oil" phrases, which could be returned for an "oil" query after "offshore drilling" OR "tanker" OR "Alaska Pipeline".
(The above is an example only. I am not stating the actual implementation is anything like what I describe. I tried to make the example(s) more "illustrative" than "exact", because when thinking about the possibilities of groups of phrases and the ability to use historical data, there are quite a few factors which could be involved. When I think about Google, I usually wonder, "How far can you take the concept?", and guess the people with the Ph.D.'s are already working on a way to move a little past how far I think you could go. Somehow I'm sure they didn't file five phrase based patents to "break" their search engine.)
Edited: Minor Adjustments
Justin, about four lightbulbs went off inside my brain when reading your posts. Thank you! :)
Thanks jimbeetle and buckworks.
You too, tedster. I keep thinking I killed a good thread though.