| This 138 message thread spans 5 pages: 138 (  2 3 4 5 ) > > || |
|Signs of Fundamental Change at Google Search|
In the 950 penalty thread [webmasterworld.com], randle recently posted that "...there is a broader picture here, fundamental changes are afoot."
I agree, and I'd like to collect people's observations about what "signs of fundamental change" they may have observed in recent months.
One big one for me is that no matter how common the search term -- it could generate billions of results -- you always seem to bang into an "omitted results" link before you reach #1,000. In fact, I just checked out a search on the word "the", which Google says generates 5,300,000,000 results. And even on this monster, #928 is the "omitted results" link. Hmmmm....
Now 5,300,000,000 also seems like a low number to me - unless it does not include any Supplemental Results. So my current assumption is that by fattening up the Supplemental Index, Google has pared down the main index to somewhere in the vicinity of 5-6 billion urls.
A related sign of fundamental change, I feel, is the problems Google currently has generating understandable results for the site: operator or the Webmaster Tools reports. It looks to me like the total web data they've collected is now broken up into far-flung areas of their huge server farm -- making it very difficult to pull comprehensive site-wide information together again.
From what I have seen from a large sampling of sites, the supplemental index is growing at a significant rate.
I also get several calls each week from people I know wondering what is going on. This has been going on for the last five weeks.
Well-established sites that are over eight years old are having an increasing number of pages moved into the sup index.
From a simple user search perspective I've been seeing things that make my hair stand up.
About six months ago, I did a search for something quite esoteric .. the air flow in a traditional dome structure popular in the Middle East. First page .. found an excellent hobbyist site explaining this thing with loads more info on architectural elements, the history, the whys and the wherefores.
This past weekend I did the same search. The little site with excellent architectural information is nowhere to be found .. not even page 20 of the results. (I had the reference so could go directly there, and it is still there and updated...)
Instead, what I was presented with, was architects advertising, airflow companies advertising, air conditioning companies advertising, car air conditioning companies ... etc, etc, etc.
Yes, something is happening, but whatever is happening, it is not good for me from a user perspective. Webmaster stuff? We leave that up to the gods, the little people, the fairies, the sprites and all of those good luck charms .. (So far, I've been lucky!)
Something in the spirit of the internet has forever disappeared for me.
This is something I've observed. I don't quite know what to make of it.
|Yes, something is happening, but whatever is happening, it is not good for me from a user perspective. |
Google is still returning relevant results, but they aren't good results.
What I'm seeing now is not as human-friendly as what I was seeing two or three months ago, which is disturbing. "Human-friendly" in the sense that the results were really useful and really informative. The results now are as likely as not to be MFAs, while the "good" sites are buried.
I've been seeing a lot of what I would call doorway pages popping up again. Maybe not at the top, but on the first page. Most of these, from my research and long observation, are pages built specifically for the search engine which, once spidered, are then set to 301 redirect to a completely different page where you can search for things on the site.
I thought this was the type of behavior that Google didn't like.
I'm also seeing that some of the top pages for this particular search have the keyword not once, but twice in the title! What's up with that? It looks really bad to the users. And the pages ... don't get me started. For this particular two-word search, the pages at the top have one of the words 7 times and the other word 10-20 times! SPAM!
|A related sign of fundamental change, I feel, is the problems Google currently has generating understandable results for the site: operator or the Webmaster Tools reports. It looks to me like the total web data they've collected is now broken up into far-flung areas of their huge server farm -- making it very difficult to pull comprehensive site-wide information together again. |
I'm seeing some alterations in the current behaviour of the site: tool.
It's not consistent at all, depending on the entry you put in, and for us it is very worrying as it keeps displaying "supplemental".
Vanessa Fox at Google has recently said that website owners should not be worried by this and that the tool is waiting to be fixed.
In the light of what's been said in this thread, I feel fundamentally uncertain about the way we should interpret the health of our sites and the direction that Google might be taking.
tedster wrote "you always seem to bang into an "omitted results" link before you reach #1,000."
But it makes sense to deliver only 1000 results, as we all know that 90% of people will not go beyond the first 20 results. So keeping the main index fat & slow just so that you can say "I have the biggest" is an idea that most search engines abandoned a long time ago.
A few months ago I decided to put some adsense-ads on my B&M-website for two reasons:
A) I wanted to provide some alternative ways out for my customers in case they did not find what they desired and
B) I thought Google would like my website a bit more, if it had the chance to earn some money with it.
On the first of April I found a message from the "google optimization team" in my AdSense account, which gave the advice to put more than one block of ads on the relevant pages in order to improve my AdSense income.
If that was an April Fools' joke, it would be the most subtle one I have ever come across. If it was not, it gives a pretty clear hint of what Google is aiming at.
It has been reported quite often that old, highly informational sites have vanished from the serps in the past months. I'd really like to know whether these had AdSense on them. AdSense is really a very ambiguous thing: on the one hand, MFA sites pollute the serps and Google is trying to get rid of them. On the other hand, info sites without AdSense cost Google a lot of bandwidth without bringing in a penny.
I never cared for that supplemental thing very much, because my site has been very very stable through all the tos and fros of the past months. All my omitted pages perform very well on the long-tail-searches, but I must admit that I did not fully understand what the supplemental pages really are. E.G. today I noticed a significant difference in the way google presents results for searches on
B) "mycompanyname site:mydomain.com"
with the word 'mycompanyname' naturally appearing on ALL pages. One would expect both searches to return the same results, but in fact they do NOT, and the difference in the number of omitted pages is tremendous.
From a user's perspective, one negative aspect that I have seen over recent months is the disappearance of extremely-specific technical forum pages (I presume into the supplemental index).
A year ago, you could copy and paste an error message into google and it would usually return a couple of forum threads (normally with a solution or workaround). Lately I find it impossible to find solutions to technical problems through Google.
I do think that the results for commercial single word or popular two-word queries have improved.
E.g., a search for Blue Widgets seems to return the market-leading blue widget retailers.
There seems to be a designed change in the searches to, as some have indicated, eliminate the sites using up the bandwidth without generating the cash flow...
Chasing the money has led many a company down the tubes; just ask Enron.
I have quit worrying about Google and focused more on other areas to keep our business in business.
It has taken about 2-3 months, but we are on a steady rise in sales. Even though Google traffic continues to drop, our traffic count is rising, as are our sales.
I still like the post
"True webmasters can survive without Google. Webmasters that can't are not webmasters but a Google Junkie"
I guess I have kicked another addiction...
You know, I can't blame Google, as they are in business to make money, but are they changing the serps to do what they said they wouldn't do?
[google.com...] I wonder if this is still their real focus......
|it makes sense to deliver only 1000 results |
Quite true, asher. However now you consistently do not even get the full 1,000, but you always used to. This is "the sign of fundamental change" I was pointing to.
I don't know if I am offtopic, but would like to share my observations with you...
In the industry I watch, the number of results for a given phrase (keyword1 keyword2) has been decreasing steadily over the last few days. A week ago it was around 3,300,000 and today the number of results was between 1,500,000 and 1,800,000. Meanwhile I am seeing a permanent fluctuation in the serps (5 positions up, 2 down) on the same datacenter several times, even during the daytime, which is quite unusual.
There is another strange thing I have observed. I have been watching a phrase like blue cheap widget, and for 3 weeks now the serps have been a disaster. In the first 5 results there are 3 duplicate-content sites, with the same content uploaded to 3 different free webhosting sites, and there are a lot of other results in the first 20 which are from free advertisement sites. It is creepy... It seems G has gone back around 6 years.
And as an extra I am also seeing the number of supplementals rising...
No clue what is happening
Things we have observed
- Our site is like 9 years old and is a pr 7 flirting with 8 "authority site", we have hundreds of thousands of links.
- Over the last 2 weeks a massive portion of our pages have been moved to the supp index. When I say massive, I mean 85% of the stuff that has been in for years. Unique, hand-made, highly deeplinked stuff.
- I know toolbar pr means nothing but I see 3 floating sets of toolbar pr.
- Our Google traffic is down about 35%
- Much of the stuff replacing our pages makes almost no sense at all. Not even vaguely related to the subject.
We were doing this before Google was alive and we will be here long after, but after some reflection I had a good friend in the industry double check our stuff and make sure it's not us. He does work for many of the web's biggest properties.
After careful review he said
"when google is messing with sites like this, nothing is sacred anymore"
That about wraps it up for us, we are moving on to other focuses and not playing this childish game anymore.
"This has been going on for the last five weeks"
Yes, not so much for supplemental results, but ranking changes are occurring for me. It’s hard to pinpoint, but I am of the belief that some, not all, but some rankings are being skewed by personalization (i.e., click-through data).
I have seen an inconsistency with regards to 2 word and 3 word phrases.
Ex. Green Widgets will rank #1, but Green Widget tool is not ranking as it used to, and green widget tool is the intended phrase. I speculate that segmentation or something like that is the cause, but that is only a guess.
I am also seeing a constant change in the total results listed for any given keyword. I can only conclude this is due to increased data refreshes across multiple data centers. Anyone else see an increase like this?
I have noticed that there has been recent PR flux. A great example for me is a Wikipedia page that I created; it continually bounces between a PR1 and PR5. It has been doing this since the beginning of March.
If I were to guess what is going on from a fundamental standpoint: personalization is starting to show its face in rankings, which is causing link data and rankings to become more sporadic.
However, I can say that I do have 3 or 4 different sets of collected data over the years and I am seeing similar ranking issues that were present this time last year. Not all of the same symptoms, but it does have some similarities.
The thing with AdSense is not so clear-cut.
I have a rather large, rather technical informational site that I've had AdSense on since 2003 (some two weeks after AdSense was rolled out). The site has been through a lot of trouble with Google in the last 15 months, from just dropping in SERPs to a -30 penalty since October 06. About a week or so ago the Google traffic just stopped - I logged 6 visits from Google yesterday (3500 visits/day 15 months ago).
So, having been one of the oldest AdSense customers out there did not help a single bit and, in fact, I am planning on removing AdSense (gradually phasing it out) since most of my traffic comes from Yahoo and possibly replacing it with YPN.
And I would also second another observation given here that Google SERP referrers are rather "simplified" these days. Until the BigDaddy update I was routinely finding 5-6-7 or more word phrases with VERY technical details in them (sometimes exactly what the other poster was saying here - cut and paste entire error messages). Not anymore. In fact, the most common are referrers from two-word SERPs and, looking at my Urchin stats, I don't see ANY 5 word ones.
I don't know if it's the need for more money that's messing with Google's usual ways, but it does seem that their index is now so vast that they have to find ways to simplify the key phrases in order to reduce load on their infrastructure. The same goes for limiting the number of results returned. If you think about it: if you lower the limit from 1000 to 900 you've saved time/CPU/memory on searching through at least 10% (or more - depending on how obscure the keyword is) of your entire dataset. Given their size, the savings could be huge.
It seems like a fundamental shift is happening: they’re now sorting out the results that are returned for a given query, rather than just rote ranking. And if you’re going to sort things quickly and effectively, you want to start by identifying traits you can use to do that with. Perhaps they’re working on defining and weighting these identifiers, thus the wild fluctuations in the results you see from day to day. If you were used to being returned high on the second page, and then one day you’re #30 or #950, perhaps they identified something on your site that told the algorithm “put that page over into that pile”.
For years, for all of what I would call “decent” above board sites for the sake of argument here, the algorithm always looked to us like: (A x 4) + (B x 3) + (C x 2) + (D x 1.5) = your score, for that given query. They took all the elements on your site that could help answer the searcher’s question (age, links in, links out, trust, hub, text, etc.), all the things that would tell you what that page is about and how well it might satisfy the searcher. Then they ran it through the formula, and ranked you based upon how your score stacked up against all the other sites that made it into the pool for that given query. Just a clear pecking order based on your “score”.
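To make the "score" idea above concrete, here is a minimal sketch in Python. The weights 4, 3, 2 and 1.5 come straight from the example formula; the factor values and page names are made up for illustration, and nothing here claims to be Google's actual formula.

```python
# Hypothetical weights from the example formula:
# (A x 4) + (B x 3) + (C x 2) + (D x 1.5) = your score
WEIGHTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.5}


def score(factors):
    """Weighted sum of a page's factor values (missing factors count as 0)."""
    return sum(weight * factors.get(name, 0.0) for name, weight in WEIGHTS.items())


def rank(pool):
    """Return page names ordered from highest score to lowest."""
    return sorted(pool, key=lambda page: score(pool[page]), reverse=True)


# Invented factor values for three pages in the pool for one query.
pool = {
    "page1": {"A": 1.0, "B": 0.5, "C": 0.2, "D": 0.0},
    "page2": {"A": 0.5, "B": 1.0, "C": 1.0, "D": 1.0},
    "page3": {"A": 0.2, "B": 0.2, "C": 0.2, "D": 0.2},
}

for page in rank(pool):
    print(page, round(score(pool[page]), 2))
```

With these invented numbers page2 comes out on top, which is the "clear pecking order" in action: only positive factors, and the biggest total wins.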
Perhaps now we see the subtle introduction (well, the implementation sure doesn’t seem so subtle) of the minus sign into the equation as a sorting tool. Negative elements are increasingly being factored in, aspects of your site that get you sorted out earlier in the process than you would like. So, whether it’s down 30 places or 940 places, something is subtracting from your score, and you end up getting sorted (ranked) into a different pile than you’re used to. They’re sticking with this concept, and continuously tweaking it, as evidenced by the duration of the phenomenon, and the fact that sites keep radically popping back and forth. It’s not a “penalty” as we think about it, but you have factors on your site they think are a “negative return factor” and the weight of that, depending on the day, gets you sorted into a pile you don’t want to be in. Right now it looks like the formula, or the action of this, is pretty crude.
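A rough way to picture the minus sign is the sketch below, again in Python. The factor names, the weights and the single negative factor "N" are all assumptions made up for this example; the point is only that an unchanged set of positives can land in a different pile once something is subtracted.

```python
# Hypothetical positive weights and one hypothetical "negative return factor".
POSITIVE_WEIGHTS = {"A": 4.0, "B": 3.0}
NEGATIVE_WEIGHTS = {"N": 5.0}


def score(factors):
    """Positive weighted sum minus negative weighted sum."""
    plus = sum(w * factors.get(k, 0.0) for k, w in POSITIVE_WEIGHTS.items())
    minus = sum(w * factors.get(k, 0.0) for k, w in NEGATIVE_WEIGHTS.items())
    return plus - minus


def position(pool, page):
    """1-based position of a page when the pool is sorted by score."""
    ordered = sorted(pool, key=lambda p: score(pool[p]), reverse=True)
    return ordered.index(page) + 1


pool = {
    "mine": {"A": 1.0, "B": 1.0},
    "rival1": {"A": 0.8, "B": 0.8},
    "rival2": {"A": 0.5, "B": 0.5},
}

before = position(pool, "mine")
pool["mine"]["N"] = 1.0  # the algorithm now "sees" a negative trait
after = position(pool, "mine")
print(before, after)
```

Nothing about the page's good qualities changed between the two calls; the only difference is the subtracted term, and that alone moves it from the top of this tiny pool to the bottom.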
For years we have all focused hard on trying to ascertain what signals of quality Google is looking for. Get more of those qualities and you rank higher, your “score” is better, you jump ahead of the guy above you. Perhaps now it’s time to think about what they see as adversely affecting your ranking for that particular query; what aspect of my site gets me sorted into that pile? (The omitted results pile? Why was I omitted in the first place? Why was I put in the pile in the 900’s? What caused the algorithm to think I just didn’t belong? How couldn’t my site answer the searcher’s question? That’s exactly what my site is about! Why is that page supplemental? Ugh, the ultimate sorting pile!) I’m not talking about penalties or just outrageous stuff like wacky internal links, keyword stuffing, etc., but aspects of a site that would (or should) subtly subtract to the point that eventually, through this sorting process, you got sorted right out the back door.
Age, as a signal of quality, seems to be dramatically downgraded in this sorting process. This may be causing a great deal of the present confusion because it used to be such a powerful factor you got thinking your site was better than it really is. Nothing is more intoxicating than sitting at the top for a very long time to distort your reality. For a long time aged sites were like a winning lottery ticket. Get a high ranking site, with some good back links, and just put it on auto pilot; no more.
You almost get the sense they’re trying to get the algorithm to have the ability to really pick over every aspect of every site and sort them out quickly. Again, if you want to sort through a pile of something, you start with the most obvious trait. If it’s a pile of bolts, you instinctively start pulling out all the 3 inch ones, then on to the smaller ones, etc. If you end up in the position 950 pile, I don’t think there’s necessarily anything wrong with your site; there was just something on it that caused you to get pulled as the sorting process went on, for that particular search term.
If I want to rank high for a particular term, what traits on my site will keep me from getting left standing when the music stops?
Yesterday I decided to put just one relevant outgoing link on every page of a legal website. I did a search on each page’s principal search term for topically relevant sites, and Google returned tons of sites, but they were all my competitors and I didn’t fancy giving any of them a free ride. I switched to Yahoo and came up with a lot of academic sites interspersed with my competitors, for a search on those same terms. Now why would Google downgrade academics when they seemed to push them to the top in the past?
[edited by: JudgeJeffries at 6:00 pm (utc) on April 5, 2007]
I have posted this observation once before and I'm sure many of you disagree with its premise, but again I'll say that I believe that -- in part at least -- what we have going on here is the "curse of bureaucracy". And let's face it, any multi-billion $$ multinational company is most definitely a bureaucracy.
I don't care if it's corporate bureaucracy, or governmental, or non-profit, the same pattern can apply -- here is how I see it at Google:
- Highly trained very bright software engineers are being paid big money;
- They aren't being paid to sit on their hands and leave well enough alone, so they tinker with things that in many cases are not broke;
- Sometimes they make things better for the organization and for its constituency, so we sing their praises;
- But other times, while they have no malevolent intentions, the fact is that sometimes they unexpectedly make things worse;
- But even when things get worse for the organization as a result of their tinkering, it is not bad for them, because then they have to try to FIX what was not broken to begin with!
So, they manage to justify their high salaries by staying busy all day long, and in their little world, everything is beautiful.
Eventually, they'll tinker some more and it will all seem OK again.
Until the next time after that......
Excellent post, you may be onto something with negative rank factors. You'd think that lack of positive is a negative in itself, but since we don't know the actual formula...
Anyways, I have a hard time imagining what might be the negative factor for mysite.com on a search for "mysite.com" that bumps you down to #31 (or even #225, as I started seeing yesterday). I am pretty sure mysite.com is much more relevant than that loose bunch of local business listings, outright scraped pages, and other VERY random pages that are being returned before mysite.com.
Well, the above sort of leads me to believe there is still such a thing as a penalty in Google's arsenal.
I agree with randle, I think that (-) has been added to the equation and they may be playing with it. Or at the very least carrying more weight.
With regard to a penalty, I usually rule that out because in most cases I see, it is a filter more than a penalty.
I just cannot see thousands of webmasters getting dinged with a penalty at the same time on a massive scale. Especially with the sporadic behavior of rankings.
However, I could see a filter to "on page" (or perhaps off page) elements cause hundreds or even thousands of sites to move to a different pile as randle says.
"Age, as a signal of quality, seems to be dramatically downgraded in this sorting process. Get a high ranking site, with some good back links, and just put it on auto pilot; no more."
100% agree here. This has undoubtedly happened in what I see.
|I agree with randle, I think that (-) has been added to the equation |
I sort of agree - except that I lean toward a fractional multiplier more than a minus factor.
My current idea (this is used in many IR approaches) is that a preliminary set of results is returned, but then one or more factors undergo further testing. The preliminary set of results is now re-ranked according to multipliers determined in testing just those preliminary urls. These test factors could also be pre-scored and updated on a regular (but not necessarily date co-ordinated) basis, and be available in a secondary look-up table somewhere for quick use.
If your url doesn't get into the preliminary set of urls, then this re-ranking step won't ever help you -- because no new results are "pulled in". If your url is in the preliminary set, the re-ranking may help you. But if you fail one of the tests, then your relevance score, or your trust score, or your aging score, or your whatever score, can be multiplied by 0.2 or a fractional factor like that. That would send your url on a rankings nose dive.
So this type of re-ranking could account for the yo-yo behavior we see, going from page 1 to end-of-results and back again. Note that the url is not thrown out of the result set, the preliminary result set is kept intact, but just re-ranked.
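The re-ranking idea can be sketched in a few lines of Python. This is purely an illustration of the theory above, not anything known about Google's systems: the urls, the preliminary scores and the set of "failed" urls are invented, and the 0.2 multiplier is just the example figure mentioned.

```python
def rerank(preliminary, failed_tests, multiplier=0.2):
    """Re-rank a preliminary result set.

    Urls that failed a (hypothetical) test have their score multiplied by a
    fractional factor. No url is added to or removed from the set.
    """
    adjusted = {
        url: s * (multiplier if url in failed_tests else 1.0)
        for url, s in preliminary.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)


# Invented preliminary scores for one query.
preliminary = {
    "a.example": 9.0,
    "b.example": 8.0,
    "c.example": 5.0,
    "d.example": 3.0,
}

print(rerank(preliminary, failed_tests=set()))          # a.example stays on top
print(rerank(preliminary, failed_tests={"a.example"}))  # a.example drops to last
```

If the pre-scored test result for a url flips from one update to the next, the same url swings between the top and the end of the same result set, which is one way the yo-yo pattern could arise.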
Part of making re-ranking technology like this practical and scalable would be getting very quick preliminary results -- often cached preliminary results, I assume. This need for speed might also account for the large numbers of urls being sent to the Supplemental Index, making for a less unwieldy primary index.
Supplemental urls would only be tapped if the total number of preliminary results fell below some threshold or other.
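As a toy illustration of that fallback, here is a minimal Python sketch. The two dictionaries standing in for the main and Supplemental indexes, the substring matching and the threshold of 3 are all assumptions chosen to keep the example small.

```python
def search(query, main_index, supplemental_index, threshold=3):
    """Return matching urls, tapping the supplemental index only when the
    main index yields fewer than `threshold` results."""
    results = [url for url, text in main_index.items() if query in text]
    if len(results) < threshold:
        results += [url for url, text in supplemental_index.items() if query in text]
    return results


# Invented miniature indexes: url -> page text.
main_index = {
    "main1.example": "blue widgets for sale",
    "main2.example": "blue widget reviews",
    "main3.example": "cheap blue widgets",
    "main4.example": "obscure error code 0x1234",
}
supplemental_index = {
    "supp1.example": "forum thread about blue widgets",
    "supp2.example": "forum thread: fix for error code 0x1234",
}

print(search("blue widget", main_index, supplemental_index))
print(search("error code 0x1234", main_index, supplemental_index))
```

The common query fills up from the main index alone, so the supplemental forum thread never shows; the obscure error message falls below the threshold and pulls the supplemental page in.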
This is my current line of thinking - purely theoretical, although informed by some knowledge of the art of Information Retrieval. I keep looking at the changes and asking myself what kind of math could account for the new signs we are seeing.
As long ago as 2 years, GoogleGuy mentioned in passing that we were tending to think in terms of filters and penalties, but that Google was moving away from that model. I think they've moved a giant step further -- although some filters are clearly still there (only 2 results per domain for example) and some penalties as well (often manual).
[edited by: tedster at 7:44 pm (utc) on April 5, 2007]
I feel that your observations are correct as well, randle. One of the things that I worry about is that now, with these new negative factors, when pages are dropped to 950, our site will be tagged and put in a "watch" pile. And even when we fix the items that they consider negative, we will never fully regain our status. When you've spent 7-8 years building a quality site and have this happen while trying all along to please Google, you may now be put in the same pile as a spammer or scraper. It's disheartening.
The early omitted results have been like that for at least a year. Also the site: count, I don't think it is as accurate anymore as it was 1-2 years ago.
All the troubles Google has had lately, or the changes in the search view/examples, are related to saving space, maybe harddisc space - omitted results, supplemental results, an inaccurate site: count. There are also A LOT fewer images on the Google image search.
|saving space, maybe harddisc space |
I agree that space is one likely factor -- not disk space itself, which is easy to come by, but the need to process a growing huge number of urls and still give speedy results for the user's query.
I wonder to what degree the boilerplate text/duplicate copy issue is relevant here. In particular, with regard to the fact that some have mentioned forum posts being 'lost'.
Forums do tend to have a lot of repetitive elements, and I have also noticed this trend of forum posts being hard to find, with forums having more and more threads consigned to supplemental/similar results land.
|My current idea (this is used in many IR approaches) is that a preliminary set of results is returned, but then one or more factors undergo further testing. The preliminary set of results is now re-ranked according to multipliers determined in testing just those preliminary urls. |
This would (I assume) have doubled the database processing operations for each search and - in doing so - have increased the lag between the search and the results being displayed, surely? I haven't noticed a drop in speed, so I can only conclude that either randle is closer to the mark OR (and this is a maybe) Google CAN reorder the results, but ONLY DOWNWARDS, because they penalise results on the fly as they load (thus only having to retain the existing database calls). This would fit in, I guess, with penalty scores.
Every one of the -950 sites I have looked at has broken Google's guidelines in one way or another, or its HTML is very bloated and does not display properly in all browsers.
People say, well, this "keyword" shows Page A at -950. You look at Page A and you find nothing wrong with it. Ahh, but dig a bit deeper into Page B, C, D, E and so on and you will soon find it.
You are right on about a penalty tanking your rankings quickly. Google is just getting a lot better at hiding it so people cannot game it, and it's harder to troubleshoot. In other words, violating Google guidelines on page ZZZ might have a serious effect on all your pages.
If you do not believe this, try me out on this one.
By the way, if you're going to sticky me a site to look at, do not send me pages that have hundreds of HTML errors on them. You have to have clean HTML before you can properly troubleshoot any website.
[edited by: trinorthlighting at 8:44 pm (utc) on April 5, 2007]
I think there are so many factors Google uses to rank results and it's extremely difficult to isolate cause and effect anymore. I know I sure have trouble isolating cause in my ongoing experiments.
I have a site that seems to be holding its own for the most part though, fluctuating a little week to week but always ranking near the top for its targets and rising from a seasonal influence.
What I've observed is that my pages on subjects already well covered elsewhere go into supplemental -- either that or it's that they get little traffic -- which is something Google can get at from various sources. It may be that Google makes it hard to break into a saturated information space, and it may be that they rank partly on actual page visitation, which would have coincident effects, including an objective tendency to favor sites that buy advertising or do promotional drives to generate traffic. I think that, as ISPs, Yahoo and MSN look at visitation to separate the wheat from the chaff too, and I think they all measure each other on this metric, in a great feedback loop that adds consistency to their results and also makes it hard to compete with an existing top result. Change is slow, and follows a bubbling/sinking pattern.
I think also they're valuing on-topic outlinks too, either in terms of static logical structure and relationship (providing some measure of breadth and depth of an info space) and/or in mining visitor behavior like page views downstream of a serpclick (some measure of interest/quality). This may be why Alexa traffic metrics show CNET growing compared to other tech news and discussion sites, for example, (speculation) because they offer rich, on-topic cross-linkages on every page, which would have the potential to consume the attention of a knowledge seeker. Same observation with Wikipedia in relation to serps, and my own observations on my site.
Of course, serving a seeker of knowledge is the core purpose of a search engine, so measuring delivery of knowledge seems like a sensible metric.
Some excess code may be an issue but I doubt coding problems are a big part
Did a single-word search and validated the #1 site.
It failed validation with 169 errors. The page only has 450 lines of code, so that works out to more than one error for every three lines.
I have a site of 1000-plus pages and every page is green. I took the pains to do that a year ago, so that may be some of it, but I highly doubt it. I was on this page, and am now gone to nowhere land.
Did a search where Google has a "page can't be displayed" page ranking #2 out of 2 million plus pages.
Google has some serious root issues that can't be explained