This 260 message thread spans 9 pages.
Google's Florida Update - a fresh look
We've been around the houses - why not technical difficulties?
For the past four or five weeks, some of the greatest (and leastest) Internet minds (I include myself in the latter) have been trying to figure out what has been going on with Google.
We have collectively lurched between one conspiracy theory and another - got ourselves into a few disagreements - but essentially found ourselves nowhere!
Theories have involved Adwords (does anyone remember the 'dictionary' concept? Now past history).
A commercial filter, an OOP filter, a problem caused by mistaken duplicate content, theories based on the contents of the Directory (which is a mess), doorway pages (my fault mainly!) etc. etc.
Leading to the absurd concept that you might be forced to de-optimise, in order to optimise.
Which is a form of optimisation in itself.
But early on, someone posted a reference to Occam and his razor.
Perhaps - and this might sound too simple! - Google is experiencing difficulties.
Consider this: if Google is experiencing technical difficulties with the sheer number of pages to be indexed, then the affected searches will be the ones with many results to sort. And the searches with many results to sort are likely to be commercial ones - because there is so much competition.
So the proposal is this:
There is no commercial filter, there is no Adwords filter - Google is experiencing technical difficulties in a new algo due to the sheer number of pages to be considered in certain areas. On-page factors have suffered, and the result is Florida.
You are all welcome to shoot me down in flames - but at least it is a simple solution.
Good post, Hissingsid.
Where the algo is often going wrong is when it breaks up the search term and looks for other terms related to individual words within the search term.
Add in the discounting of anchor text in internal links, and it explains much of what I have seen.
I think the Adwords Keyword Suggestion Tool gives a good insight into what terms Google thinks are related, but you need to look at the suggestions for each individual word in the search term, not just the suggestions for the whole search term.
Looking at the search results is a bit like listening to a small precocious child using language it doesn't quite understand!
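The term-splitting behaviour described above can be sketched as a toy lookup. Everything here is invented for illustration - the related-terms table, and the assumption that expansion happens per token rather than per phrase - since nobody outside Google knows the real mechanism:

```python
# Hypothetical related-terms table: the engine is assumed to expand
# each token of the query separately, as the post suggests.
RELATED = {
    "cheap":  ["budget", "discount", "low cost"],
    "london": ["uk", "england", "city"],
    "hotels": ["accommodation", "rooms", "bed and breakfast"],
}

def expand_query(query):
    """Split the query into tokens and attach the terms assumed related to each."""
    return {token: RELATED.get(token, []) for token in query.lower().split()}

print(expand_query("cheap London hotels"))
```

On this guess, a page ranks for "cheap london hotels" partly on how well it matches each token's expansion set, not just the literal three-word phrase.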
What you're seeing with your two sites may not be a filter. It can be explained with the CIRCA technology quite easily. As the white paper explains, the ontology originally consisted of a set amount of data. Obviously, that ontology has grown, but it is still finite.
Also, Google's Adsense product was originally developed by Applied Semantics. As we know, Adsense is an advertising model, so obviously the data in their ontology is going to be dramatically skewed towards commercial terms, as you wouldn't base your advertising model on non-commercial terms. This is likely why your commercial site was hit and your astrophysics site was not. I would imagine there isn't much advertising done in that field (a search for "astrophysics" on Google currently shows only two Adwords).
It is very likely that, over time, the ontology will continue to grow and the algo will be applied across the board, as the white paper seems to indicate it is fairly self-learning when fed enough data. But for now, it seems it's only being applied to those terms it understands.
As to Sid's conclusions, those were mine as well. In fact, there is a statement in the white paper that seems to concur with the thought that too much of the same phrase may be bad...
|The notion of focus is roughly analogous to the specificity of a concept, in that more specific concepts tend to be strongly related to a small set of things, and is somewhat inversely proportional to frequency, since more frequent concepts are less useful for discriminating particular contexts. |
The above statement seems to say exactly that. Repetition of a single phrase over and over means less to CIRCA than that phrase plus other related phrases and tokens dealing with the same concept being used.
This would explain why highly targeted or optimized pages seem to not be ranking as well as more general pages. It may also explain why directory pages are ranking so well, as the link descriptions would generally contain many different tokens & terms dealing with that general topic.
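The quoted passage - specificity "somewhat inversely proportional to frequency" - reads a lot like classic inverse document frequency weighting. A minimal sketch, assuming that reading (the mapping to CIRCA's actual scoring is pure speculation, and the documents are invented):

```python
import math

def idf(term, documents):
    """log(N / df): terms found in fewer documents discriminate context better."""
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df) if df else 0.0

docs = [
    {"hotel", "city", "beach"},
    {"hotel", "rooms", "booking"},
    {"hotel", "quantum", "physics"},
    {"quantum", "efficiency", "ccd"},
]

# "hotel" turns up almost everywhere, so it is a weaker discriminator
# than the rarer "quantum".
print(idf("hotel", docs), idf("quantum", docs))
```

Under this reading, hammering one phrase everywhere makes it *more* frequent and therefore *less* useful to the engine for pinning down what the page is about.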
|there is still something that resembles a filter in place |
Wasn't it already suggested here that Applied Semantics was originally designing this system for use in a commercial market, and thus had already created a list of competitive tokens and terms?
If that is the case, wouldn't the infant version of this new algo, with the applied semantics in place, logically draw from that original list?
Also, as I am trying to absorb all the information this topic is bringing to light, is the following close?
1. If you have a 3 keyword phrase commonly used to find your site or product, each word is now being looked at for individual meaning?
2. Each word in the keyword phrase would now require additional words in the page that can help the algo/filter/semantics determine the implied meaning of each word?
For instance, suppose you are in the real estate industry and own a rental property. If one of your keywords in a phrase is "property", the word "property" could actually imply "real estate property", "personal property", "possessions", "ownership", "intellectual property", etc...
The token "property" has multiple meanings. Therefore, your page should contain multiple alternative meanings of the token. Such alternatives could be apartments, real estate, rentals, homes, rent, etc...
Also, your site would have to be listed on sites specific to real estate or rental properties, or have some of the additional words that imply the meaning of your site (nothing new here, as relevant links were always important).
I've been interrupted like 6 times while trying to write this, so I have lost my train of thought... Did anybody follow it, and does it make any sense?
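Point 2 in the list above - supporting words pinning down an ambiguous token - can be sketched as a simple overlap count. The sense inventory and signature words here are made up; this is a guess at the kind of disambiguation CIRCA might do, not its actual method:

```python
# Hypothetical senses for the ambiguous token "property", each with
# signature words that would support that reading.
SENSES = {
    "real estate":           {"apartments", "rentals", "homes", "rent", "lease"},
    "intellectual property": {"patent", "copyright", "trademark", "license"},
    "personal possessions":  {"belongings", "assets", "goods", "valuables"},
}

def best_sense(page_words):
    """Pick the sense whose signature words overlap the page most."""
    scores = {sense: len(sig & page_words) for sense, sig in SENSES.items()}
    return max(scores, key=scores.get)

page = {"property", "apartments", "rent", "homes", "city"}
print(best_sense(page))  # the real-estate reading wins on overlap
```

A rental-property page with "apartments", "rent" and "homes" in the body would resolve "property" towards the real-estate sense; a bare page with only "property" repeated gives the engine nothing to score.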
|I think the Adwords Keyword Suggestion Tool gives a good insight into what terms Google thinks are related, but you need to look at the suggestions for each individual word in the search term, not just the suggestions for the whole search term. |
I've been and had a play and it is positively frightening.
When I search for suggestions for widget "English US" I get a completely different set of specific terms from widget "English UK". Not just a bit different - no common terms at all. None, zilch, zip, nada. See, I've even started to write in semantic theory.
When I put in my second word the lists provided for US and UK are very similar.
When I search for the two-word term I get only that term back from US English, and 10 variants back from UK. In all cases here I'm talking about the left-hand specific side.
All of this fits with what I've been bleating about here for a few days now. The CIRCA technology is optimised for US English, not UK English, which in our case creates another layer of complexity.
I'm still returning to the thought that analysing the pages on top and using what they are doing as a template is a better way of understanding how CIRCA is working in our unusual context.
|1. If you have a 3 keyword phrase commonly used to find your site or product, each word is now being looked at for individual meaning?
2. Each word in the keyword phrase would now require additional words in the page that can help the algo/filter/semantics determine the implied meaning of each word? |
In my own case my target three-word terms currently return pre-Florida SERPs. So descriptive brandname operative returns my index page at #1 and a secondary site index at #5. In actual fact, as I keep saying, what is a brand name in the US is something different - a generic term - in the UK, but even so, descriptive brandname operative does not cause the search to be put through the new algo.
I suspect that, in order for search speed to be maintained, the only real filter is an input filter into the system. What I mean by this is that it looks for an exact match in its "unique terms" list. If there is a match, then it already has access to a complete map of possible meanings, contexts etc., and probably, as you have suggested, dissects the phrase into tokens, which extends the map. If there is no exact match, then whatever the phrase is, it just gets put through the old algo.
The reason that I say this so confidently is because widget financialterms ie the plural form of the second word is not put through the new algo but the singular term is. I'm fairly sure that this is because the plural form is fairly rare even in the UK and I guess unused in the US.
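The "input filter" guess above amounts to a simple dispatch: an exact match against a finite term list routes the query to the new algorithm, and everything else (including the plural variant) falls through to the old one. A sketch, with a made-up term list:

```python
# Hypothetical list of exact phrases the new semantic algo "understands".
NEW_ALGO_TERMS = {"widget financialterm", "city hotels"}

def route_query(query):
    """Exact match -> new semantic algo; anything else -> old algo."""
    return "new algo" if query.lower() in NEW_ALGO_TERMS else "old algo"

print(route_query("widget financialterm"))   # exact match
print(route_query("widget financialterms"))  # plural form misses the list
```

This would explain the singular/plural observation neatly: a one-character difference in the query is a complete miss against an exact-match list.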
I wonder if CIRCA might apply to my situation. I have a travel destination site. Searchers almost always precede the topic for which they are seeking information with the name of the city, which consists of 3 words, for example, "great city beach." They follow this term with a keyword such as hotels or motels or condos, but rarely all three.
Before Florida, a search using the city name followed by ONE additional keyword such as hotel would result in my index page appearing in the top 5 results. After Florida my index page does not appear at all in the SERPs for this search term. Strangely enough, if a search is conducted using the city name followed by THREE keywords, i.e., hotels, motels, condos, my index page appears in the top 5 as before Florida. The problem is hardly anyone uses the 3-word city name followed by these 3 topic keywords in a search.
These 6 keywords are used in some of the external incoming links in the anchor text and in the title and header, but not in the body text. I wonder if I should remove the words hotels, motels, and condos from my title and header or from the anchor text leaving just the city name. This seems to be removing the specificity of the site.
Sid, the country specific "UK pages only" results also make it easier to see what is going on. There are fewer matching results and the "misunderstandings"/mismatches are easier to spot. And yes, plenty of scope for problems with the two different languages!
In the Keyword Suggestions Tool, I think the left column of more specific terms is useful to see commonly used specific terms which will divert the search off down the wrong track, and the right hand column of broad matches and additional keywords helps to show how well Google understands the language that should also be present with the search term.
Thanks for the reply Sid.
|If there is no exact match then whatever the phrase is it just gets put through the old algo. |
Are you referring to the pre-florida algo?
|Are you referring to the pre-florida algo? |
Your guess is as good as mine but that is what it looks like to me. Of course it may well be a tweaked version.
Well, I'd like to throw something into the mix regarding this applied semantics and the analysis of terms.
Travel industry site
Search for "city all inclusive hotels"; site not found in top 1000.
However, a search for "city inclusive hotels"; site returns to pre-Florida top positioning.
I happen to know that the term "all inclusive" is considered a highly competitive term, as well as the city and the hotels term.
Applied semantics should have an effect on the word "inclusive" if it was analyzing individual terms. Instead, it only seems to affect the complete keyphrase "all inclusive". I still think there is a dictionary of terms that the new algo is being applied to, but how that works in with the actual algorithm is still a puzzle.
|I wonder if I should remove the words hotels, motels, and condos from my title and header or from the anchor text leaving just the city name. This seems to be removing the specificity of the site. |
I think that there are three options for us at the moment.
1. Wait and see if there is a new update and if that does return things to the SERPs we had got used to.
2. Go through a process of trial and error (which could take months) doing stuff like you are suggesting and risking rank in other major engines.
3. Analyse what the sites in the current top 3 (or so) are doing in some depth. Put a semantics hat on and try to see the semantics code looking for context, strongly linked terms and concepts, broadly linked terms and concepts on the page on pages that link to that page and on pages that that page links to.
What I think your site needs to look like is a little gem in a beautiful setting, rather than a big fat uncut stone with big signs saying "take a look at this big fat uncut stone" and links from sites saying "big fat uncut stone this way".
When you understand exactly what the top sites are doing then make a plan to test changing things to go a step better. This way you will do more trial and less error.
I don't think there is any need to lose what you have in other SEs. You can, with care, give Google what it wants and still give Ink and Allthe what they want. And to be fair to Google, your (our, my) site may well be better for it.
I concur with the dictionary of terms theory; it is evident in all my search analysis.
It seems like certain phrases are marked and subjected to a filter of sorts. Just how that filter works and what those phrases are is still a mystery to me too but I'm working on a few concepts here that are proving to be quite interesting.
I'd appreciate any feedback you have regarding this theory.
Sunset Jim, my guess is that you should forget about your title and add plenty of text to your page about hotels in city, places to stay in city, city weather, city news, cheap hotel deals, etc. Maybe get someone else to write it who doesn't know anything about search terms, so it reads naturally.
Also do an allinanchor search for the text used in your external incoming links (if it is unique to the external links) to make sure Google is using those links for your page.
Oh yes, make sure you have PR7 or so, as well!
BUT I have been suffering from analysis paralysis, and haven't done much to my own pages yet, so, actually, I would ask someone else for advice!
Oops - Sid got there first with some good advice
|Applied semantics should have an effect on the word "inclusive" if it was analyzing individual terms. Instead, it only seems to affect the complete keyphrase "all inclusive". I still think there is a dictionary of terms that the new algo is being applied to, but how that works in with the actual algorithm is still a puzzle. |
You need to snag a copy of the Applied Semantics (Company name) CIRCA (meaning around, approximating to) white paper. If you Google CIRCA white paper and click on view as HTML you can still read all about the technology which they now wish Brett hadn't told us about.
The dictionary of terms is so big they have to use a new term for it: "the Ontology". Here's a dictionary definition of an ontology: "an explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them."
So you are right and the paper gives you some insight into how it is applied if you do a bit of reading between the lines.
I can see a day when a copy of the White Paper will be on every Hotline server in the World.
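The dictionary definition quoted above can be made concrete with a minimal data structure: concepts plus typed relationships between them. The concepts and relation names here are invented for the example; the real ontology's schema is not public:

```python
# A toy ontology: each concept carries typed links to other concepts.
ONTOLOGY = {
    "stove":        {"part_of": ["kitchen"], "related": ["oven", "pots and pans"]},
    "refrigerator": {"part_of": ["kitchen"], "related": ["stove"]},
    "kitchen":      {"part_of": ["house"],   "related": []},
}

def related_concepts(concept):
    """All concepts linked to the given one, across every relation type."""
    entry = ONTOLOGY.get(concept, {})
    return [c for links in entry.values() for c in links]

print(related_concepts("stove"))
```

The point of the structure is that it encodes not just *which* terms co-occur but *how* they relate, which is what separates an ontology from a plain keyword dictionary.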
|add plenty of text to your page about hotels in city, places to stay in city, city weather, city news, cheap hotel deals, etc. Maybe get someone else to write it who doesn't know anything about search terms, so it reads naturally. |
I think that is exactly right. Something on the page has to imply and support what your site is about.
The CIRCA paper referred to a stove. A stove is often accompanied by a refrigerator, oven mitt, pots and pans, and a kitchen, for example.
If you sell widgets, then you would want to make sure you explain what a widget is, what it is used for, how it works, who it is used by, etc...
Your targeted keywords need a supporting cast...
It could be people are reading too much into Google's recent changes. Google has been a relatively simplistic SE for years. I see nothing to suggest a change in workings, so they have targeted some competitive keywords; their track record shows it's probably some half-baked simple filter. It's mostly bluff. That's why they keep asking us (the whole spectrum, SEOs to web users) what we think. They don't know if it's worked well, therefore they need to know what we think as much as we want to know what they are thinking. ONLY DIFFERENCE: WE OFTEN TELL THEM.
I prefer the least complicated posts - either a muck-up, or a 'flavour' of Applied Semantics, or an algo that has gone awry. It could be a commercial filter, but that's unlikely: it's too unethical. More likely a flavour of Applied Semantics applied to terms (tokens - whatever!) that Google has a lot of information on: i.e. commercial terms, US English only!
I intend to sit back and wait for Google to realise it has been a magnificent experiment, but has mucked up their algo to such a degree that they need to go back to the drawing board.
p.s. I remember Kung Fu too - when I questioned whether Grasshopper was a term of endearment, there was a degree of irony in this. I never expected anyone to admit 'It's like this: I am the master, and you are the student'.
Just a little update for everyone. A number of my sites were hit hard by Florida, not all, but some. On one of the sites, using the old -nonsense test, it would have been number 1 for some VERY competitive phrases...but now it's non-existent.
I have been running test after test on different domains lately (including that one). I reduced the density way down to about 2 instances with no results, as well as a few other tests also producing no results. Two days ago, I altered the index page of the site with more of a semantic algo in mind. Instead of focusing like a laser on the two and three keyword phrase I wanted, I used synonyms, and other supporting terms/phrases/words for the industry.
Today, I suddenly found the index page sitting around 300 with a fresh date of yesterday. Granted, that's nowhere near where it used to be, but it's much better than the >1000 it was at before.
I've got a few other ideas along the same lines I'm going to test out over the next few days as well.
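The density test described above ("about 2 instances") is easy to quantify: count phrase occurrences per hundred words of body text. A quick sketch (the example text is invented):

```python
def phrase_density(text, phrase):
    """Percentage of the body's words taken up by occurrences of the phrase."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return 100.0 * hits * n / len(words) if words else 0.0

body = ("cheap city hotels are easy to find and "
        "city hotels near the beach are popular")
print(round(phrase_density(body, "city hotels"), 1))
```

Tracking this number before and after an edit makes the trial-and-error at least measurable, even if nobody knows what threshold (if any) the new algo cares about.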
When Google was still giving the old results by disabling the filter, I printed out copies of pre-Florida and post-Florida SERPs for all of my primary keywords and phrases.
I think it was Sid who said study what those who remained have in their site to find a clue as to what this may be about. That's exactly what I've been doing now since Florida first happened. A couple of observations:
1) 80-90% of the top rankers disappeared.
2) Taking the newly opened top spots were directories or large sites on theme but not specific to the keyword phrase.
Example: "city hotels" yield a site that has only a couple of pages, or even one, that relate to the actual phrase but have many pages that relate to hotels.
3) Of those that remained that were "relevant", and I use that term loosely and based upon pre-Florida, they generally fall into two categories:
a) A site with a lot of relevant content on many pages.
b) A small site with few pages but is either a subdomain or shares an IP with other domains from the same owner that are all linked together. Example: has many domains or subdomains for each city that are all linked together in a specific way and all on the same IP or are subdomains.
4) There is definitely a dictionary in play.
5) Although I at first suspected reciprocal linking to be penalizing sites, I have mostly put that to the side; however, I don't think it is completely out of the new algorithm. I think reciprocal links are playing a small but significant role.
Sites with content, and many pages of content specific to the theme, are being driven to the top. A directory will achieve this because although they have only one page on the specific phrase, they will have many pages on the theme.
How you internally link and use anchor text in those links is not so important as what you actually have on the pages that are linked together. Google is applying much more relevance to internal links and those pages than in the past.
Google is not only looking at specific domain names for this but has also factored in the IP, evaluating the totality of the site or IP specific to the keyword phrase. But I'm not entirely convinced about IP, because I haven't had time to study it as closely as I should, and I'm sure people will be able to point out exceptions.
Reciprocal links, inbound-only links, and outbound-only links are another issue that plays a role I haven't quite discerned yet. I think there is more emphasis now than in the past on outbound links going to relevant sites.
A side note: one thing that must have concerned Google is that they originally had the idea that sites were relevant if other sites were linking to them. This would mean they had good content if they were able to garner so many links. That did not turn out to be the case.
Instead of creating more content, webmasters were spending their time devising better ways to game Google with linking schemes.
Although I don't particularly like Florida because it hurt my industry especially hard, I did not think that something like this was very far off from happening, and I don't blame them for doing it. After all, they want relevant content, not just the sites able to garner the most links.
Ref: relevant content.
There's a deep problem with this concept of relevant content. We live in a consumer age. People admire (even love) technology. We're not in Elizabethan England where the latest religious dogma is of prime interest. Folks want to know about products, and which are the best products, and they want to find sites that can inform them about products.
In this modern age - which Google is apparently trying to ignore, whilst at the same time being central to it - a search based on finding product information is just as valid as any other search. If I want a digital camera, for example, I don't want to know about the latest physics regarding the quantum efficiency of the CCD, and I don't want comparison sites. I'd like to see a quality 'digital camera' site, with clear and succinct explanations of the technologies involved.
A biased view - yes. A reasonable proposal - yes also!
>Although I don't particularly like Florida because it hurt my industry especially hard, I did not think that something like this was very far off from happening, and I don't blame them for doing it. After all, they want relevant content, not just the sites able to garner the most links.
I still suspect this was due more to some attempt to improve relevancy of non-commercial SERPs. And stop and think. Some programmer goes to the boss saying that he has an idea to improve relevancy of non-commercial SERPs. He then comments a side effect is commercial SERPs will be worse. That this side effect will increase Adwords sales will make the boss more likely to implement the idea. The beauty of this is the boss can justify his actions to others as not being motivated by increasing Adword sales.
>In this modern age - which Google is apparently trying to ignore, whilst at the same time being central to it - a search based on finding product information is just as valid as any other search. If I want a digital camera, for example, I don't want to know about the latest physics regarding the quantum efficiency of the CCD, and I don't want comparison sites. I'd like to see a quality 'digital camera' site, with clear and succinct explanations of the technologies involved.
Google may be thinking that the main search engine should focus on non-commercial topics, and Froogle is for people who want to buy. Google created a new search engine (Froogle) just for the shoppers.
|Today, I suddenly found the index page sitting around 300 with a fresh date of yesterday. Granted, that's nowhere near where it used to be, but it's much better than the >1000 it was at before. |
Definately a start! Congrats on getting back in there.
My main site re-appeared in the index about 2 weeks ago, but is in a different spot on almost every datacenter. The only 3 DCs that show the same results (and for me the best) are -in, www2, and www3. Those 3 DCs show us as #3, and the others are bouncing back and forth between #7 and #57. It seems when they show us at #57 we have fresh dates, but never at #7. The #3 spots on 2, 3 and -in always have fresh dates.
It is almost as if the -in, 2 & 3 are running a completely different version of the algo.
I am starting to see the main differences between the sites that dropped, those that stayed, and those that are re-appearing. Those that stayed are either very heavy on text, or very short on text. The new #1 only has like 10 lines of text total on his index page and no sub-pages. But each of those lines is a short descriptive phrase. No internal links at all, btw.
The new #2 site has no text on their site, just images and descriptions that are similar in nature, but not exact, to the keyword phrase.
|There is definitely a dictionary in play |
I am not sure I agree with that. I think the sites that disappeared were removed because the main keywords were used too often in the exact same order. The first sites that reappeared after Florida had normal densities on their keywords, but also had lots of supporting themes.
I think the new algo is learning and we will see the changes as it grows.
On a side note: It will be interesting to see if consumers begin to get way more specific/creative/exact on their searches.
>Google may be thinking that the main search engine should focus on non-commercial topics, and Froogle is for people who want to buy. Google created a new search engine (Froogle) just for the shoppers.
Maybe, but then they forgot to tell the world's consumers to switch to Froogle. ;)
Besides, I thought that was what adwords were for.
I dont think Y! will make that mistake.
Agreed - 'Froogle the answer to Florida'. I've posted as such before. But on the face of it this looks like yet another conspiracy theory - none of which make sense.
Why roll out a non-commercial algo whilst Froogle is in beta? The mess-up theory is still my preferred opinion.
|Maybe, but then they forgot to tell the world's consumers to switch to Froogle. ;)
Besides, I thought that was what adwords were for. |
Google does offer Froogle on their main search now:)
But the problems with Froogle and Adwords are that Froogle requires a monthly upload, and Adwords are positioned unnaturally on the page. I mean, people read left to right.
I NEVER noticed adwords ads until I was looking for new marketing strategies.
>It is almost as if the -in, 2 & 3 are running a completely different version of the algo.
It is also very different than what -in was displaying 2 weeks ago. I was non-existent and now back to #2 for both of my main kws.
Also seeing a site that was bounced 3 months ago for using <h1> for every sentence has now returned. Doesn't look like OOP to me, but anchor text ruling once again. Sticky me for the URL if you want.
There are plenty of product lines for which froogle will never work. Froogle is for price comparing identical items across multiple vendors, i.e. major brand name products. At least the froogle we see now, but perhaps it too will change.
I am an importer. I create original products in a niche market. How would anyone ever find my products on froogle?
The only solution I have come up with so far would be to keyword stuff the product names. Not a good solution for anyone, IMO.
In order for people to find me on froogle, they would have to know my products by name. A niche market vendor like myself cannot (while remaining cost effective) develop a product line with widespread brand identity (like a consumer electronics manufacturer could, for example).
Point being: we need Google, not froogle! We need G to remain unbiased towards commerce sites, and so far I think they are doing so. I didn't feel that way in the days following Florida, but things are getting much better.
Page 1 pre-florida
Nowhere after Florida
Page 40s/30s past few weeks
Currently Page 4 on www and Page 3 on -in, which looks good to me!
>Why roll out a non-comnmercial algo whilst Froogle is in beta? The mess-up theory is still my preferred opinion.
While technically in beta, Froogle is linked to from the Google home page. And, most people will never notice the beta mention at top. It also occurs to me that to nudge people in the direction of Froogle might require they mess up the commercial SERPs.
|too much information|
Here's the problem with the whole conspiracy theory thing.
I have a site that was hammered for its targeted keyword combination. It's still gone for that search, but it's #1 for a search on the topic.
Location Market Widgeter - Gone from SERPs
Location Market Widgetry - #1
The thing is that the page discusses how "Location Market Widgeter" has been doing Widgetry in this Location for that Market for --- years, etc.
So the page really is more relevant for the second set of keywords.
Not only that but the same page is also top 5 for "Location Widgetry" and top 10 for "Location Market" where it never appeared above page 3 for these if it showed at all.
If your site is 'gone' maybe you should try a topical search to see if it is just somewhere else. (Somewhere that nobody looks)
If you are a lazy 'Joe Surfer' looking at your page, and someone asks "What is that page about" what terms would you use? (Not what terms would you want people to use)
I like the CIRCA theory better than the filter and commercial term theories. At least from what I'm seeing.