| This 260 message thread spans 9 pages: < < 260 ( 1 2 3  5 6 7 8 9 ) > > || |
|Google's Florida Update - a fresh look|
We've been around the houses - why not technical difficulties?
For the past four or five weeks, some of the greatest (and leastest) Internet minds (I include myself in the latter) have been trying to figure out what has been going on with Google.
We have collectively lurched between one conspiracy theory and another - got ourselves into a few disagreements - but essentially found ourselves nowhere!
Theories have involved Adwords (does anyone remember the 'dictionary' concept - now past history?), a commercial filter, an OOP filter, a problem caused by mistaken duplicate content, theories based on the contents of the Directory (which is a mess), doorway pages (my fault mainly!) etc. etc.
Leading to the absurd concept that you might be forced to de-optimise, in order to optimise.
Which is a form of optimisation in itself.
But early on, someone posted a reference to Occam and his razor.
Perhaps - and this might sound too simple! - Google is experiencing difficulties.
Consider this: if Google is experiencing technical difficulties regarding the sheer number of pages to be indexed, then the affected pages will be the ones with many SERPs to sort. And the pages with many SERPs to sort are likely to be commercial ones - because there is so much competition.
So the proposal is this:
There is no commercial filter, there is no Adwords filter - Google is experiencing technical difficulties in a new algo due to the sheer number of pages to be considered in certain areas. On-page factors have suffered, and the result is Florida.
You are all welcome to shoot me down in flames - but at least it is a simple solution.
|Terms=one or more words strung together with a unique meaning |
That looks exactly like what has happened in this last update and supports the dictionary theory suggested by Daniel Brandt.
I have quite a few sites which string together word combinations and have moved around in the SERPs significantly.
>This could also mean that the algo isn't "broke", in the sense that Google may be doing this intentionally and on purpose. What Google thinks the algo should do may be different than what some webmasters think it should.
Absolutely agree with this. Brett also said the following in message #25 which seems to account for many of the inconsistencies in the most affected cats.
|Why does 80% of it explain Florida? Because most of the utterly bizarre searches we've seen since Florida, are in sectors that have very little text on the page. Most are in keyword spaces where generated pages are the norm (travel, hotel, weather, drugs, shopping/commercial product cats, shopping cart driven purchase pages, and ultimately - every ones index pages). There just isn't enough text on those pages to make heads-nor-tails of. It explains the "dictionary" phenom and the "over optimization" phenom. |
I don't think it explains everything and I do agree there must have been some specific filters that nailed people in certain categories.
From where I sit I am inclined to agree with Philosopher on the use of Applied Semantics. Our key search phrase is returning very odd results. (I think most users would agree that they are odd.) However, in hindsight, if I analyze that phrase, it certainly has other meanings. Not ones that the common user would expect, but ones that fit from a "semantic" view.
CIRCA White Paper [22.214.171.124]
Might want to get it quick as the actual PDF file has been removed from the site.
That was clean Philosopher, good work.
Pulled that baby right up from cache, wonder how long it'll remain there...
Glad I could help.
A word of warning. A well caffeinated drink is advisable when reading any white paper, but the more I read through this one, the more "oddities" are explained.
I have been beating the bushes for clues and I need some guidance...
For my site, KW1 is my money word, yet all my incoming links show KW1 and KW2 together (next to each other).
After Florida, I am now number 4 for KW1 KW2.
This doesn't do me any good, but the theory is....
Could G actually be looking at ALL links (regardless of whether it actually credits you with the links) and the KW in those links, then finally decide on the percentage of where they fall and send you to that search?
If this is the case, then nothing is broken and it is actually just working close to what the finished product will end up looking like.
Thanks for your input.
Very interesting, reading that white paper. Could some of the better minds on this forum give some input into the following conclusions, based on the hypothesis that Google is using the semantic methods described in that document?
1. It might be useful to think of the new Google ranking system as a sphere. At the center of it is the exact query. Orbiting closely are associated words (center = widget): widgets, widgetz, widgeting, widgetology.
2. Orbiting at various points are more nebulous concepts - gadgets, widget practice, based on continually improving "type" analysis.
3. A search query might now be thought of as "find pages that best describe the sphere with the center 'widgets'".
4. A page that is overly dense at its center might be thought of as spam?
OK, a lot has been said about Florida but what is really needed are real-life examples. Here’s my 2 penny's-worth.
For the first 2 weeks Florida was a huge success for us with new pages indexed and included and revenue doubling. Then, just as the topic became mainstream in the UK we lost placement with almost every page - down about 10 places only, but it halved revenue (or a quarter of the new revenue, for those paying attention!)
Fortunately for us, Inktomi took up the slack 2 days later so we didn't lose much, but it could have been a disaster.
Four weeks on, and the situation has reversed. Google has now put back all the terms and given them a good placement. Result? A nice little earner!
OK, so we were lucky to have Inktomi 'fall in' at just the right time (leading to most of our traffic for 2 weeks coming from MSN), but the issue is still valid: what's happening with Google?
As far as we are concerned, they kept all previous data but gave preference to older data. Even so, they downgraded that data in the SERPs.
I have never posted before, but am very experienced (although SEO isn't my business), and wonder if the old issue of old (unchanged) data taking preference has come back, but with a site's SERP placement reduced if there is a lot of new data to process?
You can fanny around with everything else - anchor text etc, but our site hasn't changed for months and still it was hit by Florida. Thank God that now it's back up there where it belongs, but what a wake-up call for how powerful Google has become.
|4. A page that is overly dense at its center might be thought of as spam? |
Not had time to read the paper, just snagged it but...
That sounds like an over optimisation penalty. Except there isn't a penalty against optimisation but there is against optimization.
I would be quite pleased that the designers of this system have only done it for the American dialect of English except their motivation has led them to include brand names (and place names). Now if your small niche generic widget in the UK is a completely unassociated major brand in the US then you have problems. If the main term is something that you would have to spend money on for that major US brand BAM! you are out of serps if your page looks too dense.
Enough jumping to conclusions, I'll go and have a read. Thanks to Philosopher for spotting this white paper.
Fixed spelling error, <sarcasm> perhaps AppliedSemantics would like to sponsor the forum and give us a spell checker</sarcasm>
[edited by: Hissingsid at 10:11 am (utc) on Dec. 16, 2003]
I tend to disagree about the OOP. I suspect it is more likely that people's main repeated phrase is being nailed as the name of their site (especially if it's the title/h1 etc.) - and not keywords.
Try re-reading the following section and see if you agree - Named Entity Recognition and Regular Pattern Identification
So your site (built for optimisation from the site "name" up) - "Super Dooper Red Widgets" - is having every instance treated as an entity and not a concept.
|4. A page that is overly dense at its center might be thought of as spam? |
And if correct, this is the t**d in the swimming pool.
If I have a rock and look at it with a red light, it's still just a rock, it's not a red rock. Similarly, a spammy page is a spammy page irrespective of the search terms typed in that made it appear.
Classing pages as spam/not-spam depending upon search terms is stupid. If that's what Google are doing they are stupid.
|Each recognized unit is marked as a term, and associated with a certain probability that the series should be treated as a unit |
I agree with you, it's a question of OOP for a term, and terms are much more easily singled out as over-optimized than tokens.
If that term happens to be your company name and appears in the title or H1 tag (and too high in density in the body tag) THEN you get nailed!
As kaled says, classifying pages as spam/not-spam depending on terms really would be stupid.
If this is what's happening then getting back to the top of SERPs means de-classifying your keyword phrase from term to a series of tokens.
Anybody up for a little minestrone?
If you combine the speed harnessing abilities of Kaltix with the "understanding" of Applied Semantics what do you have?
A very hard life for SEOs, of course.
The idea with the semantics/commercial algo is to avoid targeting. Don't get caught up in trying to please it.
It's much easier (and faster) to identify/downgrade "commercial speak" than it is to upgrade non-commercial stuff. Stay away from commercial characteristics and you may get in under the radar (for a while).
<The idea with the semantics/commercial algo is to avoid targeting. Don't get caught up in trying to please it.>
An ontology literally is a formal description of what exists. An ontological information model is therefore typically richer and more objective, including different levels of generalization/specialization, a layer of rules and the traditional entities and relationships currently used.
A semantic model based on ontology includes:
Classes: Sets of real-life entities with some characteristics in common, and the generalization/specialization relationships between them
Properties: Relate these classes to each other
Business Rules: Indicate constraints on and relationships between properties
|The idea with the semantics/commercial algo is to avoid targeting. |
So what you are saying is that if I find the complete antithesis of the term that someone is searching for then my site will be #1. I can actually fool the new algo by feeding it the exact opposite.
So I optimise my site for red string theory and I'll get to #1 for blue specific widget.
Just in case anyone is so confused that they actually try that: please don't, I was just joking. It looks to me very much like CIRCA technology is a distillation of the Monty Python's Flying Circus "Word Association Football" sketch and the "Improbability Drive" from The Hitchhiker's Guide to the Galaxy. Both of these documents will no doubt become essential reading for students wishing to trace the source of CIRCA technology.
A site index page has just appeared at #1 for my main target search term. It uses the term just twice "widget thing" like this <A href="frames_widget.html" class="centre" title="widget thing service">widget thing</A>
That's two words out of 125 user viewable words. PageRank 7; no other page in the SERPs has a PR above 5.
The other interesting thing about this site is that it has anchor text and links to pages about topics which in the American dialect are synonyms of the particular term searched for.
In #2, #3 and #4 are pages each from different sites, all of which have PR5. Each of these is much more highly optimised for the keyword pair in title, alts, anchors and, in one case, heads. All of them have links out and use words which in the American dialect are synonyms of the particular term searched for.
I would be interested in hearing if others are seeing anything like this. One conclusion could be: Use your keyword pair together, exactly as you expect it to be searched for, but sparingly and use some synonyms for that term, making sure that you link out to pages which contain those synonyms and if you have a high PR you will rise to the top.
In broad terms the top result looks like an important page (PR7) which is about the topic searched for and which has links to pages which are on related subjects. It looks like an authority site but it is only broadly relevant. Having taken a quick look through I would say that more than half of the sites in the top 50 are more specifically relevant.
The problem that causes this is the difference between English and American English coupled with what is a brand name in the US being a generic for something entirely different in the UK.
In the words of Ira Gershwin
You say either and I say either
You say neither I say neither
Either either , neither neither
Let's call the whole thing off
You are still focused on key words, the purpose of semantics is to discern user intent. While kw still play a role, the factoring in of intent (intending to purchase) gives a new perspective to how pages are ranked.
|the purpose of semantics is to discern user intent |
That's just too broad to grasp. What we need to do is break it down into steps. What does that mean I have to do in specifics?
These concepts are difficult to grasp, so we need to know specifically how it works out what is intended and what we have to do to serve that intent. Without killing our ranking on Alltheweb, Inktomi and Teoma.
It's hard enough to understand never mind explain.
[edited by: seofreak at 6:15 pm (utc) on Dec. 16, 2003]
Sid, perhaps reviewing the "Relationships" in "C. Architecture of the Ontology," for example, "Entailment (e.g. “buying” entails “paying”)" will nudge your thinking.
My feeling is that BOTH keywords AND intent are being evaluated by the New and improved G - Google, but as Sid so sublimely asks:
|What does that mean I have to do in specifics? |
If intent is determined by who's linking to YOU and who YOU are linking to, then we have some say in the matter. I can get a link in a directory for my service and link back to them.
If intent is determined by a myriad array of associations and concepts, some of which may be evident - some of which may not, then it gets complicated.
I don't think Google has implemented such a complicated algorithm, I think it's a simple matter of certain KW phrases being singled out as spam when they exceed some arbitrary limit.
Where is that limit? Beats me. I suspect there is a value system where you sort of accumulate "points" much like when you run a red light or speed, and Google takes away your license after a certain amount.
For example we may exceed our limit when body density exceeds some percentage AND we have our title tag using that same KW phrase. I really wish I knew.
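To make the "points on your license" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the density limit, the point values, the threshold and the names `PageSignals`, `penalty_points` and `is_filtered` are all made up, and nobody outside Google knows whether anything like this exists.

```python
from dataclasses import dataclass

# All numbers below are hypothetical; they only illustrate the
# "accumulate points, lose your license" idea from the post.
DENSITY_LIMIT = 0.08   # assumed limit: phrase makes up 8% of body words
POINTS_DENSITY = 2     # assumed points for exceeding the density limit
POINTS_TITLE = 1       # assumed points for the phrase also being in the title
FILTER_THRESHOLD = 3   # assumed total at which the page gets filtered


@dataclass
class PageSignals:
    body_density: float    # share of body words taken up by the KW phrase
    phrase_in_title: bool  # True if the title tag uses the same KW phrase


def penalty_points(page: PageSignals) -> int:
    """Add up the hypothetical penalty points for one page."""
    points = 0
    if page.body_density > DENSITY_LIMIT:
        points += POINTS_DENSITY
    if page.phrase_in_title:
        points += POINTS_TITLE
    return points


def is_filtered(page: PageSignals) -> bool:
    """A page is filtered once its points reach the assumed threshold."""
    return penalty_points(page) >= FILTER_THRESHOLD
```

With these made-up values, a page only trips the filter when the body density is over the limit AND the title uses the phrase; either signal alone stays under the threshold, which matches the example in the post.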
|Without killing our ranking on Alltheweb, Inktomi and Teoma. |
Sid, you make a good point and one that I am debating myself.
How can we make changes, or even attempt changes without risking our ranking in other search engines?
I have come to the conclusion that it may be wiser to leave things the way they are and dedicate new resources to Google, but that of course requires time and money...
Yo! Anybody got some extra time and money on them?
|Sid, you make a good point and one that I am debating myself. How can we make changes, or even attempt changes without risking our ranking in other search engines? I have come to the conclusion that it may be wiser to leave things the way they are |
Since this algo is probably still cooking and therefore a moving target, making changes solely for the benefit of a better ranking at this point is foolish at best.
Is it not possible that one purpose of this algo change is to make it tough for a page to be all things to all search engines? If you are Google and you know you will lose Y!, would it not be to your benefit to make webmasters choose which SE master they will serve?
I would bet that at SES there were more people wanting to talk to Google than INK.
> Non-English Languages.
Hello everybody. Just a point: I'm working in a quite competitive area in a non-English language, as you will notice from my English ;-) Well, I've been reading a lot of posts here and I can say I've seen all the symptoms of the Florida Update in the SERPs of my non-English language area. So we can consider Florida an international thing.
How semantics play with algorithms.
Semantics (the short story): Categorization of a document into a category (or set of categories).
For the purpose of illustration, I'm assuming two very broad categories: COMMERCE and NONCOMMERCE. It would be fairly trivial to identify words, phrases and patterns such as "buy", "purchase", "check out", "We accept all major", "toll free" and create a category called COMMERCE from them, with NONCOMMERCE as the default. You then assign each page to a category.
Crude (very crude) algo.
# calculate score
weighting -- body<200> emph<500> title<1000> citation<500>
So, if a page had one instance of "widget" in its body with emphasis and the word "widget" was in the title and the page had one citation (inbound link) it would score 2200 for the term "widget".
Now throw in a category score:
# calculate score
weighting -- body<200> emph<500> title<1000> citation<500> commerce<0> noncommerce<400>
With the commerce weight thrown in, if the widget page isn't in the category "commerce" it now scores 2600 for the term "widget". If the widget page is in the "commerce" category it scores 2200.
Pretty basic stuff.
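For anyone who wants to play with the numbers, the crude algo above can be sketched in Python. The weights and the phrase list are the ones from the post; the function names `categorize` and `score`, the substring matching and the dictionary layout are assumptions for illustration only, not anything Google is known to do.

```python
# Weights taken from the post; the structure around them is hypothetical.
WEIGHTS = {"body": 200, "emph": 500, "title": 1000, "citation": 500}
CATEGORY_WEIGHTS = {"commerce": 0, "noncommerce": 400}

# Phrases the post suggests as markers of a COMMERCE page.
COMMERCE_PHRASES = ("buy", "purchase", "check out", "we accept all major", "toll free")


def categorize(page_text: str) -> str:
    """Very crude categorizer: any commerce phrase puts the page in 'commerce'.

    Plain substring matching is an assumption; NONCOMMERCE is the default.
    """
    lowered = page_text.lower()
    if any(phrase in lowered for phrase in COMMERCE_PHRASES):
        return "commerce"
    return "noncommerce"


def score(signals: dict, category: str = None) -> int:
    """Sum weight * count for each signal, plus the category weight if given."""
    total = sum(WEIGHTS[name] * count for name, count in signals.items())
    if category is not None:
        total += CATEGORY_WEIGHTS[category]
    return total


# One emphasized body instance, one title instance, one inbound link.
widget_page = {"body": 1, "emph": 1, "title": 1, "citation": 1}
print(score(widget_page))                 # 2200, no category weighting
print(score(widget_page, "noncommerce"))  # 2600
print(score(widget_page, "commerce"))     # 2200
```

The point of the sketch is only that a single category weight is enough to push a non-commercial page above an otherwise identical commercial one for the same term.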
[edited by: john316 at 7:45 pm (utc) on Dec. 16, 2003]
I am way back at the beginning of the thread, reading about semantics and this comes to mind:
When I go to the library and search, the "Did you really mean xyz" always kicks in.
I am finding that searches for
return results which ask at the top
Did you really mean
purple cats; purple automobiles; magenta bananas?
because all of them are food, right?
Likewise, searching for purple cats will return a response asking:
Did you really mean
purple bananas; purple automobiles; magenta bananas?
In certain critical keyword areas, it gives some very interesting insight into what kinds of totally unrelated phrases are considered related due to them being searched for by the same groups of people.
|You are still focused on key words, the purpose of semantics is to discern user intent. While kw still play a role, the factoring in of intent (intending to purchase) gives a new perspective to how pages are ranked. |
Most searches consist of a few keywords. Whilst it may sometimes be possible to distinguish a meaning from a few keywords, generally it is not. It therefore follows that any algo that tries to do so will have a very high failure rate. It therefore follows that any search engine that filters results according to a guessed meaning will frequently fail to satisfy the user.
If Google move in this direction, user satisfaction will fall and users will go elsewhere. I'm beginning to see an increase in hits from MSN.
I'm sorry, but good as Google engineers may be, I don't believe they can build a psychic search engine. Discerning intended meaning from a few keywords is not possible in the general case, nor shall it ever be in the future. And if you can't discern the user's intent, discerning the meaning of web pages/sites is almost certainly a complete waste of time.
All this really does is firmly shut the door for mom and pop site owners who have provided decent content for google users. The layer of complexity that was added to the algo will be cracked and exploited only by the very best and determined spammers.
Pretty sure you'll be seeing some really poor results in the near future.
Before we go too far down the "doom and gloom" path and forecasting Google's imminent demise, let's recall that AdSense has been reasonably successful (after some teething problems) for both Google and webmasters. It almost surely is using a semantic algo.
DaveAtIFG, I might add however, that there's a big difference between how advertising works and how search works.
If I search on 'coffee makers' there is a reasonable chance that I might click on an ad for 'kitchen utensils'. But if I'm searching on 'coffee makers', you can be pretty certain that 'coffee makers' are what I'm most immediately interested in.
To extend the point, if you search on 'Scuba Diving,' I may well be able to sell you something in the area of 'Air Fares.'
G seems in commercial cats to be operating somewhere in between the two sets of examples noted above, though I have noted a *tiny* bit of positive progress on -in....
<The layer of complexity that was added to the algo will be cracked and exploited only by the very best and determined spammers.>
Checking my top 10 competitors, I see on their front pages a lot of keyword-related terms (in fact, related products/services), so a solution could be REAL CONTENT.
IMO anchor text is still very important - see “Miserable Failure”.
Searching for keyword1 keyword2, I found #2 optimized for synonym1 keyword2,
and the Google cache provides the following details:
"These search terms have been highlighted: keyword2
These terms only appear in links pointing to this page: synonym1"