So, to sum up the threads so far: Google is now judging pages not by the number of keywords appearing on them but by natural (human) speech patterns.
Is that right? It should be interesting. If it works, pages should cover a wider range of subjects; that is, they will not just feature green widgets or blue ones, but also the process by which the raw materials are extracted, resulting in a green widget.
Hmmmmm interesting indeed.
One thing I have noticed is the low ratio of keywords to filler text on top-ranking sites, somewhere between 5% and 10%, which tends to suggest that I am correct about speech patterns.
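For what it's worth, that 5%-10% figure is easy enough to check yourself. A crude sketch in Python (exact-match counting only, no stemming; the function name and sample text are my own invention, not anything from Google):

```python
import re

def keyword_density(text: str, keyword: str) -> float:
    """Rough keyword-to-text ratio: occurrences of the keyword
    as a fraction of the total words on the page."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for w in words if w == keyword.lower())
    return hits / len(words) if words else 0.0

page = ("Green widgets are made from recycled copper. "
        "Our widgets ship worldwide; each widget is hand-finished.")
print(f"{keyword_density(page, 'widget'):.1%}")
```

A real measurement would want to handle stemming and plurals, but even this rough version shows how low the ratio is on ordinary prose.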
Being the brave Englishman that I am, I have just edited a page to prove my point. So if it works I'll brag about it on WW forever more, or... you'll all hear my screams.
|pages should have a wider range of subjects within them, by that they will not just feature green widgets or blue ones but the process by which the raw materials are extracted, resulting in a green widget |
This is certainly part of it, coming under the Applied Semantics / Broad Matching / Stemming umbrella.
I don't want to confuse the issue, because it's good to pause for a summary - but my problem is that I *don't* think this concept is being applied across the board. I don't think, for example, that it's being applied to non-commerce sites. The justification for this is simple: there has been nothing like the mass movement of commerce sites observed among non-commerce sites.
This, I think, is where the confusion arises - and why the 'filter' concept is so hard to shake off; whilst at the same time, so difficult to reconcile with broad matching. Sid?
(By way of justification: non-commerce sites are optimised too - especially if written by professional web designers. And my science sites have been optimised by me - why? Because I want people to find them and read them! Needless to say, they haven't budged an inch in the SERPs!)
It would also seem to suggest offline processing of the filter being applied, so as to make searching quicker. This would be supported by the fact that pages do not instantly reappear in the system when re-cached after de-optimisation.
This theory would then make it obvious that they're only going to pre-compute filters for the most popular search terms - generally commercial searches. This is only a suggestion though, as someone's bound to come up with a hugely popular but non-commercial search term which is unaffected.
|It would also seem to suggest off line processing of the filter being applied |
Could you expand on this? I presume you mean that the effects of the 'filter' (I use the word cautiously) are built into the SERPs at the datacenter - not applied immediately after the search. (I always presumed this, but it shows how many interpretations of the word are possible - yours seems a more technical one.)
Perhaps the filters that they're applying are fairly complex and time-consuming, thus not worth running on the fly. To get around this, they work out which terms are the most abused, or most popular, whichever is more appropriate, and set a machine off churning through the pages in its result set for that term. If a page has enough reason to set off the filter (e.g. over-optimisation, cross-linking etc.) then they simply set a flag to mark down that page when searching for that specific term.
Obviously it's just a guess, but it would be simple enough to implement, and not affect the speed of search results.
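If that guess is right, the offline pass could be as simple as the following sketch. Everything here is hypothetical: the threshold, the scoring function and the flag store are my own stand-ins, not anything Google has confirmed.

```python
# Hypothetical offline pass: for each heavily-searched term, score the
# pages in its result set and flag the ones that trip the filter.
OVER_OPTIMISATION_THRESHOLD = 0.10  # assumed cut-off, e.g. keyword density

def score_page(page: dict, term: str) -> float:
    """Stand-in for whatever over-optimisation test might be run."""
    words = page["text"].lower().split()
    return words.count(term) / len(words) if words else 0.0

def build_flags(popular_terms, index):
    """Precompute {term: set of flagged page ids} so query time stays fast."""
    flags = {}
    for term in popular_terms:
        flags[term] = {pid for pid, page in index.items()
                       if score_page(page, term) > OVER_OPTIMISATION_THRESHOLD}
    return flags

index = {
    1: {"text": "widgets widgets widgets buy widgets now widgets"},
    2: {"text": "a long article about how widgets are manufactured from raw copper"},
}
flags = build_flags(["widgets"], index)
print(flags)  # page 1 is flagged for 'widgets'; page 2 is not
```

At search time the engine would only need a set lookup per result, which fits the observation that de-optimised pages don't bounce back until the next offline run.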
The problem is that we have something that 'walks like a filter, quacks like a filter, but may not in fact be a filter'*.
Do you personally believe a 'filter' or 'filters' is in place?
*Probably a pigeon :)
[edited by: superscript at 2:13 pm (utc) on Dec. 20, 2003]
|It would also seem to suggest off line processing of the filter being applied |
An offline taxonomical categorisation would look like a filter. In effect Google could be producing a directory structure for a list of search terms.
Enter blue widget and you get the blue widget directory. If Google recalculated the taxonomy once a week and stored the results for each of those major search terms it would reduce processing requirements massively.
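That pre-computed directory idea is cheap to sketch: rebuild the taxonomy on a schedule, and at query time a major term becomes a stored lookup rather than a live ranking pass. All names below are illustrative, nothing from Google or Applied Semantics:

```python
MAJOR_TERMS = ["blue widget", "cityname hotels"]  # assumed popular/commercial terms

def categorise(term, index):
    """Stand-in for an Applied-Semantics-style categoriser: collect the
    pages whose text mentions every word of the term."""
    words = term.split()
    return [pid for pid, text in index.items()
            if all(w in text for w in words)]

def weekly_rebuild(index):
    """Recalculate the whole taxonomy once, store it, serve it all week."""
    return {term: categorise(term, index) for term in MAJOR_TERMS}

index = {10: "blue widget shop", 11: "history of the blue widget", 12: "red widget"}
directory = weekly_rebuild(index)

def search(term, directory, index):
    # Popular term: serve the stored directory. Anything else: fall back
    # to a live pass (here, the same categoriser run on the fly).
    return directory[term] if term in directory else categorise(term, index)

print(search("blue widget", directory, index))  # served from the stored taxonomy
```

The point of the sketch is the asymmetry: only the pre-listed terms get the stored (and possibly stale-looking) results, which would explain why unpopular or non-commercial searches look untouched.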
Coincidence - Applied Semantics have a product that does this categorisation.
Coincidence - Applied Semantics has the DMOZ taxonomy.
|Enter blue widget and you get the blue widget directory |
An appealing concept - but why would such a directory throw up pages with only passing relevance to the category, rather than ones that are bang on target? Are you suggesting the directory categories are too broad? I'd be perfectly happy to stand alongside my competitors in my old DMOZ category - we all sold blue furry widgets in the UK.
|An appealing concept - but why would such a directory throw up pages with only passing relevance to the category |
In a word: CIRCA.
CIRCA + Autocategorize = Florida.
PS Based on research and a bit more than an educated guess. Interestingly I've just noticed that directory results have changed and have hopefully started a new thread on this so as not to contaminate this one.
From Google's own press release when they acquired AS, it was touted as a means to deliver better-placed AdWords. A natural extension of this would be to use the AS information already gathered for AdWords and apply it to pre-formatted data in general searches. IMHO this is the Florida we know... the first tests of this. It is not really a filter as we would have known it before Florida. I do think Google has intentions of improving results... but really, time spent on why is probably not well-spent effort. For sites not affected by Florida yet: wait for the next update.
Suck on this. I'll rephrase ;) Try this:
In a nutshell:
DMOZ is gone, and Google is attempting to create an automated directory of its own to replace it. G is using some kind of broad-matching technology to build it, but is initially restricting its taxonomy to the words/tokens/terms it knows the most about, i.e. commercial terms (due to data from Froogle/AdWords etc.)
Result: strange commercial SERPs; standard non-commercial SERPs; the illusion of a 'filter' because the technology is in its infancy and, as yet, only applied to commercial sectors.
This would finally draw the broad-matching / filter / directory / dictionary ideas together (mainly due to your insight).
Anyway, enough of Google studies, I'm off to the pub to study the barmaid...
[edited by: superscript at 3:00 pm (utc) on Dec. 20, 2003]
Because the technology is new, we are only seeing it where AdWords data was there before... I think that is it.
It appears as a filter because it is new... I think the truth will hurt here, but it is because most commercial sites are relatively shallow in depth of content compared to .org and .edu pages once you discount "kw1 kw2" text.
What we need is a method to extract from Google the tokens/synonyms Google associates with "kw1 kw2".
Astonishing! Google thinks my widget is a wodget! Any suggestions for the syntax for 2 or 3 KW phrases, though? I can't get quote marks to work (see below).
edit: Certainly Google also appears to associate my secondary keyword with mental health - bizarre - it has stemmed the abbreviation of a physical item to a mental health institute <(:oP>
edit II: ~KW1 ~KW2 -KW1 -KW2 is syntactically correct for a phrase, but the search results seem less revealing.
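That probe is just string assembly: tilde each word to pull in Google's synonyms, then minus each word to strip the literal matches, leaving only the synonym-matched results. A sketch (the operator behaviour here is as observed in this thread, not something Google documents):

```python
def synonym_probe(phrase: str) -> str:
    """Build a '~kw1 ~kw2 -kw1 -kw2' query: synonym-expand every word,
    then exclude exact matches so only synonym hits remain."""
    words = phrase.split()
    return " ".join(f"~{w}" for w in words) + " " + " ".join(f"-{w}" for w in words)

print(synonym_probe("blue widget"))  # ~blue ~widget -blue -widget
```

Whatever pages come back for that query were matched purely on the tokens Google associates with the phrase, which is exactly the dataset we're trying to reverse-engineer.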
I wondered if anyone else had seen this. Been using it for a couple of weeks now, post-Florida... found reference to it again by accident. Back in August or so Google released this synonym tool and it was seen as just another nerdy tool in the forums. I hope it stays up.
Relating to my earlier post on p17, you all seem surprised that it's ecommerce sites being hit and not general info sites.
Why? Who uses AdWords the most? A 14-year-old with a hobby site, or a business with employees?
By 'astonishing' - I didn't mean the synonym tool itself, but the results of applying what James_Dale suggested. And note that it has *no* effect on the Adwords presented - even though their specific terms have been excluded from the search.
[edited by: superscript at 4:12 pm (utc) on Dec. 20, 2003]
I don't assume that the synonym tool has an all-inclusive dataset of the tokens used by Google's search results... though there are probably shared tokens between the two.
I do think that Florida works a lot like this tool, though.
IMHO, it's a filter and has nothing to do with CIRCA. Why?
I operate travel websites. Go to any city and search a two-keyword term (cityname hotels) and you will find the top 50 results are pretty much in the same order no matter the city (most times). It's mostly the majors - exactly the opposite of what was there pre-Florida.
I can take just about any city that I operate in and re-keyword the page to (cityname resorts) or (cityname suites) or (cityname inns) and make a decent ranking. I cannot do it with hotels.
Now why is that?
CIRCA? No way, if you subscribe to its full and broad implementation.
Under CIRCA, if I am not an authority on hotels, then why is it OK for me to be an authority on resorts, suites or inns?
It's a filter, follow the money.
Example: the same cityname hotels produced 50,000 searches last month.
The same cityname resorts produced 2,500 searches last month.
It's a deliberate massaging of the results to provide the largest monetary gain to the search engine. It really is that simple and pre-determined. That's why the most affected are commercial sites and not very many informational sites.
Of course the algo has changed to kill the backlink spammers by other means, but the filter does the final sorting out, even for the non-spammers, so the monetary gains are maximised.
Fits the picture, doesn't it?
Since it is becoming clear we are into new update territory, 'tis time to bring a close to the Florida discussions and move onward.