We have collectively lurched between one conspiracy theory and another - got ourselves into a few disagreements - but essentially found ourselves nowhere!
Theories have involved Adwords (does anyone remember the 'dictionary' concept? Now past history.)
And Froogle...
A commercial filter, an OOP filter, a problem caused by mistaken duplicate content, theories based on the contents of the Directory (which is a mess), doorway pages (my fault mainly!) etc. etc.
Leading to the absurd concept that you might be forced to de-optimise, in order to optimise.
Which is a form of optimisation in itself.
But early on, someone posted a reference to Occam and his razor.
Perhaps - and this might sound too simple! - Google is experiencing difficulties.
Consider this: if Google is experiencing technical difficulties regarding the sheer number of pages to be indexed, then the affected searches will be the ones with many results to sort. And the searches with many results to sort are likely to be commercial ones - because there is so much competition.
So the proposal is this:
There is no commercial filter, there is no Adwords filter - Google is experiencing technical difficulties in a new algo due to the sheer number of pages to be considered in certain areas. On-page factors have suffered, and the result is Florida.
You are all welcome to shoot me down in flames - but at least it is a simple solution.
Is that right? Should be interesting if it works: pages should have a wider range of subjects within them. By that I mean they will not just feature green widgets or blue ones, but also the process by which the raw materials are extracted, resulting in a green widget.
Hmmmmm interesting indeed.
One thing I have noticed is the low ratio of keywords to filler text on top-ranking sites, somewhere between 5% and 10%, which tends to suggest that I am correct about speech patterns.
Being the brave Englishman that I am, I have just edited a page to prove my point, so if it works I'll brag about it on WW forever more or... you'll all hear my screams.
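The 5% to 10% keyword-to-filler figure above can be checked on any page with a rough word count. This is only a minimal sketch: the function name `keyword_density`, the tokenising rule, and the sample sentence are all assumptions made for illustration, and nothing here reflects how Google itself measures a page.

```python
import re

def keyword_density(text, keywords):
    """Return the share of words in `text` that are target keywords (0.0 to 1.0)."""
    # Crude tokeniser (an assumption): lower-case runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    targets = {k.lower() for k in keywords}
    hits = sum(1 for w in words if w in targets)
    return hits / len(words)

# Made-up sample copy that talks about the process as well as the product.
sample = ("Our green widget starts life as raw ore that is mined, smelted "
          "and pressed before it is painted and shipped to the shop")
print(f"{keyword_density(sample, ['widget']):.1%}")
```

Running it over the visible text of a top-ranking page and of your own page gives a like-for-like comparison, which is all the observation above really needs.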
pages should have a wider range of subjects within them. By that I mean they will not just feature green widgets or blue ones, but also the process by which the raw materials are extracted, resulting in a green widget
This is certainly part of it, coming under the Applied Semantics / Broad Matching / Stemming umbrella.
I don't want to confuse the issue, because it's good to pause for summary - but my problem is that I *don't* think this concept is being applied across the board. I don't think, for example, that it's being applied to non-commerce sites. The justification for this is simple: there has been nothing like the mass movement of commerce sites observed in non-commerce sites.
This, I think, is where the confusion arises - and why the 'filter' concept is so hard to shake off; whilst at the same time, so difficult to reconcile with broad matching. Sid?
(By way of justification: non-commerce sites are optimised too - especially if written by professional web designers. And my science sites have been optimised by me - why? Because I want people to find them and read them! Needless to say, they haven't budged an inch in the SERPs!)
This theory would then make it obvious: they're only going to pre-render filters for the most popular search terms, which are generally commercial searches. This is only a suggestion though, as someone's bound to come up with a hugely popular but non-commercial search term which is unaffected.
It would also seem to suggest off line processing of the filter being applied
Could you expand on this? I presume you mean that the effects of the 'filter' (I use the word cautiously) are built into the SERPs at the datacenter - not applied immediately after the search. (I always presumed this, but it shows how many interpretations of the word are possible - yours seems a more technical one.)
Obviously it's just a guess, but it would be simple enough to implement, and not affect the speed of search results.
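To make the guess concrete, here is a minimal sketch of what 'offline processing' could look like: a batch job stores already-filtered results for a short list of popular terms, and the query-time code only does a lookup. Every name here (`live_rank`, `precompute`, `search`, the toy index and the `no_buy` rule) is hypothetical and invented for illustration; it is not a description of what Google actually does.

```python
def live_rank(query, index):
    """Hypothetical query-time ranking: pages containing every query word, best score first."""
    words = query.lower().split()
    hits = [(url, score) for url, (text, score) in index.items()
            if all(w in text.lower().split() for w in words)]
    return [url for url, _ in sorted(hits, key=lambda h: -h[1])]

def precompute(popular_terms, index, page_filter):
    """Offline batch step: store filtered results for the popular terms only."""
    return {term.lower(): [u for u in live_rank(term, index) if page_filter(term, u)]
            for term in popular_terms}

def search(query, index, precomputed):
    """Query-time step: a dictionary lookup if the term was pre-rendered, else live ranking."""
    key = query.lower()
    if key in precomputed:
        return precomputed[key]
    return live_rank(query, index)

# Toy index (made up): url -> (page text, score).
index = {
    "shop.example/blue": ("buy blue widget cheap", 0.9),
    "info.example/blue": ("history of the blue widget", 0.7),
    "info.example/red": ("history of the red widget", 0.8),
}

def no_buy(term, url):
    """Made-up filter rule: drop pages whose text contains the word 'buy'."""
    return "buy" not in index[url][0].split()

precomputed = precompute(["blue widget"], index, no_buy)
print(search("blue widget", index, precomputed))  # pre-rendered, filtered list
print(search("red widget", index, precomputed))   # falls back to live ranking
```

The point of the design is that the expensive step runs once per batch, so the filter costs nothing at search time, and any term left out of the popular list behaves exactly as it did before.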
The problem is that we have something that 'walks like a filter, quacks like a filter, but may not in fact be a filter'*.
Do you personally believe a 'filter' or 'filters' is in place?
*Probably a pigeon :)
[edited by: superscript at 2:13 pm (utc) on Dec. 20, 2003]
It would also seem to suggest off line processing of the filter being applied
An offline taxonomical categorisation would look like a filter. In effect Google could be producing a directory structure for a list of search terms.
Enter blue widget and you get the blue widget directory. If Google recalculated the taxonomy once a week and stored the results for each of those major search terms it would reduce processing requirements massively.
Coincidence - Applied Semantics has a product that does this categorisation.
Coincidence - Applied Semantics has the DMOZ taxonomy.
Best wishes
Sid
Enter blue widget and you get the blue widget directory
An appealing concept - but why would such a directory throw up pages with only passing relevance to the category, rather than one that is bang on target? Are you suggesting the directory categories are too broad? I'd be perfectly happy to stand alongside my competitors in my old DMOZ category - we all sold blue furry widgets in the UK.
An appealing concept - but why would such a directory throw up pages with only passing relevance to the category
In a word CIRCA.
CIRCA + Autocategorize = Florida.
Sid
PS Based on research and a bit more than an educated guess. Interestingly I've just noticed that directory results have changed and have hopefully started a new thread on this so as not to contaminate this one.
Suck on this. I'll rephrase ;) Try this:
In a nutshell:
DMOZ is gone, Google is attempting to create an automated directory of its own to replace it; G's using some kind of broad matching technology to build it; but is initially restricting its taxonomy to the words/tokens/terms it knows the most about i.e. commercial terms (due to data from Froogle/Adwords etc.)
Result: strange commercial SERPs; standard non-commercial SERPs; the illusion of a 'filter' because the technology is in its infancy and, as yet, only applied to commercial sectors.
This would finally draw the broad-matching / filter / directory / dictionary ideas together (mainly due to your insight).
Anyway, enough of Google studies, I'm off to the pub to study the barmaid...
[edited by: superscript at 3:00 pm (utc) on Dec. 20, 2003]
It appears as a filter because it is new... I think the truth will hurt here, but it is because most commercial sites are relatively shallow in depth of content compared to the .org and .edu pages once you discount the "kw1 kw2" text.
What we need is a method to extract from Google the tokens/synonyms Google associates with "kw1 kw2".
Astonishing! Google thinks my widget is a wodget! Any suggestions for the syntax for 2 or 3 KW phrases though? I can't get quote marks to work (see below).
edit: Certainly Google also appears to associate my secondary keyword with mental health - bizarre - it has stemmed the abbreviation of a physical item to a mental health institute <(:oP>
edit II: ~KW1 ~KW2 -KW1 -KW2 is syntactically correct for a phrase, but the search results seem less revealing.
By 'astonishing' - I didn't mean the synonym tool itself, but the results of applying what James_Dale suggested. And note that it has *no* effect on the Adwords presented - even though their specific terms have been excluded from the search.
[edited by: superscript at 4:12 pm (utc) on Dec. 20, 2003]
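For anyone repeating the synonym experiment on several phrases, the `~KW1 ~KW2 -KW1 -KW2` pattern quoted above is easy to generate. A minimal sketch follows; the helper name `synonym_probe` is made up, and the code only builds the query string as described in this thread, without sending it anywhere.

```python
def synonym_probe(phrase):
    """Build '~kw1 ~kw2 -kw1 -kw2' from a keyword phrase.

    The tilde terms ask for synonyms of each word; the minus terms exclude
    the words themselves, so only the associated terms should remain.
    """
    words = phrase.split()
    return " ".join([f"~{w}" for w in words] + [f"-{w}" for w in words])

print(synonym_probe("blue widget"))  # ~blue ~widget -blue -widget
```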
I operate travel websites. Go to any city and search a two-keyword term (cityname hotels) and you will find the top 50 results are pretty much in the same order no matter the city (most times). It's mostly the majors, exactly the opposite of what it was pre-Florida.
I can take just about any city that I operate in and re-keyword the page to (cityname resorts) or (cityname suites) or (cityname inns) and make a decent ranking. Cannot do it with hotels.
Now why is that?
CIRCA? No way, if you subscribe to its full and broad implementation.
Under CIRCA, if I am not an authority on hotels, then why is it OK for me to be an authority on resorts, suites or inns?
It's a filter, follow the money.
Example: same cityname hotels produced 50,000 searches last month.
same cityname resorts produced 2,500 searches last month.
It's a deliberate massaging of the results to provide the largest monetary gain to the search engine. It really is that simple and predetermined. That's why the most affected are the commercial sites and not very many informational sites.
Of course the algo has changed to kill the backlink spammers and other means, but the filter does the final sorting out, even for the non spammers, so the monetary gains are maximized.
Fits the picture, doesn't it?
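The 'follow the money' argument above boils down to search volume: the hotels term gets many times the traffic of the resorts term, and only the high-volume term seems affected. A small sketch of that reasoning, using the two figures from the post; the `is_filtered` rule and its 10,000 threshold are pure assumptions chosen for illustration, not a known cut-off.

```python
# Monthly search counts quoted in the post above.
monthly_searches = {"cityname hotels": 50_000, "cityname resorts": 2_500}

ratio = monthly_searches["cityname hotels"] / monthly_searches["cityname resorts"]
print(ratio)  # 20.0

def is_filtered(term, volumes, threshold=10_000):
    """Hypothetical rule: only terms at or above a made-up volume threshold get the extra sorting."""
    return volumes.get(term, 0) >= threshold

for term in monthly_searches:
    print(term, is_filtered(term, monthly_searches))
```

If the theory holds, re-keywording a page should work for any term that falls under whatever the real cut-off is, which matches the resorts, suites and inns experience described above.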