Forum Moderators: open
We have collectively lurched from one conspiracy theory to another - got ourselves into a few disagreements - but essentially found ourselves nowhere!
Theories have involved Adwords (does anyone remember the 'dictionary' concept? Now past history).
And Froogle...
A commercial filter, an OOP filter, a problem caused by mistaken duplicate content, theories based on the contents of the Directory (which is a mess), doorway pages (my fault mainly!) etc. etc.
Leading to the absurd concept that you might be forced to de-optimise, in order to optimise.
Which is a form of optimisation in itself.
But early on, someone posted a reference to Occam and his razor.
Perhaps - and this might sound too simple! - Google is experiencing difficulties.
Consider this: if Google is experiencing technical difficulties handling the sheer number of pages to be indexed, then the affected searches will be the ones with the most results to sort. And the searches with the most results to sort are likely to be commercial ones - because there is so much competition.
So the proposal is this:
There is no commercial filter, there is no Adwords filter - Google is experiencing technical difficulties in a new algo due to the sheer number of pages to be considered in certain areas. On-page factors have suffered, and the result is Florida.
You are all welcome to shoot me down in flames - but at least it is a simple solution.
Checking my top 10 competitors, I see a lot of keyword-related terms on their front pages - in fact, related products/services - so a solution could be REAL CONTENT.
IMO anchor text is still very important – see “Miserable Failure”
+
Searching for keyword1 keyword2, I found #2 optimized for synonym1 keyword2,
and the Google cache provides the following details:
"These search terms have been highlighted: keyword2
These terms only appear in links pointing to this page: synonym1"
Hi Marin,
This new algo definitely has two approaches.
1. If term is in list run new algo.
2. If term is not in list keep old algo including massive skewing towards anchor text.
Assumption: 'miserable failure' is not in the terms database, therefore the old algo, including reliance on anchor text, operates.
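The two-path idea above can be sketched in a few lines. This is pure speculation, not Google's actual code: the term list, both scoring functions, and the weights are all invented for illustration.

```python
# Speculative sketch of the two-path algo described above.
# NEW_ALGO_TERMS, the scoring functions and weights are assumptions.
NEW_ALGO_TERMS = {"blue widgets", "car tires"}  # hypothetical terms database

def anchor_text_score(page):
    # Old algo: massively skewed towards exact-match anchor text.
    return 3.0 * page["anchor_hits"] + page["on_page_hits"]

def semantic_score(page):
    # New algo: weight broad-context relatedness instead of anchors.
    return 2.0 * page["related_terms"] + page["on_page_hits"]

def score(query, page):
    if query.lower() in NEW_ALGO_TERMS:
        return semantic_score(page)   # 1. term is in list: run new algo
    return anchor_text_score(page)    # 2. term is not in list: keep old algo
```

On this sketch, a query like "miserable failure" falls through to the old anchor-text path because it is absent from the term list, which is exactly the assumption made above.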
Best wishes
Sid
They are now in the 50s and 60s rather than page one but at least they are not completely missing from the top 100 results as before. That still leaves 76 of the most technically competent pages about that technology nowhere to be found.
There do not appear to have been any changes to several of those pages in the past month, and several of the others change radically every day. The trick seems to be to call the technology something which no consumer would ever call it - use a term which only the technology experts use.
It is kind of like if you search for Vitamin B6 and get no results but if you search for Pyridoxine Hydrochloride you get all the information you expect to get about Vitamin B6.
I can't make a pattern out of it other than that right now but will keep watching.
Discerning intended meaning from a few keywords is not possible in the general case, nor shall it ever be in the future. And if you can't discern the user's intent, discerning the meaning of web pages/sites is almost certainly a complete waste of time.
Hmmm...I remember another event that had an association with Florida and involved attempts at divining intent...
Original Message Follows:
------------------------
<snip>
[edited by: WebGuerrilla at 10:55 pm (utc) on Dec. 16, 2003]
[edit reason] Sorry, no email quotes [/edit]
So are we seeing a completely semantic algo, or is only a certain amount of the algo's weighting semantic?
A strong point - if the algo has an Applied Semantic element - it can only be 'flavoured' by it. Otherwise there would be dramatic movement in purely scientific / informational sites. I've seen little movement of these sites. Hence the many posters on these boards with non-commercial sites hardly noticing that a major algo change had occurred.
In my mind, at least, the jury is still out on the Applied Semantics idea - either it has been mixed in very lightly, or we're back to the old 'commercial filter' idea.
Hi,
I've read the paper on CIRCA and can see how it would be able to interpret the gist of a document and pigeonhole it as "that's about whatever". But I'm having trouble understanding how it could say "that's more about whatever than this other document". The following is what I think might be the simple explanation. Please feel free to tear it apart with vitriolic postings.
Assuming a technology closely related to what is described in the CIRCA paper is now being used, and reassessing my analysis of pages that are at the top of SERPs, I would say that the new algo seems to "like" pages that are about the broad context of the subject searched for and which contain the actual search term to only a small extent. In other words, it seems to be more sure that something is about blue widgets if it is also about blue widgetting, green widgets and the study of advanced widgetology. This may change over time, but at the moment putting blue widget in context seems to be most important, based on my very small sample.
Then it seems to rank the pages by PageRank (or backlinks, but there is a strong correlation between the two). So CIRCA decides what a document is about, and PageRank (or backlinks) decides how important it is in relation to other pages that are pigeonholed as being about that thing.
CIRCA may also have some input into ranking in this way. It may look at the pages that are linked to by the page in question, and if they are about the thing in question then this helps to strengthen the context of the page being ranked. In other words, a page is more about a thing if it links to pages that are about subjects related to that thing. It may be that being related is better than being the same.
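The two-stage process described above - a CIRCA-style "aboutness" check first, then ordering by PageRank - could be sketched as follows. The related-term table, threshold, and scoring are all invented for illustration; this is a guess at the mechanism, not the real thing.

```python
# Illustrative sketch (pure speculation) of the two-stage ranking above:
# decide what each page is "about", then rank the on-topic ones by PageRank.
RELATED = {"blue widgets": {"blue widgetting", "green widgets", "widgetology"}}

def aboutness(page_text, topic):
    """Fraction of the topic's related terms the page also covers."""
    related = RELATED.get(topic, set())
    if not related:
        return 0.0
    hits = sum(1 for term in related if term in page_text.lower())
    return hits / len(related)

def rank(pages, topic, threshold=0.5):
    """pages: list of (text, pagerank). Keep on-topic pages, sort by PageRank."""
    on_topic = [p for p in pages if aboutness(p[0], topic) >= threshold]
    return sorted(on_topic, key=lambda p: p[1], reverse=True)
```

Note that on this sketch a page stuffed with the exact phrase but lacking the surrounding vocabulary scores zero on aboutness and never reaches the PageRank stage, which matches the behaviour posters are reporting.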
Someone mentioned how it might filter out pages that are too much about a particular topic as being spam. I'm going to read back to see if I can find the post that refers to it but if anyone has more info on this please share it.
On page 5 of the CIRCA paper they give the example of "bears witness", suggesting there are two meanings of this expression. Of course there are at least three. My favorite is "Bears witness picnic ground incident, Booboo and Yogi seen running into the woods." ;)
Best wishes
Sid
Highly competitive term: #1 in SERPs... ("This term only appears in links pointing to this page.") NB: was in top 5 pre-Florida, but not #1.
Hi Tantalus,
What you say here could be very interesting.
Is that page on the broad subject that it is appearing at #1 for?
Staying with the assumption that CIRCA is the silver bullet: pure speculation, but perhaps an element of what CIRCA does to pigeonhole a page is to look at inbound link anchor text. If that text uses the exact term, and the page pointed to is about the broad context, then perhaps it is seeing that page as a good match for the broad context of what is being searched for.
If you want to sticky me the details, I would be pleased to take a look.
Best wishes
Sid
it might filter out pages that are too much about a particular topic as being spam
Sid, I am convinced that regardless of whether or not CIRCA technology is being implemented in the algorithm there is surely a filter applied.
I say this because the sites I am following which were previously at the top of SERPs are no longer in the first 1000 results for very specific searches containing EVIL KEYWORDS. One might argue that applied semantics is simply presenting other, more relevant sites, but I have looked at sites way down the list and they have very, very little to do with what mine offers. In my bread-and-butter site I have sections dedicated to my town and various services, all in English, whereas the only relevant sites similar to mine are in Italian - but they are not focused on the search phrases I am chasing.
What I'd like to know is: what have other webmasters been able to determine about this filter?
Surely others reading these posts have been affected by a filter, where your website no longer appears at all and way down at number 999 you have a site which has little or nothing to do with your business.
Come on all you lurkers and experienced webmasters, let's get some feedback here and see if we can collectively come up with some possible reasons that trigger the filter and what course of action to take.
Come on all you lurkers and experienced webmasters, let's get some feedback here and see if we can collectively come up with some possible reasons that trigger the filter and what course of action to take.
Bobby, I say this in the nicest possible way - what do you think everyone here has been doing for the last 6000 posts since Nov 15?
You have some reading to do if you think this question (or the one about the "filter") hasn't been asked. BTW I do think the answer is there amongst them.
I say this because the sites I am following which were previously at the top of SERPs are no longer in the first 1000 results for very specific searches containing EVIL KEYWORDS.
Hi Bobby,
Did the pages that were previously there contain evil keywords?
It could be argued that if CIRCA is used, then it would look at the broad context of those evil keywords. If they, or a token within the search (word = token; why did they change it from 'word', I wonder?), have some more innocent context, perhaps that is given "strength" and evil contexts are given less strength.
Non evil example: car tires
If a site uses a high frequency and density of the phrase car tires, then it is not as much about car tires as one that talks about radial, Pirelli, Goodyear, tread pattern, car tires, auto tires, tractor tires, wheels, etc.
Keeping with the CIRCA assumption, context is now more important than frequency and density. Plus, links to pages on subjects in that broad context, and high PR and/or backlinks (including context-related or exact anchor text), may be important.
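The car tires contrast above can be made concrete with two toy scoring functions: one that only counts the exact phrase (the old frequency/density idea), and one that rewards related vocabulary (the assumed CIRCA-style context idea). The related-term list is invented for the demo.

```python
# Rough sketch of density vs. context scoring, per the car tires example.
# RELATED_TERMS is an invented stand-in for a real ontology's associations.
RELATED_TERMS = ["radial", "tread pattern", "auto tires", "tractor tires", "wheels"]

def density_score(text, phrase="car tires"):
    """Old-style score: how often the exact phrase appears per word."""
    words = text.lower().split()
    return text.lower().count(phrase) / max(len(words), 1)

def context_score(text):
    """Context-style score: how many related terms the page also uses."""
    t = text.lower()
    return sum(1 for term in RELATED_TERMS if term in t)
```

A keyword-stuffed page wins on density but scores nothing on context, while a naturally written page does the opposite - which is the reversal people seem to be observing post-Florida.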
The point is that you can describe a process, knowing what we know about CIRCA, that would produce the result that you are seeing without there being a separate filter.
It may be that a weakness has been exposed in CIRCA that has never previously been seen. Let's say that previously only a few million documents had been assessed. Now, with 3.6 billion to work on, anomalies which were never previously seen are being thrown up. The CIRCA database has "over two million unique terms". I would postulate that, in order to be processed by the CIRCA technology, the search entered has to match one of those 2 million terms exactly. This is why we are seeing what we think are very close terms either processed or not for no apparent logical reason in our eyes. This is also why searching for blue widget +the gives a different result from blue widget, and also explains why they were able to fix the -nonsense thing.
I'm hoping that new terms are only added to the Ontology if there is a predetermined level of usage which makes them worth adding. I'm also hoping that "Word Sense Disambiguation" will soon determine that what is a brand name in the US is a generic term in the UK, and realise that 99.99% of the searches are in the generic context, not the brand-related context.
OK so I'm being unrealistic. But then I did go to school in a place called Hope.
Best wishes
Sid
As for what to do about it - it may sound a bit defeatist, but I'm not sure anything can be done. I'm
1. Hoping that Google will eventually soften its commercial results
2. Investing heavily in magazine advertising, in the hope of throwing off the Google yoke forever
3. NOT using Adwords ;)
[webmasterworld.com ] (Goto Message 289)
I think if CIRCA technology is being used at Google, a very basic explanation of how it works could be:
1. Analyse the user's input (the tokens-words) and extract the meaning.
2. Return relevant SERPs for that meaning (not for the tokens-words).
So, if I search first for "kid", and afterwards for "child", I should get very similar SERPs, because the meaning of these tokens-words is very similar. But what I'm getting for "kid" and "child" is completely different SERPs.
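If the two-step meaning-based process above were really in effect, synonymous tokens would resolve to the same concept and return the same results. A toy sketch of that lookup, with invented concept IDs and an invented index:

```python
# Toy meaning-based lookup, per the two steps above. The concept map and
# index are fabricated for the demo; they are not any real Google structure.
CONCEPTS = {"kid": "C_CHILD", "child": "C_CHILD", "car": "C_AUTO"}
INDEX = {"C_CHILD": ["parenting-site", "school-site"],
         "C_AUTO": ["dealer-site"]}

def search(token):
    concept = CONCEPTS.get(token.lower())  # step 1: token -> meaning
    return INDEX.get(concept, [])          # step 2: results for the meaning
```

Under this model "kid" and "child" must return identical SERPs; the fact that they don't is the poster's evidence against a purely semantic algo.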
Please do not feel offended by my call to lurkers and other webmasters who have not posted specifics, I appreciate your posts as well as all the others who are pointedly trying to narrow things down here. I simply wanted to stimulate some of the quiet webmasters who might have something useful to contribute.
By the way, I agree with you that the answer is there somewhere, but we need to be systematic and exclude things one by one thus narrowing down the possibilities and focusing on what remains. And I believe we can do it. Of course Google may decide to change its song once we start humming.
:)
merlin30,
You opened my eyes...NONE of the top results has a link to my site for the search phrase I am interested in. I think that may have a lot to do with the filter. But then if this is the case we will soon have a "boys club" of sorts that you can only get into by getting down on your hands and knees and begging those at the top. Somehow it doesn't fit with my perception of Google's intentions. And how did the top sites get to the top anyway? I have my own network but apparently none of them is in the results. In my case there is a language problem, as many of the sites linking to mine are in Italian.
superscript,
Besides turning to traditional means of advertising, it might be worth trying to seek out better-quality links along the lines of what merlin30 suggested, and see if that helps get you back in the SERPs. In any case, it's probably wise to take some of your eggs out of the Google basket and consider Inktomi and Yahoo for the new year.
Sid,
as borisbaloney points out in your post:
context is now more important than frequency and density
For CIRCA, that makes sense to me. It would probably be very useful to actively seek out links from, and link to, all sorts of associations for "tokens" or, as we common folk prefer to say, words. What we need to do is actually start implementing these ideas and see if one of us gets back in the SERPs a month from now.
And uh... if you did go to school in a place called Hope, then that's a good sign.
Your post in Google SEO Long term is probably right on the money:
keywords being in the anchor text isn't going to help - its the idea within the anchor text
In any event I am thoroughly convinced that a filter is being applied for KW repetition or perhaps Keyphrase repetition.
I just checked out one of the top sites in the SERPs for the keyphrase where my site is no longer listed and THEY too do not have any backlinks from any of the others in the top 100 results. So this confirms my suspicion.
Just_Guessing,
Google is now presenting lots of ideas to the user, in an effort to match the context of the results with the idea(s) within the user's head. Google's results are now showing abstractions of the ideas within the search phrase, which is why they look irrelevant when viewed with your particular focus.
and
Your website can no longer rise to the top on specific keyword relevancy alone.
and
BUT here is the real difference - the keywords being in the anchor text isn't going to help - its the idea within the anchor text, which probably includes ideas of where the link originates as well.
Most reasonable explanation I have seen thus far. Good job.
seasalt
If the algorithm is deciding what the real content of each page is from semantic analysis, it is doing a good job with educational searches but missing the mark on some commercial pages - possibly due to heavy use of outdated optimization techniques making the desired keyphrase look like a company name or inorganic spam to the parser. To me this is the most sensible explanation suggested yet.