| 9:13 pm on Dec 16, 2003 (gmt 0)|
<The layer of complexity that was added to the algo will be cracked and exploited only by the very best and determined spammers.>
Checking my top 10 competitors, I see a lot of keyword-related terms on their front pages – in fact related products/services – so a solution could be REAL CONTENT
IMO anchor text is still very important – see “Miserable Failure”
Searching for keyword1 keyword2 I found #2 optimized for synonym1 keyword2
and the Google cache provides the following details:
"These search terms have been highlighted : keyword 2
These terms only appear in links pointing to this page: synonym1"
| 9:18 pm on Dec 16, 2003 (gmt 0)|
So are we seeing a completely semantic algo, or is only a certain amount of the algo's weight semantic? The amount of weight may be what is causing the odd results. They may not have found the sweet spot yet.
| 9:29 pm on Dec 16, 2003 (gmt 0)|
|IMO anchor text is still very important – see “Miserable Failure” |
This new algo definitely has two approaches.
1. If term is in list run new algo.
2. If term is not in list keep old algo including massive skewing towards anchor text.
<assumption>miserable failure is not in the terms database therefore the old algo including reliance on anchor text operates.
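The two-path guess above can be written down as a tiny dispatch, purely to make the hypothesis concrete. This is a sketch of the poster's assumption, not a description of Google's code: the term list, its contents, and the function name `choose_algorithm` are all hypothetical.

```python
# Hypothetical term list; the thread only guesses that such a list exists.
TERMS_DATABASE = {"travel insurance", "hotel rome"}


def choose_algorithm(query, terms=TERMS_DATABASE):
    """Return "new" if the whole query matches a listed term exactly, else "old"."""
    # Normalise case and whitespace so "Hotel  Rome" matches "hotel rome".
    normalized = " ".join(query.lower().split())
    return "new" if normalized in terms else "old"
```

Under this assumption a query like "miserable failure" would fall through to the old, anchor-text-heavy path simply because it is absent from the list.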
| 9:33 pm on Dec 16, 2003 (gmt 0)|
I have the greatest respect for your posts but your assumption is wrong.
Highly competitive term #1 in serps...(This term only appears in links pointing to this page:)
NB: Was in top 5 pre-florida but not #1
[edited by: tantalus at 9:38 pm (utc) on Dec. 16, 2003]
| 9:37 pm on Dec 16, 2003 (gmt 0)|
Several pages (6 out of 83) which were missing since Florida from the SERPs for that highly technical search phrase have returned as of a couple of hours ago.
They are now in the 50s and 60s rather than page one but at least they are not completely missing from the top 100 results as before. That still leaves 76 of the most technically competent pages about that technology nowhere to be found.
There do not appear to have been any changes to several of those pages in the past month, and several of the others change radically every day. The trick seems to be to call the technology something which no consumer would ever call it. Use a term which only the technology experts use.
It is kind of like if you search for Vitamin B6 and get no results but if you search for Pyridoxine Hydrochloride you get all the information you expect to get about Vitamin B6.
I can't make a pattern out of it other than that right now but will keep watching.
| 9:40 pm on Dec 16, 2003 (gmt 0)|
|Discerning intended meaning from a few keywords is not possible in the general case, nor shall it ever be in the future. And if you can't discern the user's intent, discerning the meaning of web pages/sites is almost certainly a complete waste of time. |
Hmmm...I remember another event that had an association with Florida and involved attempts at divining intent...
| 9:49 pm on Dec 16, 2003 (gmt 0)|
Yes, the term is in the list, so the new algo runs (travel related)
| 9:52 pm on Dec 16, 2003 (gmt 0)|
>I remember another event that had an association with Florida and involved attempts at divining intent...
lol... Results powered by Miss Cleo
I see AdWords in your future..
[edited by: john316 at 9:52 pm (utc) on Dec. 16, 2003]
| 9:52 pm on Dec 16, 2003 (gmt 0)|
Here is my complaint and the answer back from Google. I thought I would give you guys my 2 cents' worth. Prior to the Florida update my site was ranked between 1st and 6th position (page 1) for the last several years (as well as on Yahoo). I've decided this is purely a business decision of some sort. Anyway, here is the full text of my letter to Google and their reply.
Original Message Follows:
[edited by: WebGuerrilla at 10:55 pm (utc) on Dec. 16, 2003]
[edit reason] Sorry, no email quotes [/edit]
| 10:24 pm on Dec 16, 2003 (gmt 0)|
That e-mail you posted from Google, Teleconnect, is nothing more than a standard form letter.
| 10:46 pm on Dec 16, 2003 (gmt 0)|
That's what I thought as well....Oh well, there is always alltheweb.com; their index is uncorrupted by this new "florida update garbage results"
| 10:52 pm on Dec 16, 2003 (gmt 0)|
Concerning the Google letter...that just shows you that they did not look at my problem nor really give a **** about this new index update!
| 11:54 pm on Dec 16, 2003 (gmt 0)|
|So are we seeing a complete semantic algo or a certain amount of weight for the algo is semantic? |
A strong point - if the algo has an Applied Semantic element - it can only be 'flavoured' by it. Otherwise there would be dramatic movement in purely scientific / informational sites. I've seen little movement of these sites. Hence the many posters on these boards with non-commercial sites hardly noticing that a major algo change had occurred.
In my mind, at least, the jury is still out on the Applied Semantics idea - either it has been mixed in very lightly, or we're back to the old 'commercial filter' idea.
| 2:15 am on Dec 17, 2003 (gmt 0)|
>Concerning the Google letter...that just shows you that they did not look at my problem nor really give a **** about this new index update!
Why assume otherwise?
| 4:11 am on Dec 17, 2003 (gmt 0)|
Have any of you theoreticians considered building a plain vanilla test Web site with no SEO, and then varying one variable at a time to see how Google's algorithm behaves? I realize there are lots of combinations and permutations, but it might provide some direction.
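The one-variable-at-a-time idea can be sketched as a test plan generator. This is only an illustration: the factor names in `BASELINE` are made-up examples of things one might vary, not a list anyone in the thread proposed.

```python
# Hypothetical on-page factors for the plain vanilla test site, all switched off.
BASELINE = {
    "keyword_in_title": False,
    "keyword_in_h1": False,
    "keyword_in_internal_anchor_text": False,
}


def one_at_a_time_variants(baseline):
    """Return the baseline plus one variant per factor, each flipping a single factor."""
    variants = [("baseline", dict(baseline))]
    for name in baseline:
        variant = dict(baseline)
        variant[name] = not baseline[name]  # change exactly one factor
        variants.append((name, variant))
    return variants
```

With three factors this gives four test pages instead of the eight needed for every combination, which is the trade-off the post hints at: fewer pages, but no view of how factors interact.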
| 4:45 am on Dec 17, 2003 (gmt 0)|
|Concerning the Google letter...that just shows you that they did not look at my problem nor really give a **** about this new index update! |
That's awfully narcissistic.
| 9:50 am on Dec 17, 2003 (gmt 0)|
|A strong point - if the algo has an Applied Semantic element - it can only be 'flavoured' by it. Otherwise there would be dramatic movement in purely scientific / informational sites. I've seen little movement of these sites. Hence the many posters on these boards with non-commercial sites hardly noticing that a major algo change had occurred. |
I've read the paper on CIRCA and can see how it would be able to interpret the gist of a document and pigeon hole it as "that's about whatever". But I'm having trouble understanding how it could say "that's more about whatever than this other document." The following is what I think might be the simple explanation. Please feel free to tear it apart with vitriolic postings.
Assuming a technology closely related to what is described in the CIRCA paper is what is now being used, and reassessing my analysis of pages that are at the top of SERPs, I would say that the new algo seems to "like" pages that are about the broad context of the subject searched for and which contain the actual search term to only a small extent. In other words it seems to be more sure that something is about blue widgets if it is also about blue widgetting, green widgets and the study of advanced widgetology. This may change over time but at the moment putting blue widget in context seems to be most important, based on my very small sample.
Then it seems to rank the pages by PageRank (or backlinks but there is a strong correlation between the two). So CIRCA decides what a document is about and PageRank (or backlinks) decides how important it is in relation to other pages that are pigeon holed as being about that thing.
CIRCA may also have some input in ranking in this way. It may look at the pages that are linked to by the page in question, and if they are about the thing in question then this helps to strengthen the context of the page being ranked. In other words a page is more about a thing if it is linked to pages that are about subjects related to that thing. It may be that being related is better than being the same.
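The two-stage guess above (first decide what a page is about, then order the matching pages by PageRank) could look something like this. It is a sketch of the speculation only: the `Page` class, its fields, and the `rank` function are invented for illustration, and nothing here is taken from the CIRCA paper.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    topics: frozenset  # what a (hypothetical) classifier decided the page is about
    pagerank: float


def rank(pages, query_topic):
    """Keep pages pigeon-holed under the query topic, then sort by PageRank."""
    on_topic = [page for page in pages if query_topic in page.topics]
    return sorted(on_topic, key=lambda page: page.pagerank, reverse=True)
```

Note that in this sketch the topic stage only acts as a gate; a page that is only loosely on topic but has high PageRank would still outrank a tightly focused page, which is one way to read some of the results described in this thread.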
Someone mentioned how it might filter out pages that are too much about a particular topic as being spam. I'm going to read back to see if I can find the post that refers to it but if anyone has more info on this please share it.
On page 5 of the CIRCA paper they give the example of "bears witness", suggesting there are two meanings of this expression. Of course there are at least three. My favorite is "Bears witness picnic ground incident, Booboo and Yogi seen running into the woods." ;)
| 10:18 am on Dec 17, 2003 (gmt 0)|
|Highly competitive term #1 in serps...(This term only appears in links pointing to this page:) |
NB: Was in top 5 pre-florida but not #1
What you say here could be very interesting.
Is that page on the broad subject that it is appearing at #1 for?
Staying with the assumption that CIRCA is the silver bullet, and this is pure speculation, perhaps an element of what CIRCA does to pigeon hole a page is to look at inbound link anchor text. If that text uses the exact term and the page pointed to is about the broad context, then perhaps it is seeing that page as a good match for the broad context of what is being searched for.
If you want to sticky me details I would be pleased to take a look.
| 10:26 am on Dec 17, 2003 (gmt 0)|
|it might filter out pages that are too much about a particular topic as being spam |
Sid, I am convinced that regardless of whether or not CIRCA technology is being implemented in the algorithm there is surely a filter applied.
I say this because the sites I am following which were previously at the top of SERPs are no longer in the first 1000 results for very specific searches containing EVIL KEYWORDS. One might argue that applied semantics is simply presenting other more relevant sites, but I have looked at sites way down on the list and they have very, very little to do with what mine offers. In my bread and butter site I have sections dedicated to my town and various services, all in English, whereas the only relevant sites similar to mine are in Italian - but they are not focused on the search phrases I am chasing.
What I'd like to know is what other webmasters have been able to determine about this filter?
Surely others reading these posts have been affected by a filter, where your website no longer appears at all and way down at number 999 you have a site which has little or nothing to do with your business.
Come on all you lurkers and experienced webmasters, let's get some feedback here and see if we can collectively come up with some possible reasons that trigger the filter and what course of action to take.
| 11:13 am on Dec 17, 2003 (gmt 0)|
|Come on all you lurkers and experienced webmasters, let's get some feedback here and see if we can collectively come up with some possible reasons that trigger the filter and what course of action to take. |
Bobby I say this in the nicest possible way - what do you think everyone here has been doing for the last 6000 posts since Nov 15?
You have some reading to do if you think this question (or the one about the "filter") hasn't been asked. BTW I do think the answer is there amongst them.
| 11:14 am on Dec 17, 2003 (gmt 0)|
For the searches you are following (and have gone AWOL for) can your site be reached by following links from any of the top listed results?
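If you have crawled the outbound links of the top results, the question above is a plain reachability check. A minimal sketch, assuming a hypothetical `link_graph` dictionary that maps each page to the pages it links to; how you collect that data is left open.

```python
from collections import deque


def reachable(link_graph, start_pages, target, max_hops=3):
    """Breadth-first search: can target be reached from any start page within max_hops links?"""
    seen = set(start_pages)
    queue = deque((page, 0) for page in start_pages)
    while queue:
        page, hops = queue.popleft()
        if page == target:
            return True
        if hops == max_hops:
            continue  # do not follow links beyond the hop limit
        for linked in link_graph.get(page, ()):
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, hops + 1))
    return False
```

The `max_hops` limit is an arbitrary choice for the sketch; the post does not say how many links away would still count.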
| 11:19 am on Dec 17, 2003 (gmt 0)|
|I say this because the sites I am following which were previously at the top of SERPs are no longer in the first 1000 results for very specific searches containing EVIL KEYWORDS. |
Did the pages that were previously there contain evil keywords?
It could be argued that if CIRCA is used then it would look at the broad context of those evil keywords. If they or a token (word=token=word=token why did they change it from word I wonder?) within the search has some more innocent context, perhaps that is given "strength" and evil contexts are given less strength.
Non evil example: car tires
If a site uses a high frequency and density of the phrase car tires then it is not as much about car tires as one that talks about radial, pirelli, goodyear, tread pattern, car tires, auto tires, tractor tires, wheels etc etc.
Keeping with the CIRCA assumption, context is now more important than frequency and density. Plus links to pages on subjects in that broad context and high PR and/or backlinks (including context related or exact anchor text) may be important.
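To make the car tires contrast concrete, here is a rough sketch that measures the two things being compared: how densely a page repeats the phrase, and how much of the surrounding vocabulary it covers. The related-term list is just the one from the post above, and both scoring functions are illustrative assumptions, not anything CIRCA is known to do.

```python
import re

# Related terms taken from the car tires example above; a real list is unknown.
RELATED_TERMS = {
    "car tires", "radial", "pirelli", "goodyear",
    "tread pattern", "auto tires", "tractor tires", "wheels",
}


def keyword_density(text, phrase):
    """Share of the page's words that belong to exact occurrences of the phrase."""
    words = re.findall(r"[a-z]+", text.lower())
    target = phrase.lower().split()
    if not words:
        return 0.0
    hits = sum(
        1 for i in range(len(words) - len(target) + 1)
        if words[i:i + len(target)] == target
    )
    return hits * len(target) / len(words)


def context_coverage(text, related=RELATED_TERMS):
    """Share of the related terms that appear at least once as whole words."""
    padded = " " + " ".join(re.findall(r"[a-z]+", text.lower())) + " "
    found = sum(1 for term in related if f" {term} " in padded)
    return found / len(related)
```

A page that just repeats "car tires" scores high on density and low on coverage; a page that mentions radial, Pirelli, Goodyear, tread pattern and wheels does the opposite, which is the kind of page the post suggests is now favoured.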
The point is that you can describe a process, knowing what we know about CIRCA, that would produce the result that you are seeing without there being a separate filter.
It may be that a weakness has been shown in CIRCA that has never previously been seen. Let's say that previously only a few million documents have been assessed. Now, with 3.6 billion to work on, anomalies which were never previously seen are being thrown up. The CIRCA database has "over two million unique terms". I would postulate that in order to be processed by the CIRCA technology, the search entered has to match one of those 2 million terms exactly. This is why we are seeing what we think are very close terms either processed or not for no apparent logical reason in our eyes. This is also why searching for blue widget +the gives a different result from blue widget, and it also explains why they were able to fix the -nonsense thing.
I'm hoping that new terms are only added to the Ontology if there is a predetermined level of usage of the term which makes it worth adding. I'm also hoping that "Word Sense Disambiguation" will soon determine that, what is a brand name in the US, is a generic in the UK and realise that 99.99% of the searches are in the generic context not brand related context.
OK so I'm being unrealistic. But then I did go to school in a place called Hope.
| 11:22 am on Dec 17, 2003 (gmt 0)|
I too suspect that Applied Semantics aside, there is still something that resembles a filter in place. My non-commercial astrophysics site has been SEO'd in much the same way as my commerce sites; yet whilst it remains very high in the listings, the commercial ones are suddenly nowhere.
As for what to do about it - it may sound a bit defeatist, but I'm not sure anything can be done. I'm:
1. Hoping that Google will eventually soften its commercial results
2. Investing heavily in magazine advertising, in the hope of throwing off the Google yoke forever
3. NOT using Adwords ;)
| 11:30 am on Dec 17, 2003 (gmt 0)|
Great post Hissingsid. One of the most insightful I've read in ages - especially in relation to the difficulty of the subject matter.
I believe the methodology behind your car tires optimisation is "the next anchor text".
| 12:03 pm on Dec 17, 2003 (gmt 0)|
I posted the following message pretty early on in Florida - it seems more appropriate to this thread:
[webmasterworld.com ] (Goto Message 289)
| 12:08 pm on Dec 17, 2003 (gmt 0)|
I think if CIRCA technology is being used at Google, a very basic explanation of how it works could be:
1. Analyse the user's input (the tokens-words) and extract the meaning.
2. Return relevant SERPs for that meaning (not for the tokens-words)
So, if I search first for "kid", and then for "child", I should get very similar SERPs, because the meaning of these tokens-words is very similar. But what I'm getting for "kid" and "child" is completely different SERPs.
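The "kid" versus "child" comparison can be put on a number rather than eyeballed. A minimal sketch, assuming you have already collected the two result lists by hand as lists of URLs; the function name `serp_overlap` and the default depth of 10 are arbitrary choices for illustration.

```python
def serp_overlap(results_a, results_b, depth=10):
    """Jaccard overlap of the top results of two SERPs: shared URLs / all URLs."""
    top_a = set(results_a[:depth])
    top_b = set(results_b[:depth])
    if not top_a and not top_b:
        return 0.0
    return len(top_a & top_b) / len(top_a | top_b)
```

An overlap near 1.0 would fit the "results follow meaning" explanation, while an overlap near 0.0, as reported above, suggests the results still follow the literal words. The measure ignores position, so two SERPs with the same pages in a different order count as identical.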
| 12:08 pm on Dec 17, 2003 (gmt 0)|
Please do not feel offended by my call to lurkers and other webmasters who have not posted specifics, I appreciate your posts as well as all the others who are pointedly trying to narrow things down here. I simply wanted to stimulate some of the quiet webmasters who might have something useful to contribute.
By the way, I agree with you that the answer is there somewhere, but we need to be systematic and exclude things one by one thus narrowing down the possibilities and focusing on what remains. And I believe we can do it. Of course Google may decide to change its song once we start humming.
You opened my eyes...NONE of the top results has a link to my site for the search phrase I am interested in. I think that may have a lot to do with the filter. But then if this is the case we will soon have a "boys club" of sorts that you can only get into by getting down on your hands and knees and begging those at the top. Somehow it doesn't fit with my perception of Google's intentions. And how did the top sites get to the top anyway? I have my own network but apparently none of them is in the results. In my case there is a language problem, as many of the sites linking to mine are in Italian.
Besides turning to traditional means of advertising, it might be worth a try to seek out better quality links along the lines of what merlin30 suggested and see if that helps get you back in the SERPs. In any case it's probably wise to take some of your eggs out of the Google basket and consider Inktomi and Yahoo for the new year.
as borisbaloney points out in your post
for CIRCA, that makes sense to me. It would probably be very useful to actively seek out links from and link to all sorts of associations for "tokens" or as we common folk prefer to say - words. What we need to do is actually start implementing these ideas and see if one of us gets back in the SERPs a month from now.
|context is now more important than frequency and density |
And uh...if you did go to school in a place called hope then that's a good sign.
| 12:27 pm on Dec 17, 2003 (gmt 0)|
Your post in Google SEO Long term is probably right on the money:
|keywords being in the anchor text isn't going to help - it's the idea within the anchor text |
IF that is the new criteria for the algorithm then we know how to proceed.
In any event I am thoroughly convinced that a filter is being applied for KW repetition or perhaps Keyphrase repetition.
I just checked out one of the top sites in the SERPs for the keyphrase where my site is no longer listed and THEY too do not have any backlinks from any of the others in the top 100 results. So this confirms my suspi...
| 3:33 pm on Dec 17, 2003 (gmt 0)|
|Google is now presenting lots of ideas to the user, in an effort to match the context of the results with the idea(s) within the user's head. Google's results are now showing abstractions of the ideas within the search phrase, which is why they look irrelevant when viewed with your particular focus. |
Your website can no longer rise to the top on specific keyword relevancy alone.
BUT here is the real difference - the keywords being in the anchor text isn't going to help - it's the idea within the anchor text, which probably includes ideas of where the link originates as well.
Most reasonable explanation I have seen thus far. Good job.
| 3:34 pm on Dec 17, 2003 (gmt 0)|
Hissingsid, I like your latest idea. It *does* match what I've seen in the educational searches. Many are interpreting our satisfaction with these searches to mean that non-commercial/informational searches weren't affected, but that's not true. The searches were affected--sites that are actually about a topic are coming up ahead of irrelevant spam sites with the search phrase listed 20 times on it and sites about a business that merely has the search phrase as part of its company name. People who are searching for "trilobite" may want to learn about trilobites, see a fossil hunter's trilobite collection, or buy trilobites online, but it's unlikely they're interested in Trilobite Ted's used cars, much less a porn page with "trilobite trilobite trilobite" pasted all over it. (This is a completely hypothetical example, as should probably be obvious. :-D But I've seen exactly these scenarios with real keywords.)
If the algorithm is deciding what the real content of each page is from semantic analysis, it is doing a good job with educational searches but missing the mark on some commercial pages--possibly due to heavy use of outdated optimization techniques making the desired keyphrase look like a company name or inorganic spam to the parser. To me this is the most sensible explanation suggested yet.
| 3:43 pm on Dec 17, 2003 (gmt 0)|
Good post, Hissingsid.
Where the algo is often going wrong is when it breaks up the search term and looks for other terms related to individual words within the search term.
Add in the discounting of anchor text in internal links, and it explains much of what I have seen.
I think the Adwords Keyword Suggestion Tool gives a good insight into what terms Google thinks are related, but you need to look at the suggestions for each individual word in the search term, not just the suggestions for the whole search term.
Looking at the search results is a bit like listening to a small precocious child using language it doesn't quite understand!
| This 260 message thread spans 9 pages |