| This 41 message thread spans 2 pages |
|Cliff Top algo|
What I think G did to their algorithm
| 12:10 pm on Apr 6, 2005 (gmt 0)|
Here's what I think Google did to their algorithm with Florida and Allegra, and why I think it is a bad approach.
You are scored the same as before: by PR, link text, on-page metrics and closeness of words.
If you score higher than a cut-off point, then G automatically assumes you are spam because your score is too perfect. In effect you fall off the cliff top.
Florida rolled this out to a few of the most spammed keyphrases; Allegra rolled it out to the rest and moved the cliff face further inland.
It's not a very good algorithm, because if you search for a memorable quote, the most definitive site with that quote is automatically flagged as spam if it ranks high enough. Perversely, the more accurately you remember the quote, the less chance you have of finding a major site!
You can demonstrate this yourself by searching on exact product descriptions: the closer you are to the correct description, the less chance you will find the site that stocks that exact product.
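The hypothesized "cliff top" can be sketched in a few lines (the cutoff value and all names here are my assumptions, purely for illustration, not anything confirmed about Google):

```python
def final_score(relevance, cutoff=0.95):
    """Hypothetical cliff-top scoring: 'relevance' is the normal
    ranking score (PR, link text, on-page metrics, word proximity),
    normalized to 0..1. Anything above the cutoff is assumed 'too
    perfect' and demoted, i.e. it falls off the cliff top."""
    if relevance > cutoff:
        return 0.0  # flagged as probable spam
    return relevance
```

Under this model, the better a page matches the query, the more likely it is to cross the cutoff and vanish, which is exactly the "too perfect" effect described above.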
This is why a PR2 site that's cloned Brett's text ranks higher than Brett's actual PR5 post: his post is too perfect, so it must be spam.
Search for "In another post Google as a Black Box Giacomo proposed that we talk too much theory" trying to find this post,
and notice that webmasterworld does not appear.
Now mix up the words and remove some to make it a less perfect search:
Search for "Giacomo post Google Box proposed theory Black" and Brett's post comes up top again.
This is also why scraper sites and link pages are ranking above the sites they scrape text from. I think that when we complain to them, G re-ranks that phrase as an exception, and so remains unconvinced it is a bad algorithm.
Myself, I think it needs work, at the very least restricting it to spammy result sets!
| 10:22 pm on Apr 11, 2005 (gmt 0)|
|Problem is that SEO'd pages make more sense and logic for search engines and searchers alike. |
Did they go after WHSEO specifically, or just all optimized pages, black-hat and white-hat alike, simply deciding that anything that appears to be too close a match to the search phrase must be spam?
I search for one of my blog entries:
keyword keyword firstname lastname sitename
In Google not found. Yahoo 2nd.
keyword keyword firstname lastname site:sitedomain
In Google not found. Yahoo top.
I finally found that I could cause Google to find it with:
keyword firstname site:sitedomain
So I have to soften the query to get the correct post to show up.
It's like what Florida did for two- and three-word phrases, but now on everything. In Florida you would search for
and instead you would get pages targeting:
fluffy yellow widgets
fluffy stylish widgets
Now you search for
and get pages about 'fluffy pillows filler filler filler filler wearing a steel tungsten widget'
Finally Google lives up to its title as a search engine, and not a find engine. A sort of Google Zen joke on us all.
| 3:43 pm on Apr 12, 2005 (gmt 0)|
Another day playing with it and I think I know what causes the wild oscillations in serps others are calling the rotating algo:
Assume there is a duplication penalty: they analyse a site with a Bayesian filter, flag words that appear too often as spam words, and define a penalty based on the probable result from the Bayesian filter on those spam words.
The Bayesian filter is the ratio of the probability of a word being spam to the probability of finding the word in regular text, all multiplied by a constant.
The Bayesian penalty pushes sites down the rankings, the ranking algorithm pushes sites up, so the highest-scoring entries on a search term are all balanced on the cliff edge between the two scores.
Freshbot data comes in. Previously, fresh data might push some results down a little; now it also changes the scoring of the Bayesian filter for all the entries, so some sites fall over the edge while others pop out of nowhere.
What you are seeing as rotation algo could just be fresh data coming into this algorithm.
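The mechanism described above can be sketched as follows; the add-one smoothing, the linear penalty, and every name here are my assumptions for illustration, not anything confirmed about Google's actual formula:

```python
def word_spam_score(word, spam_counts, ham_counts,
                    spam_total, ham_total, k=1.0):
    """Ratio of P(word | spam) to P(word | regular text), times a
    constant, with add-one smoothing so unseen words don't divide
    by zero."""
    p_spam = (spam_counts.get(word, 0) + 1) / (spam_total + 1)
    p_ham = (ham_counts.get(word, 0) + 1) / (ham_total + 1)
    return k * p_spam / p_ham

def final_rank(ranking_score, page_words, spam_counts, ham_counts,
               spam_total, ham_total):
    """The ranking algorithm pushes the score up; the Bayesian
    penalty pushes it down. Top results sit on the cliff edge
    between the two."""
    penalty = sum(word_spam_score(w, spam_counts, ham_counts,
                                  spam_total, ham_total)
                  for w in page_words)
    return ranking_score - penalty
```

When fresh data arrives, the word counts change for every entry at once, so pages balanced near the edge can tip either way, which would look exactly like a rotating algo.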
That would explain a lot. Google has just done GMail, they use Bayesian filtering in that to detect spam.
But Bayesian filters don't detect spam, they detect repetition! It's because the spammer repeats himself that he trips a Bayesian filter. The filter just detects a side effect of the spammer.
If they applied the same technique to the web they would not be detecting spam, they would be detecting repetition, and so authoritative sites would disappear too. Google itself would be penalised for using the words "Google" and "googol" too much, and so on. Exactly what's happening in the results!
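A toy repetition detector makes the side effect concrete: it flags any word that dominates a page, whether the page is a keyword-stuffed doorway page or an authoritative site naturally repeating its own subject (the threshold is an arbitrary assumption):

```python
from collections import Counter

def flagged_words(words, threshold=0.1):
    """Flag any word making up more than `threshold` of the page.
    This detects repetition, not intent: an authoritative page that
    naturally repeats its own name trips it just like a spammer."""
    counts = Counter(words)
    total = len(words)
    return {w for w, c in counts.items() if c / total > threshold}
```

A hypothetical "about Google" page that mentions "google" ten times would be flagged exactly like a doorway page, which is the failure mode described above.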
This could explain a lot. Just thinking out loud.
| 2:30 am on Apr 14, 2005 (gmt 0)|
|...they... flag words that appear too often as spam words and define a penalty |
I believe this is true, based on my own experience.
However, I also believe that the penalty is not permanent. It relaxes with time (the sandbox effect?).
In my case, the software name was composed of very common words. I believe this was the reason why, for several months, searching for my software name in quotes, I found only the download sites that referenced me. Now my site is on top for my product name, with and without the quotes.
Bottom line: if you target highly competitive words, you have to wait. It seems that Google assumes that spammers will not wait long.
| 8:41 am on Apr 14, 2005 (gmt 0)|
Is it natural that even small changes in G's algorithms lead to dramatic changes in the SERPs?
It should be impossible for a quality page that has long appeared at the top of the SERPs to suddenly be found irrelevant after a small modification of the indexing algorithm.
Such a situation means that Google's algo is not robust.
The constant changes to the indexing algorithm over several years may be hiding the fact that G cannot provide stable results with its page-ranking algorithm. No truly commercial system can be so unstable.
Should we believe in the quality of G's search results?
| 9:04 am on Apr 14, 2005 (gmt 0)|
|Search for In another post Google as a Black Box Giacomo proposed that we talk too much theory trying to find this post: |
and notice that webmasterworld does not appear.
AGE is more important. Which is exactly why this very thread now outranks the original for that phrase.
| 9:20 am on Apr 14, 2005 (gmt 0)|
>>>>>No, AGE is more important. Which is exactly why this very thread now outranks the original for that phrase.
Probably true, but couldn't that also be because this thread is only three levels deep from the webmasterworld homepage at the moment (e.g. Homepage > Google News > This Thread) while the older thread is four levels (e.g. Homepage > Google News > This Thread > Older Thread)?
| 9:58 am on Apr 14, 2005 (gmt 0)|
|I believe this is true, based on my own experience. However, I also believe that the penalty is not permanent. It relaxes with time (the sandbox effect?). In my case, the software name was composed from very common words. |
I think that's a tweak. We complained about our company name (composed of common words): the link pages to us were there, but we had vanished.
Within a day of the complaint, that result had been re-ranked to scrape together an acceptable result (minus us). Basically we are keyword1 keyword2; they found a few sites and filled the rest with keyword2 keyword1.
Within a few days (after a crawl?) we were back for that phrase. So I think that when we complain they tweak the weightings on just that phrase to clean it up, and they probably did the same for your company name.
| 10:25 am on Apr 14, 2005 (gmt 0)|
|AGE is more important. Which is exactly why this very thread now out ranks the original for that phrase. |
Brett, it was not in the listings when I searched. Not just that you were outranked by other copies of that post; you were outranked by page after page of randomly scattered words. (Your post was not findable from where I was.)
The blog post I mentioned I couldn't find above was a fairly recent post. It showed the same effect: if the query matched the result too closely, Google could not find it.
I don't think 'age' explains away those missing posts (or the many others I am watching currently).
Brett, you would know more than I do: do you think Google will try to stick with this algo? I can understand their wish to tackle doorway-page SEO via a duplicate-content detection algo. Probably a Bayesian one, judging from the W1 W2 W3 W4 W5 W6 W7 query mentioned above.
But I'm a big G fanboy and would hate to see Google go the same way as 3DFX.
| 10:53 am on Apr 14, 2005 (gmt 0)|
From what I gather up till now (analysing my logs as well as reading many posts here) and from the look of the SERPs lately I think that what we are seeing is simply a major glitch (which probably started around mid Feb).
I doubt anyone would design or tweak a search engine to return bad results, or ignore pages that match the search query perfectly and instead serve remotely related pages, on purpose and just for the sake of reducing spam and forcing people to click ads. Nope, it ain't it.
This is a major glitch, and it indicates (to me anyway) a sudden lack of crunching power at the plex. Never before has it taken so long for an update to settle down. Too much data and not enough processing power (lots of storage space, though). Google has grown too much, too fast, and unfortunately it comes at a cost.
It looks like the plex is somewhat crippled due to this problem, and the results are all over the place, with lots of good old content sites disappearing from the SERPs. Not to mention erratic spidering, erratic referrals, and traffic patterns dramatically changing almost on an hourly basis as the data centres bounce/balance the load, with the whole system running at half the capacity it should (and I see the erratic patterns on all 7 sites I have, as well as 4 sites of clients of mine: up, down, up, down).
I haven't changed anything. I believe that the algo is still working its way through the new index (almost 8 billion pages now). They had about half of this figure indexed just before this mess all started.
The only question is when they are going to get it fixed... it seems like it is taking forever.
My 2 cents...
[edited by: max_mm at 11:05 am (utc) on April 14, 2005]
| 10:57 am on Apr 14, 2005 (gmt 0)|
"Bottom line, if you target for highly competitive words, you have to wait. It seems that Google assumes that the spammers will not wait long."
Vadim, nice. Isn't this kind of what the patent doc talks about? Age, history, etc. Our primary product page was optimized a long time ago, WHSEO style, with good text and no tricks. I have made only minor changes over time, and it went from the 4th page to first place over the last year. We're above our BHSEO competitors and loving it. Do the right thing, and if Google is as good as they claim/strive to be, then you should be rewarded. Otherwise, bye bye G.
So many good points in this article; one of the best threads on WWW I have ever read. Thanks everyone.
| 11:41 am on Apr 14, 2005 (gmt 0)|
|I doubt anyone will design or tweak a search engine to return bad results or will ignore pages that match the search query perfectly and instead serve remotely related pages on purpose and just for the sake of reducing spam and force people to click ads. |
I don't think that either. I think the people in the Gplex convinced themselves that all search is subjective. They noticed that the spammiest sites were at the top of the rankings, so they designed algorithms to strip those off. They then looked at the most common searches, patted themselves on the back for a successful result, and rolled it out in response to the launch of MSN Search.
GG could confirm this. Are G happy with the result or not?
To me the world is different. There are 3 types of searches.
1. I know what I want, but I don't know where it is. Searching for blog posts and Brett's post are typical examples of this. This is an objective search: Google either finds it or not.
2. I know the class of thing I want and I know how to find it. This is the subjective expert search. I know that a web page is a conversation between writer and reader, and that I have to phrase the query as though the writer was talking to me. So I would never search for "widgets for my car", because no web page would use that phrase (use of the first-person possessive 'my'), or "widgets for mam" (use of regional slang).
3. I am Joe sixpack and I type stuff in Google and it magically returns relevant results and so I think Google is God.
This is really SEO at work: they make sure that there are pages that score for "widgets for my car" and "widgets for mam", which Google's algo serves up. These people attribute that magic to Google, but it is not Google, it is SEO.
Spam isn't SEO; spam is deceptive, misleading SEO. If the algo doesn't differentiate between the two, it will make searches of type 3 very weak. If as a side effect it also removes definitive sites on the subject, it will fail searches of type 1.
My 2 cents.