Google SEO News and Discussion Forum

Cliff Top algo
What I think G did to their algorithm
ncgimaker




msg:727362
 12:10 pm on Apr 6, 2005 (gmt 0)

Here's what I think Google did to their algorithm with Florida and Allegra, and why I think it is a bad approach.

You are scored the same as before: by PR, link text, on-page metrics, and closeness of words.

If you score higher than a cut-off point, then G automatically assumes you are spam because your score is too perfect. In effect, you fall off the cliff top.
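A minimal sketch of that kind of cut-off, with invented page names, scores, and threshold (nothing here is Google's actual scoring):

SPAM_THRESHOLD = 0.95  # hypothetical "too perfect" cut-off

def rank(pages):
    # Rank pages by relevance score, dropping any that score "too well".
    survivors = [p for p in pages if p["score"] <= SPAM_THRESHOLD]
    return sorted(survivors, key=lambda p: p["score"], reverse=True)

pages = [
    {"url": "original-source.example", "score": 0.98},  # the definitive page
    {"url": "scraper.example",         "score": 0.72},
    {"url": "passing-mention.example", "score": 0.55},
]

# The near-perfect match falls off the cliff; the scraper ranks first.
print(rank(pages))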

Florida rolled this out to a few of the most spammed keyphrases; Allegra rolled it out to the rest and moved the cliff face further inland.

It's not very good as an algorithm, because if you search for a memorable quote, the most definitive site with that quote is automatically flagged as spam if it ranks high enough. Perversely, the more accurately you remember the quote, the less chance you have of finding a major site!

You can demonstrate this to yourself by searching on exact product descriptions: the closer you are to the correct description, the less chance you will find the site that stocks that exact product.

This is why a site that's cloned Brett's text and is PR2 ranks higher than Brett's actual PR5 post - because his post is too perfect, so it must be spam:

Search for "In another post Google as a Black Box Giacomo proposed that we talk too much theory" while trying to find this post:

[webmasterworld.com...]

and notice that WebmasterWorld does not appear.

Now mix up the words and remove some to make it a less perfect search:

Search for "Giacomo post Google Box proposed theory Black" and Brett's post comes up top again.

This is also why scraper sites and link pages are ranking above the sites they scrape text from. I think when we complain to them, G re-ranks that phrase as an exception and is not convinced it is a bad algorithm.

Myself, I think it needs work; at the very least it should be restricted to spammy result sets!

 

landmark




msg:727363
 3:13 pm on Apr 6, 2005 (gmt 0)

Sorry, but this is a crazy theory. In your example, Brett's page is ranked at #2. It's hardly fallen off a cliff.

I agree that Google has got it wrong ranking Brett's page (the original source of the material and a strong website) at #2, but your theory doesn't fit the results.

There are plenty of examples of Google showing good matches in top positions.

theBear




msg:727364
 3:36 pm on Apr 6, 2005 (gmt 0)

In another post Google as a Black Box Giacomo proposed that we talk too much theory

vs

"In another post Google as a Black Box Giacomo proposed that we talk too much theory"

Should return slightly different results.

Since one result set would be for all of the words in the exact sequence in the search string, and the other for the collection of pages displaying any of the words in the search string and/or inbound link text.

Likewise if you slightly change the search string.

With both result sets all smashed and crashed by the algo.

WebmasterWorld shows #1 when the search string was done inside of quotes and #2 when it was done outside of quotes.

Of course given the current state of flux in observed SERPs it is hard to pin down just about anything.

Down this morning, up this afternoon. YMWV (your mileage will vary).

ncgimaker




msg:727365
 3:46 pm on Apr 6, 2005 (gmt 0)

In your example, Brett's page is ranked at #2. It's hardly fallen off a cliff.

When I do the search, WebmasterWorld appears nowhere. I can show you the screenshots if you like.

Try the technique I explained in my post. Choose an important quote on an important site, then search Google for that phrase. If you choose a sufficiently important site, that site will be kicked down the rankings in favor of less popular sites, because it scores too well. The more perfectly it scores, the further it falls.

Now remove words and change the search order. Notice the site pops back to the top.

jezlinux




msg:727366
 4:01 pm on Apr 6, 2005 (gmt 0)

I'm getting 2nd without quotes, but *not listed* with quotes - unless I click on the omitted results link, in which case it then lists WW top with a link and sublink.

Omitting the top result from the reduced result list - dance quirk or bug?

ncgimaker




msg:727367
 4:12 pm on Apr 6, 2005 (gmt 0)

theBear,

Firstly, please try a different phrase. I reported that phrase yesterday, so it is contaminated now as a test phrase. Also, I have a different geographic weighting to you; you may be nearer to WebmasterWorld's servers than I am.

I deliberately chose a very very long phrase to search for because it shows the problem.

In another post Google as a Black Box Giacomo proposed that we talk too much theory

I do not believe that this is a subjective search.

When I did this with Google, I got sites that *copied* Brett's post; clearly Brett's original should have outranked these, yet it was nowhere.

Arguing that there are other ways to find the page misses the point. The quoted search only works if you know the phrase *perfectly*, but in the real world you might recall 10-15 words, not a perfect extract.

I know that Google had the post because when I diffused the search query enough (removing minor words and changing the order of words) it eventually popped straight back into the results at the top slot.

ncgimaker




msg:727368
 5:13 pm on Apr 6, 2005 (gmt 0)

I have an easier way of showing you the effect:

See this typical Google page, a PR9:

[google.com ]

Take a phrase from the text:

What's a Google?

Google ranks #3, below a PR5 site, in the results I get (your results may vary slightly).

Now add more words.

What's a Google Googol is the mathematical term for a 1 followed by 100 zeros. The term was coined by Milton Sirotta

Now I get Google ranked at #11!

The more words I add, the lower the Google page ranks!

Can anyone seriously argue that anything other than the PR9 Google page should be listed at number 1 for that search?

Note: if this doesn't work for you, please choose a different high-ranking site. I am very restricted in the examples I can show you on WebmasterWorld, but I see it right across the searches.

Spine




msg:727369
 5:28 pm on Apr 6, 2005 (gmt 0)

I think their algo should be in special-ed, it seems to be the most stupid, spam friendly algo of theirs ever.

Wizard




msg:727370
 7:24 pm on Apr 6, 2005 (gmt 0)

What's a Google Googol is the mathematical term for a 1 followed by 100 zeros. The term was coined by Milton Sirotta

Can anyone seriously argue that anything other than the PR9 Google page should be listed at number 1 for that search?

Yes, I can.

How many words in this phrase are strictly related to Google?

One.

"Google".

The others are either completely common words, like "what", "is", "term", "by", "the", or words related to mathematics.

landmark




msg:727371
 7:39 pm on Apr 6, 2005 (gmt 0)

I think the problem is that these long search terms contain many stop words (a, in, is, the, ...), so the word order becomes unimportant and any occurrences of the words on the page or in backlinks count towards the page's score. That's why the results become less predictable the longer the search phrase gets: you are no longer searching for the phrase, just an unordered collection of words.
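As a toy illustration of that degradation (the stop word list and subset test are simplified assumptions, not Google's actual processing):

STOP_WORDS = {"a", "an", "and", "as", "in", "is", "of", "that", "the", "to", "too", "we"}

def bag_of_words(query):
    # Drop stop words and all ordering information.
    return {w.lower() for w in query.split() if w.lower() not in STOP_WORDS}

q1 = "In another post Google as a Black Box Giacomo proposed that we talk too much theory"
q2 = "Giacomo post Google Box proposed theory Black"

# Once order is gone, the scrambled query is just a subset of the long one.
print(bag_of_words(q2) <= bag_of_words(q1))  # True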

Search phrases with stop words aren't typical searches.

Instead, search for a phrase without stop words like "Webmaster World Search Conference Sponsored" and WebmasterWorld shows up at the top, where it should.

ncgimaker




msg:727372
 7:53 pm on Apr 6, 2005 (gmt 0)

Wizard:
Yes, I can.

Please name the site that you think should come top for that phrase then. I would be interested in seeing your choice and reasoning.
My choice and reasoning are very simple. It's a PR9 page; it was ranked #3 for a few words, and when I add more words, in the exact order they appear on the page, I would expect the query to describe the page more accurately.

landmark:
I thought it was stop words too, but the first test I did with the WebmasterWorld search was to remove the stop words. WWW didn't come top until I swapped the word order as well, and even then I had to swap two sets of major words.

ncgimaker




msg:727373
 2:23 pm on Apr 9, 2005 (gmt 0)

More details, possible duplicate content algo?

I have found an example where a major site has a catchphrase in the META Description tag of almost all of its pages. Perhaps that site's SEO is here?

If you search for 7 words (all nouns) with a site: command, Google says it doesn't find any matches. Yahoo says 27,000.

In the following, W1, W2 = Word1, Word2...

W1 site:domain.com
11300 results

W1 W2 site:domain.com
21 results

W1 W2 W3 site:domain.com
10 results

W1 W2 W3 W4 site:domain.com
2 results

W1 W2 W3 W4 W5 site:domain.com
1 result

W1 W2 W3 W4 W5 W6 site:domain.com
1 result

Now here's where it gets interesting: the single result is the only one of those 27,000 pages *without* Word 7 on it! (Remember that in reality those 7 words are on almost all the pages.)

In Yahoo
W1 W2 W3 W4 W5 W6 -W7 site:domain.com
Returns just the same single page result.

Now when I search those 7 words *without* the site command, i.e.
W1 W2 W3 W4 W5 W6 W7

I get link pages and spam directories pointing to this major site, and the site itself is nowhere to be seen! Google ranks spam scrapers and link pages above it, because the spam is about many subjects while this site is all about
W1 W2 W3 W4 W5 W6 W7

I think this also explains many of the disappeared company names: when the company name is on most of a site's pages, the site gets penalised for those words too.

And it also explains why I can't find definitive sites. If I search for widgets and the most important widget sites talk too much about widgets, I find only secondary sites that mention widgets in passing. The major site is penalised.

BillyS




msg:727374
 3:45 pm on Apr 9, 2005 (gmt 0)

I completely agree with this theory. I think that you can actually be penalized in Google for being too perfect on a search term. I have actually seen this myself on my own site. I am constantly outranked by pages that have nothing to do with the real topic the searcher is looking for.

This is often very obvious when looking at around the 20th or so result - even for popular searches. There you often see pages that merely touch the topic, when the page is really about something else. I know the TOS here does not like it when you post exact examples, but I see this all the time in Google, less so in MSN, and almost never in Yahoo.

While many results make sense, some pages have no right being there at all, and many sites with better information are shown below these pages. In fact, if you use quotation marks, you often find higher quality results on Google than without them. I know the difference between the two searches, but the overall quality of results is better with the quotation marks.

Personally, I think Google has outsmarted themselves on this one, and they are now sacrificing search results for the sake of attempting to stop spam. Page Rank and Hilltop look good on paper and make for a nice academic paper, but in the real world this theory of "voting" for other pages is just too easy to game. Folks with good content will have trouble getting links from authority sites. Many of those sites couldn't care less about linking - they live in their own little world. Spammers can easily create a network of links. This is why the theory falls apart.

Page Rank is being updated quarterly now (and at least a week late presently). The fact that Google has decided not to update the toolbar demonstrates they are devaluing it. They are now stuck between a rock and a hard place: they are trying to stop spam, but the results are suffering.

theBear




msg:727375
 4:06 pm on Apr 9, 2005 (gmt 0)

There is a very old saying in the computer field: GIGO - garbage in, garbage out.

Apply a fully automated system with several even minor errors in it to a pile of garbage and what pops out?

It really makes no difference if you look at one word or a 7 word search string.

Trying to understand what is going on in such a system is an exercise in futility.

I fix what I can after finding out what happened and continue from there.

wordy




msg:727376
 6:00 pm on Apr 9, 2005 (gmt 0)

Makes good sense to me.

There are a certain number of factors that the algorithm takes into account which SEOs have been able to identify and manipulate. So G applies a percentage score to the algorithm: if a page's percentage is above a threshold set by G, it does not appear prominently; below it, and you are OK.

This might explain the "search term -aspokjh" trick, where the algo is showing results from below the threshold. OOP and the sandbox could also be rationalised as G varying the percentage threshold.

[edited by: wordy at 6:40 pm (utc) on April 9, 2005]

Wizard




msg:727377
 6:13 pm on Apr 9, 2005 (gmt 0)

W1 W2 W3 W4 W5 site:domain.com
1 result

W1 W2 W3 W4 W5 W6 site:domain.com
1 result

Now here's where it gets interesting: the single result is the only one of those 27,000 pages *without* Word 7 on it! (Remember that in reality those 7 words are on almost all the pages.)

And what would happen if you queried with quotation marks, I mean "W1 W2 W3 W4 W5 ..." site:domain.com?

With quotes, you search for the phrase, and we should expect the site with the exact phrase at the top of the SERPs. Without quotes, you search for sites that contain W1 or W2 or W3, etc.

But why does the number of results decrease?

Some time ago there was a thread about calculations made by a fellow from France, who assumed that in such queries Google first takes results for each word separately. If possible, instead of using the index with 8 billion pages, it uses a smaller index. Then it crosses the sub-results to find the best matches for the multi-word (but not phrasal) query. This way, many results are lost at the first stage of the process.

Your observations may prove it really works this way.
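A runnable sketch of how that first stage could lose results: if each word contributes only a capped list of sub-results from the smaller index, a document can match every query word yet fall outside some word's cap. The cap size, words, and postings below are invented:

SUBRESULT_CAP = 3  # hypothetical per-word cap in the small index

# Inverted index: word -> postings, best-scoring documents first.
index = {
    "w1": [101, 102, 103, 200],  # doc 200 matches w1 but sits beyond the cap
    "w2": [200, 104, 105],
    "w3": [200, 106, 107],
}

def capped_intersection(words):
    lists = [set(index[w][:SUBRESULT_CAP]) for w in words]
    return set.intersection(*lists)

# Doc 200 contains all three words, yet the capped intersection is empty.
print(capped_intersection(["w1", "w2", "w3"]))  # set()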

I think that you can actually be penalized in Google for being too perfect on a search term.

For particular factors, like too-high keyword density, you certainly can be penalised. I have also noticed situations where pages with the keyword in the URL rank lower than pages without it, if both have very high density.

But try reading the new patent - "Information retrieval based on historical data" - maybe on-page factors matter little compared to these new, weird factors: content growth rate, link growth rate, and others like these.

Page Rank is being updated quarterly now (and at least a week late presently). The fact that Google has decided not to update the toolbar demonstrates they are devaluing it.

Or the contrary - they don't show the true PR in the toolbar precisely because the real PR is important, and showing it would help spammers as it used to?


zgb999




msg:727379
 6:49 pm on Apr 9, 2005 (gmt 0)

"Page Rank is being updated quarterly now (and at least a week late presently). The fact that Google has decided not to update the toolbar demonstrates they are devaluing it."

How can you know it is updated quarterly? What Google uses for ranking and what Google displays in the toolbar are not the same.

ncgimaker




msg:727380
 6:57 pm on Apr 9, 2005 (gmt 0)

Using my new-found understanding, I will do reverse SEO and optimize a search phrase to find a page.

This one:

What's a Google Googol is the mathematical term for a 1 followed by 100 zeros The term was coined by Milton Sirotta

The Google page that I think should come top for this phrase comes up at number 10 when I use the above search.

In the following, the first number is how often Google.com uses that word; the second is how many times it appears on the web.

What's        35,000     357,000,000
Google         5,670     269,000,000
Googol            83          80,200
mathematical  22,600      48,600,000
term          52,600     226,000,000
for           81,700   3,940,000,000
1            549,000   3,400,000,000
followed      29,400      79,300,000
by            81,500   3,020,000,000
100           90,100     391,000,000
zeros            142       3,020,000
the           81,600   3,730,000,000
was           81,700     817,000,000
coined           274       2,920,000
milton         6,320      15,700,000
Sirotta           43           6,690

OK, so we want to remove words that don't help differentiate the page from the rest of the web, but may count against it in the duplicate penalty scoring - i.e. where both numbers are relatively high. These ones look likely:

for 1 by 100 the was

This leaves the search:
What's Google Googol mathematical term followed zeros coined Milton Sirotta
Google now comes up 3rd, much better, but still not top.

So let's take out 'term' and see which way it goes.

What's Google Googol mathematical followed zeros coined Milton Sirotta
Now it comes up at #2 - better still.

Notice that there are still 54 results: 'term' didn't help reduce the number of pages, but its presence did cause the Google page to rank lower for a relevant word!

OK, let's try removing 'what's':
Google Googol mathematical followed zeros coined Milton Sirotta
Google page no longer found.

Yikes, too much - we've gone over the cliff.

Google can't find Google. I think they should make a more information-rich site and seek more links from quality sites, and perhaps they will reappear for their keywords. :)
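For what it's worth, the pruning step above can be reproduced mechanically. A back-of-the-envelope Python sketch using the counts quoted earlier; treating raw web frequency as the pruning signal is my own assumption, but it happens to pick exactly the ten words kept above:

web_counts = {  # word: occurrences on the web, as quoted above
    "what's": 357_000_000, "google": 269_000_000, "googol": 80_200,
    "mathematical": 48_600_000, "term": 226_000_000, "for": 3_940_000_000,
    "1": 3_400_000_000, "followed": 79_300_000, "by": 3_020_000_000,
    "100": 391_000_000, "zeros": 3_020_000, "the": 3_730_000_000,
    "was": 817_000_000, "coined": 2_920_000, "milton": 15_700_000,
    "sirotta": 6_690,
}

def prune(words, keep):
    # Keep only the `keep` rarest (most distinctive) words, in query order.
    rarest = set(sorted(words, key=lambda w: web_counts[w])[:keep])
    return [w for w in words if w in rarest]

query = list(web_counts)  # the original 16-word query, in order
print(" ".join(prune(query, 10)))
# what's google googol mathematical term followed zeros coined milton sirotta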

Wizard




msg:727381
 7:58 pm on Apr 9, 2005 (gmt 0)

I just tried with "Searching 8,058,044,651 web pages". Google New Zealand is #2. But when searching for just 8,058,044,651, Google disappears from the top results.

If Google is sandboxed itself, what chances do we have? ;))

One more thing:
Please name the site that you think should come top for that phrase then. I would be interested in seeing your choice and reasoning.

Do a search for just 'googol' - you'll see exactly the results I was expecting.

ncgimaker




msg:727382
 10:33 am on Apr 10, 2005 (gmt 0)

Wizard:

And what would happen if you queried with quotation, I mean "W1 W2 W3 W4 W5 ..." site:domain.com?

There are filler words in the phrase in the meta tag, but if I search for the phrase including the filler words:
"W1 W2 filler W3 W4 W5 filler W6 W7" site:domain.com

Yahoo finds 26,900 pages; Google finds none.

French people...who assumed that in such queries Google at first takes results for each word separately

Well yes, that's how I would do it too, but you have to assume they can get all results for any word if you ever hope to dig down into deep results.

It's not as difficult as scanning 3 billion entries for the word 'by' - you can start with the most obscure words first. "Sirotta" only has 6,690 results; ask the machine handling the next least popular word (Googol) to return the subset of its list of relevant pages that fall within that 6,690 set, and at most it will give you a list of 6,690 indices. Work your way up the list of words by popularity:

Ask the 'Sirotta' machine to send its page index list (6,690 entries) to the 'Googol' machine, which sends its subset (4,270) to the 'coined' machine, which sends its subset (386)... and now here's the thing: we only have 386 results. If I were writing this, I would change the method once I had a small enough set. For example, with only 386 page indexes left, you could ask the computer(s) holding the text of those 386 pages to return each page's score for the search phrase and do clever analysis at that point, even permitting pages that don't contain minor words from the search phrase.

Sort the results that come back, then pull the text of the 10 relevant pages. Perhaps cache the rest of the results in case the person wants results 11-20, 21-30, etc.

I'm just thinking out loud here, my point is that mining 8 billion documents down to the very last document sounds complicated to a user, but seems trivial to a programmer. It's not that there is any physical limit that stops them mining it down to the very last document.
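A toy version of that rarest-first filtering (the postings below are invented stand-ins for the per-word machines):

postings = {  # word -> set of document ids containing it
    "sirotta": {3, 17, 42},
    "googol":  {3, 17, 42, 99, 105},
    "coined":  {3, 42, 99, 104, 105, 200},
    "term":    set(range(1, 300)),  # very common word, huge list
}

def intersect_rarest_first(words):
    # Order words by posting-list size, ascending (rarest first), so each
    # filtering step can only shrink the candidate set.
    ordered = sorted(words, key=lambda w: len(postings[w]))
    candidates = postings[ordered[0]]
    for word in ordered[1:]:
        candidates = candidates & postings[word]
        if not candidates:  # nothing left - stop early
            break
    return candidates

print(intersect_rarest_first(["term", "coined", "googol", "sirotta"]))
# {3, 42}: the only documents containing all four words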

I'm pretty sure this algo isn't sustainable. You can't really have a search engine that can't find billions of 'detail' search phrases and can't properly rank what it does find.

As long as we keep prodding them. Google fixit fixit fixit fixit fixit fixit fixit fixit fixit fixit

Wizard




msg:727383
 6:58 pm on Apr 10, 2005 (gmt 0)

I'm just thinking out loud here, my point is that mining 8 billion documents down to the very last document sounds complicated to a user, but seems trivial to a programmer. It's not that there is any physical limit that stops them mining it down to the very last document.

I can think of a physical limit - selecting data from big databases is slower and requires more resources than selecting data from small databases. That's why Google probably uses two indexes, with a limited number of sub-results in the commonly used one.

And that was the conclusion of the Frenchman - I still can't find the link to his article; maybe someone can help?

ncgimaker




msg:727384
 8:00 pm on Apr 10, 2005 (gmt 0)

I can think of a physical limit - selecting data from big databases is slower and requires more resources than selecting data from small databases. That's why Google probably uses two indexes, with a limited number of sub-results in the commonly used one.

I strongly doubt they would ever use a generic DB to implement search - even to return just the most common results, or even to process a single word list. A preloaded, preindexed in-memory list would be a lot faster and simpler.

If the French lot concluded that, then they probably can't program for toffee.

car insurance




msg:727385
 9:22 pm on Apr 10, 2005 (gmt 0)

Can't we sum up the reaction to the latest Google changes by saying there are two camps of people:

1. People whose sites have either stayed the same or benefited from the recent changes.
2. People whose sites have gotten worse in the SERPs or even disappeared

If you're in Camp #1, you're probably not that concerned. Until the next Google update inexplicably puts you into Camp #2. Then like many others you'll be staring in disbelief at the PR0 site scrapers ranking in Google's SERPs while your site literally just doesn't exist.

Then you can listen to all the advice of "be patient - wait a year to see if it clears up". One day people will start to use a good search engine again and we'll all wonder how we put up with Google's garbage in the last year.

Wizard




msg:727386
 6:28 pm on Apr 11, 2005 (gmt 0)

I strongly doubt they would ever use a generic DB to implement search - even to return just the most common results, or even to process a single word list. A preloaded, preindexed in-memory list would be a lot faster and simpler.

And holding a dictionary based on 8 billion entries requires a nice amount of memory, which costs a nice amount of money.


Can't we sum up the reaction to the latest Google changes by saying there are two camps of people:

1. People whose sites have either stayed the same or benefited from the recent changes.
2. People whose sites have gotten worse in the SERPs or even disappeared

I'd say there is a third group - experienced SEOs, who have many different pages, some going up, others going down, and always some of them earning money. They can afford cold analysis in order to have more pages up, without unnecessary despair over the pages that disappeared.

The algo has been changing for years. There was a time, a long, long time ago, when you stuffed the meta keywords tag with many keywords and were number one. Now we have more sophisticated methods, but from the point of view of the Google algo, many of today's SEO methods are looking a bit outdated. So, obviously, some pages have to go down.

ncgimaker




msg:727387
 7:24 pm on Apr 11, 2005 (gmt 0)

Wizard:
And holding a dictionary based on 8 billion entries requires a nice amount of memory, which costs a nice amount of money.

It would be an index for selecting search words rather than pages, but yes, 1,000 modern blade servers don't come cheap.

Wizard




msg:727388
 7:40 pm on Apr 11, 2005 (gmt 0)

It would be an index for selecting search words rather than pages

Absolutely - and I have seen databases where such an index consumed even more memory than the actual content (for instance, phpBB forum scripts have a search feature whose word index takes twice as much space as the post text).

In the first Google scheme, described by its founders, there are two indexes. Why would they abandon it? It's such an efficient solution, even if it generates weird numbers of results for some queries.
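A toy sketch of that two-tier lookup - the original Brin & Page paper describes searching the "short barrels" (title and anchor hits) first and falling back to the "full barrels" only when too few results turn up. The data and cut-over point here are made up:

short_index = {"googol": [1, 2], "sirotta": [1]}           # best hits only
full_index  = {"googol": [1, 2, 5, 8, 13], "sirotta": [1, 8]}

MIN_RESULTS = 3  # hypothetical cut-over point

def lookup(word):
    hits = short_index.get(word, [])
    if len(hits) < MIN_RESULTS:          # small index too thin:
        hits = full_index.get(word, [])  # fall back to the big one
    return hits

print(lookup("googol"))  # falls back to the full index: [1, 2, 5, 8, 13]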

reseller




msg:727389
 8:40 pm on Apr 11, 2005 (gmt 0)

Wizard

"I'd say there is a third group - experienced SEOs, who have many different pages, some going up, others going down, and always some of them earning money."

Allow me to do some thinking out loud :-)

Part (maybe a great part) of Allegra was targeting white hat SEO'd sites and pages. Although white hat SEOs (WHSEOs) have always played nice with all major search engines and directories, including Google, the bright engineers at Google have chosen to wage an undeclared war against them. Neither a wise nor a clever decision, IMHO. Instead of cooperation, it seems that Google is proceeding in the direction of confrontation with the WHSEOs.

Problem is that SEO'd pages make more sense and logic for search engines and searchers alike. And once Google engineers started writing algos to dismiss those SEO'd pages from top positions in the SERPs, search results started their decline in quality, as several posts on these forums have already illustrated.

And let's take a look at how bright Google engineers are when it comes to writing those "smart" rotating algos. Have they taken into account the situation of pages with "reverse SEO" variables? Can their algos handle both WHSEO'd pages and white hat reverse-SEO'd pages at the same time?

Wizard




msg:727390
 9:29 pm on Apr 11, 2005 (gmt 0)

Part (maybe a great part) of Allegra was targeting white hat SEO'd sites and pages. Although white hat SEOs (WHSEOs) have always played nice with all major search engines and directories, including Google, the bright engineers at Google have chosen to wage an undeclared war against them.

And black hats have an advantage in some cases, because it's clearly them who use the strategy I described above. Not only them, of course, but black hats wouldn't survive without this strategy.

Problem is that SEO'd pages make more sense and logic for search engines and searchers alike. And once Google engineers started writing algos to dismiss those SEO'd pages from top positions in the SERPs, search results started their decline in quality...

But it's us webmasters who complain. Do searchers complain too? There are still many searches returning good results; it's mostly commercial queries that return garbage.

And let's take a look at how bright Google engineers are when it comes to writing those "smart" rotating algos. Have they taken into account the situation of pages with "reverse SEO" variables?

What if the algo is now based on completely new factors, for example some factors described in the new patent, recently discussed here?

What if content-adding and link-adding rates are more important now than static on-page factors? That would explain the sandbox and many weird results. One can hardly keep up a stable, high rate of adding content forever. And only genuine links can be added at a steadily growing rate.

How many white hat SEO'd pages care about outbound links? It didn't use to be a factor, but now it's likely to be, as pages with keywords in the anchor text of outgoing links to pages with a high density of those keywords happen to rank unexpectedly high for those keywords.

[edited by: Wizard at 9:29 pm (utc) on April 11, 2005]

Atticus




msg:727391
 9:29 pm on Apr 11, 2005 (gmt 0)

reseller,

"Problem is that SEOŽd pages make more sense and logic for search engines and searchers alike. And once Google engineers have started writing algos to dismiss those SEOŽd pages from top positions on the serps, search results started their decline in quality as the several posts on these forums have already illustrated."

That statement sums up the situation perfectly. Somebody should print that on a really big sign and march up and down in front of the Gplex until they stand up and take notice.
