
Google News Archive Forum

Learning Algos
Are they behind Google's recent behaviour?
kaled
msg:81037
12:28 pm on Mar 18, 2004 (gmt 0)

It has been suggested that Google will henceforth be making small algo changes several times a month. Why would they do this? Here's the likely explanation.

When I was at university many moons ago, people were beginning to experiment with learning algos. I never got into this myself, but I think I can still remember the basics - it goes something like this.

1) You have a complex system that cannot be easily analysed mathematically for optimum performance.
2) You have a number of parameters you can change on the system - we'll call them levers.
3) You choose a number of benchmark tests to decide if changes to the levers are good or bad. Typically, the number of benchmarks should not exceed the number of levers but that's another story.
4) You design an automated algo that randomly moves the levers (in small increments, one at a time) and measures the results. If there is an improvement, the change is kept; otherwise it is discarded.
5) You seed the algo by choosing known sensible positions for the levers.
6) You hit the GO DO IT button.
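
In rough Python, the loop might look something like this - my own sketch of the textbook idea, not anything Google has published, and benchmark() stands for whatever scoring function was chosen in step 3:

import random

def hill_climb(levers, benchmark, steps=10000, step_size=0.05, wild_prob=0.01):
    # levers: list of floats, seeded at known sensible positions (step 5)
    # benchmark: scores a lever setting, higher = better (step 3)
    # wild_prob: occasional big random jump (see limitation 3 below)
    best = list(levers)
    best_score = benchmark(best)
    for _ in range(steps):
        trial = list(best)
        i = random.randrange(len(trial))                       # one lever at a time
        if random.random() < wild_prob:
            trial[i] += random.uniform(-1.0, 1.0)              # a wild change
        else:
            trial[i] += random.uniform(-step_size, step_size)  # a small increment
        score = benchmark(trial)
        if score > best_score:                                 # keep improvements,
            best, best_score = trial, score                    # discard the rest
    return best, best_score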

Almost certainly, Google have such a program. However, I rather suspect that they don't fully understand the limitations.

1) Filters (i.e. system behaviour) must be smooth, otherwise results between the benchmarks cannot be reasonably interpolated. In other words, the whole concept can become complete garbage.
2) If your benchmarks are poorly chosen, results will be poor.
3) When a system has many levers, you have to periodically throw in some wild changes, otherwise the program can become trapped at local peaks or troughs.

In the past, I'm sure Google has done all this in private and the end results simply appeared at the monthly dance. However, maybe Google are so confident of their system that they now plan to go public with this. On the face of it, that might seem stupid, but if a decision has been taken to measure user interaction and base benchmarks on that, then they have no choice. The simplest measure that they might use is to track the number of results that a user clicks for a search, together with the position of those results in the SERPS.
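
For what it's worth, a toy version of that click-tracking measure might look like the following - pure speculation on my part, built on the rough principle that fewer clicks, higher up the page, means a happier searcher:

def serp_score(click_logs):
    # click_logs: one list of clicked SERP positions per search,
    # e.g. [[1], [3, 1], [2]]  (1 = top result)
    total = 0.0
    for clicks in click_logs:
        if clicks:
            # credit the final click as "found it", weighted by its position
            # and discounted when it took several clicks to get there
            total += 1.0 / (clicks[-1] * len(clicks))
    return total / len(click_logs) if click_logs else 0.0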

Kaled.

 

allanp73
msg:81038
2:32 pm on Mar 18, 2004 (gmt 0)

Interesting...
The same process you describe is what the SEO does to beat the Google algo.
If there was a program that did this work for us, life would be easier ;)

bether2
msg:81039
2:40 pm on Mar 18, 2004 (gmt 0)

Kaled,

You've given a name ("learning algos") to something that I've had a hunch about for a little while now. My gut feeling is that they're already using learning algos on the public servers - or some of the public servers some of the time. I could be wrong though.

Beth

[edited by: bether2 at 4:59 pm (utc) on Mar. 18, 2004]

Leosghost
msg:81040
2:49 pm on Mar 18, 2004 (gmt 0)

Probably the post-hoc rationale for the "g" bar ....

Problem is, the net result of this kind of "feedback" is either "junk" ...

Or "dumbing down"........

What have we been seeing .........?

SyntheticUpper
msg:81041
2:54 pm on Mar 18, 2004 (gmt 0)

3 variables - you're looking at fixed mass thermodynamics. Hard

4 variables - you're looking at variable mass thermodynamics. Tricky

100 variables - you're looking at the Google algo.

Messy :)

You pull on one lever, and all the others move.

Experimentation on live animals (webmasters) really ought to be banned ::)

hutcheson
msg:81042
4:34 pm on Mar 18, 2004 (gmt 0)

>Experimentation on live animals (webmasters) really ought to be banned ::)

You know, you really ought not to give straight lines like that. It's like bleating on bridges.

JustheFacts
msg:81043
5:12 pm on Mar 18, 2004 (gmt 0)

Love the concept Kaled, and I do believe this is where all engines have to go in the future, although I have trouble buying into the theory that it's occurring right now. The metrics required to gauge and continually improve this type of learning algorithm require careful tracking of user behaviour ... after all, it is user satisfaction that they need to infer. However, we currently see Google doing very little tracking of this sort, though they do some. It also can't be argued that Google would use their toolbar for this type of tracking, since the results would be heavily skewed toward internet marketing professionals.

Yahoo, on the other hand, does appear to track user behaviour from every search result page. Apparently though, if they are using learning algorithms, they haven't switched them to AUTOPILOT yet.

Robert123
msg:81044
5:14 pm on Mar 18, 2004 (gmt 0)

This is how the separation of the first hydrogen atom/bomb was accomplished--levers and small changes.

Herenvardo
msg:81045
5:28 pm on Mar 18, 2004 (gmt 0)

Interesting...
The same process you describe is what the SEO does to beat the Google algo.
If there was a program that did this work for us, life would be easier

There is no great difficulty in making such a program... the problem is that you need to test a change before applying or discarding it, and you cannot get an immediate answer: depending on the change, you may even have to wait a month.
Doing the work manually, you can make the changes more strategically and get more accuracy in less time.

If Google is using learning algos, then in the near future there won't be any difference between SEOs and web designers: the google algo will improve systematically and the only SEOing possible will be to make good webpages... isn't this the goal of any search engine? Good work, G! ;)
I'm not worried because I work on improving the pages when I try to SEO them.

Greetings,
Herenvardö

Leosghost
msg:81046
5:32 pm on Mar 18, 2004 (gmt 0)

<It also can't be argued that Google would use their toolbar for this type of tracking, since the results would be heavily skewed toward internet marketing professionals.>

I bet you there are zillions more "bars" on the desktops of "Joe Six-Pack" than on those of the "internet pros"..

They are pitched at people who don't want to make the effort to search for themselves or even type in the "g" url ....

And they aren't gonna take the time to find "g"'s email and complain about the crap results for another year or so ......

How long did they take to switch away from the all-singing, all-dancing AltaVista .......?

rest my case

Ruben
msg:81047
6:12 pm on Mar 18, 2004 (gmt 0)

It is impossible to make something like this automated pilot for SEOs. It would take one to three months before you could pull one lever and measure the results. Alternatively, you could try to get 100 identical sites indexed in google, change a different lever on each site, measure what happens, and finally put all these measurements together.. I think you can add the measurements linearly, because the whole google algorithm must be linear, otherwise it would technically be impossible to make a quick search engine.

kaled
msg:81048
6:21 pm on Mar 18, 2004 (gmt 0)

If Google is using learning algos, then ...... google algo will improve systematically

Unfortunately, this ain't necessarily so. Learning algos can quickly achieve dramatic improvements in benchmark tests (or performance indicators) but the selection of those tests is crucial.

Also, it is absolutely critical to understand that if the system does not behave in a smooth and consistent manner, then interpolation (prediction of results between benchmark tests) may range from being unreliable to complete garbage.

This apparent over-optimisation filter is a case in point. A learning algo can still achieve good benchmark results in the presence of such filters (especially when adjusting the filter levers); unfortunately, the benchmarks themselves can become worthless.

What you would end up with is a company convinced that it is providing better and better results (because the benchmarks say so) but a public that may, in large part, disagree, either because the benchmarks are not realistic or because behaviour between those benchmark tests is unpredictable - sometimes good, sometimes not so good.

Kaled.

SyntheticUpper
msg:81049
7:55 pm on Mar 18, 2004 (gmt 0)

It's like bleating on bridges.

What the hell does that mean :)

dzazi
msg:81050
10:03 pm on Mar 18, 2004 (gmt 0)

troll bait?

SyntheticUpper
msg:81051
10:53 pm on Mar 18, 2004 (gmt 0)

This is how the separation of the first hydrogen atom/bomb was accomplished--levers and small changes.

And I thought I went off topic - what a load of utter pants.

valeyard
msg:81052
12:53 am on Mar 19, 2004 (gmt 0)

kaled,

Yeah, learning algos are a distinct possibility. As you say, there are huge issues such as local maxima/minima. In particular you say:

If there is an improvement, the change is kept; otherwise it is discarded.

So how do Google define "improvement"? There's no accepted metric for judging SERPs. Is it better if the PhDs at the Plex like what they're seeing? If we at WW like it? If the BBC website writes more articles praising Google?

Learning algos without a clearly defined and justified metric for judging effectiveness would be the sort of thing that a PhD with no common sense would suggest.

You're probably right.

Perhaps Google should fire their highly paid PhDs and employ a panel of real people to judge SERPs.

PS (OT) : I vote for Jonathan Pryce

borisbaloney
msg:81053
1:32 am on Mar 19, 2004 (gmt 0)

So how do Google define "improvement"? There's no accepted metric for judging SERPs. Is it better if the PhDs at the Plex like what they're seeing? If we at WW like it?

It is judged using the same criteria as manual algo changes - comparison between the machine-generated results and the hand-selected order of results.

E.g.
If Google's hand-selected order of the top five results for a keyword is:
Site1, Site2, Site3, Site4, Site5

And the current algo produces:
Site2, Site1, Site5, Site4, Site3

Then a new algo that produced:
Site1, Site2, Site4, Site3, Site5
would be an improvement.

Naturally this is a waaaaaay too trivial example though. Algos would be tested with hundreds (probably thousands) of keywords and phrases, and a lot deeper than 5 sites. Click-throughs, repeat similar searches by people, and many other metrics also determine the perceived quality of one algo over another.
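
One crude way to put a number on that closeness - a displacement count, offered purely as an illustration, not as Google's actual measure:

def ranking_distance(ideal, produced):
    # sum of how far each site sits from its hand-picked position;
    # 0 means the algo reproduced the ideal order exactly
    return sum(abs(i - produced.index(site)) for i, site in enumerate(ideal))

ideal = ["Site1", "Site2", "Site3", "Site4", "Site5"]
old   = ["Site2", "Site1", "Site5", "Site4", "Site3"]  # current algo
new   = ["Site1", "Site2", "Site4", "Site3", "Site5"]  # candidate algo

print(ranking_distance(ideal, old))  # 6
print(ranking_distance(ideal, new))  # 2 - closer to ideal, so an improvement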

trimmer80
msg:81054
1:55 am on Mar 19, 2004 (gmt 0)

>>>>So how do Google define "improvement"? There's no accepted metric for judging SERPs. Is it better if the PhDs at the Plex like what they're seeing? If we at WW like it?

It has been suggested that improvement could be measured by the time spent on a site. This is sent to google through the google toolbar.
If, for a given search, the result at position 1 has an average time spent of 15 seconds and the result at position 2 an average of 20 seconds, then an assumption can be made that position 2 was more relevant than position 1.

The problem with this is that webmasters will start building confusing navigation and surfer traps to make visitors spend more time on the site.
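
As a toy version of that dwell-time comparison (hypothetical numbers and a made-up function; nobody outside the Plex knows what, if anything, is really computed):

def dwell_time_inversions(avg_dwell):
    # avg_dwell: {SERP position: average seconds on site, per the toolbar}
    # returns pairs (lower, upper) where the lower-ranked result holds
    # visitors longer - the (gameable) hint that it may deserve to rank higher
    positions = sorted(avg_dwell)
    return [(lo, hi) for hi, lo in zip(positions, positions[1:])
            if avg_dwell[lo] > avg_dwell[hi]]

print(dwell_time_inversions({1: 15, 2: 20}))  # [(2, 1)] -> position 2 looks better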

rfgdxm1
msg:81055
1:56 am on Mar 19, 2004 (gmt 0)

>It is judged using the same criteria as manual algo changes - comparison between the machine-generated results and the hand-selected order of results.

Nope. A learning algo would necessarily have to do so without hand-selected results. An SE algo is an automated process.

borisbaloney
msg:81056
2:01 am on Mar 19, 2004 (gmt 0)

Nope. A learning algo would necessarily have to do so without hand-selected results. An SE algo is an automated process.

I'm not saying that the algo changes with hand-made adjustments; the learning algo just needs a large number of "ideal" hand-ordered sets of results to compare its generated results to.

edited for clarification

rfgdxm1
msg:81057
2:22 am on Mar 19, 2004 (gmt 0)

>It has been suggested that improvement could be measured by the time spent on a site. This is sent to google through the google toolbar.
If, for a given search, the result at position 1 has an average time spent of 15 seconds and the result at position 2 an average of 20 seconds, then an assumption can be made that position 2 was more relevant than position 1.

And, what percent of Google users have the toolbar installed, and also have enabled the PR display? Some problems with that sample are:

#1) There is good reason to suspect that those people who have the toolbar installed, and also have enabled the PR display, are a very unrepresentative sample of the total population of Google users. Skewed toward webmasters, SEOs, computer geeks, etc. How many Grandmothers are checking PR of pages with the toolbar?

#2) The toolbar is only available for IE on Windows. What about Netscape, Opera, and Mozilla users? Or Mac or Linux users?

#3) At this moment I have no less than 7 instances of IE open on this box. Is that small page about the Cuban Missile Crisis I have had open for the last 3 hours, and ignored since then because I was using other IE instances, really *that* important? Those who use multiple open instances of IE will confound your way of measuring.

#4) Let's say I spent 5 minutes on the first Cuban Missile Crisis page I looked at and, not finding what I wanted, clicked the back button on my browser, went to another page, and spent just 15 seconds there because I immediately found what I was looking for. Your way of measuring is rewarding bad content, and not good content.

I could go on...

kaled
msg:81058
2:54 am on Mar 19, 2004 (gmt 0)

I don't think analysis of user satisfaction is likely to be anything like as complex as people are suggesting. They will more likely just see how many times results are clicked (and the position in the SERPS of those clicks). As a general rule of thumb, people stop searching when they find what they are looking for.

This does not allow for people keeping an interesting page open while they continue to search, but you can't have everything.

Is that small page about the Cuban Missile Crisis I have had open for the last 3 hours, and ignored since then because I was using other IE instances, really *that* important?

That's good thinking, but the toolbar could identify that the window was not active. If the toolbar were spying like this, it would simply count the active time of a window. This is very easy.
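
Counting active time really is that easy - something like this, assuming the toolbar can see window focus/blur events (my own sketch, obviously, not the toolbar's actual code):

def active_seconds(events):
    # events: (timestamp_seconds, "focus" | "blur") pairs for one window
    total, focused_at = 0.0, None
    for ts, kind in events:
        if kind == "focus" and focused_at is None:
            focused_at = ts
        elif kind == "blur" and focused_at is not None:
            total += ts - focused_at
            focused_at = None
    return total

# 3 hours open, but only 40 seconds actually in focus:
print(active_seconds([(0, "focus"), (40, "blur"), (10800, "focus"), (10800, "blur")]))  # 40.0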

Kaled.

newwebster
msg:81059
3:37 am on Mar 19, 2004 (gmt 0)

Has anyone checked out www.mooter.com?
I think they are working on a self-learning search engine that infers user intent from the frequency of searches performed across clusters of terms. I am sure it has grabbed the attention of Google.

Leosghost
msg:81060
12:20 pm on Mar 19, 2004 (gmt 0)

#1) There is good reason to suspect that those people who have the toolbar installed, and also have enabled the PR display, are a very unrepresentative sample of the total population of Google users. Skewed toward webmasters, SEOs, computer geeks, etc. How many Grandmothers are checking PR of pages with the toolbar?

rfgdxm1>>>>>>>

Why does everyone assume that the only people who use the toolbar are SEO experts ( or those of us who would like to be ...the real ones are making 10M per year elsewhere just for saying they are experts )...?

I've got a small sideline business clearing up minor security issues for people on their home computers ..I can tell you ..and did earlier in this thread ...
The vast majority of toolbars I've seen are on the desktops of your average surfing Joe/Jane ..not with us geeks and serp junkies ...

Who BTW do figure in my customers ..amazing how many "geeks" think *rton is a safe way to surf....

Back to Joe and Jane
They aren't worried about PR ..they're just sold on the idea that you can search without having to do anything other than type the word and hit go!

Makes as much sense as saying only nerds got flat screens...was true for about 6 months and now everybody's got one even if they've got to stand behind each other in a straight line to see an image on the thing ...Best way to make families buy a monitor for each member that I ever saw, guys ...neato!

Or only execs got laptops and so on ...

Of course the bar is integrated into the deal ...

Might even be an explanation, other than "*mazon is "g"'s buddy", for the ultra-high ranking of any "directory" in the serps ....

The pages of the damn things are so long and scrolly ..you got to stay there for ages scrolling up and down to find what you want ...and each time you want to compare you have to actively scroll back up etc ..
To any analysis system you would appear to be very very interested in such a site ..cos you're being so active on the page controls while you're there ....and in the "directories" that simply list the dmoz you have to keep hitting "back" to go from one to the other..
So to any toolbar "watching" you look positively addicted to the page in question ...

The problem is that it appears unable to distinguish between cursor movement due to frustration and due to satisfaction and interest ...

Leosghost
msg:81061
12:23 pm on Mar 19, 2004 (gmt 0)

#2) The toolbar is only available for IE on Windows. What about Netscape, Opera, and Mozilla users? Or Mac or Linux users? >>>>>>>>

Read your stats .......95% plus of all arrivals are flavours of IE .........why should "g" care about any other browsers ...The other browsers are the geek tools ..

Leosghost
msg:81062
12:29 pm on Mar 19, 2004 (gmt 0)

#3) At this moment I have no less than 7 instances of IE open on this box. Is that small page about the Cuban Missile Crisis I have had >>>>>

As said ...any instance open which doesn't show active use ..scrolling, clicks, whatever ..in a given time period is simply averaged out ....

Your ISP has been doing that for years while you were on dial-up ....and cutting the connection when not in use ..

Windoze even has the checkbox to cut out after a timed period of inactivity on page ...

Those of us still on dial-up ( broadband doesn't come to where I live for another month yet ) ..know how we have to sit in front of the beast or use GetRight etc with "keep alive" during long downloads ...

Leosghost
msg:81063
12:30 pm on Mar 19, 2004 (gmt 0)

I could ( and did ..sorry there! ) go on ....

Ledfish
msg:81064
12:40 pm on Mar 19, 2004 (gmt 0)

"Read your stats .......95% plus of all arrivals are flavours of IE .........why should they care "g" about any other browsers ...The other browsers are the geek tools .."

OR they are anti-microsoft

OR they used something like Netscape from the get go and so they just keep using it.

95% is pretty high, but I would say that at least 75% are using IE.

Patrick Taylor
msg:81065
12:49 pm on Mar 19, 2004 (gmt 0)

I see a big room full of moving levers and spinning dials and strange noises - but there's no-one there. I see another room where everyone is dressed in white coats, holding a long meeting to decide what to do next.

Leosghost
msg:81066
1:33 pm on Mar 19, 2004 (gmt 0)

Can you skateboard in a white coat....?

BTW ..my stats are actually at 98% IE ....I kid you not!

Once in a while Firefox or some such...but then I'm not set up to attract geeks...

Posted this next elsewhere here (equally ineptly )..someone please explain to me how to cut and paste thread string refs here ...

But there is a thread here..
[webmasterworld.com...]

Which confirms something I said about 2 days ago ( and I thought I was being heavily ironic at the time )..

But apparently "g" has just got around to placing the adwords "on top" of the serps .......

And next out of the box will be?

[edited by: Leosghost at 1:42 pm (utc) on Mar. 19, 2004]
