When I was at university many moons ago, people were beginning to experiment with learning algos. I never got into this myself, but I think I can still remember the basics. It goes something like this.
1) You have a complex system that cannot be easily analysed mathematically for optimum performance.
2) You have a number of parameters you can change on the system - we'll call them levers.
3) You choose a number of benchmark tests to decide if changes to the levers are good or bad. Typically, the number of benchmarks should not exceed the number of levers but that's another story.
4) You design an automated algo that randomly moves the levers (in small increments, one at a time) and measures the results. If there is an improvement, the change is kept otherwise it is discarded.
5) You seed the algo by choosing known sensible positions for the levers.
6) You hit the GO DO IT button.
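For anyone who wants to see the six steps above in action, here is a minimal sketch. It assumes a made-up two-lever system and a made-up smooth benchmark (`benchmark`, its peak at (1.0, -2.0), the step size and the iteration count are all invented for illustration). It says nothing about what Google actually runs.

```python
import random

def hill_climb(score, levers, step=0.1, iterations=1000, seed=0):
    """Nudge one randomly chosen lever at a time; keep the change only if the score improves."""
    rng = random.Random(seed)
    current = list(levers)              # step 5: seeded with sensible starting positions
    best = score(current)
    for _ in range(iterations):         # step 6: GO DO IT
        i = rng.randrange(len(current))             # pick one lever at random
        candidate = list(current)
        candidate[i] += rng.choice((-step, step))   # small increment, up or down
        s = score(candidate)
        if s > best:                    # improvement: keep the change
            current, best = candidate, s
        # otherwise the change is discarded
    return current, best

def benchmark(levers):
    # Hypothetical smooth benchmark: higher is better, single peak at (1.0, -2.0).
    x, y = levers
    return -((x - 1.0) ** 2 + (y + 2.0) ** 2)

levers, best = hill_climb(benchmark, [0.0, 0.0])
print(levers, best)
```

On a smooth, single-peaked benchmark like this one, the levers walk steadily to the peak. The trouble starts when the real system is not that well behaved.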
Almost certainly, Google have such a program. However, I rather suspect that they don't fully understand the limitations.
1) Filters (i.e. system behaviour) must be smooth otherwise results between the benchmarks cannot be reasonably interpolated. In other words, the whole concept can become complete garbage.
2) If your benchmarks are poorly chosen, results will be poor.
3) When a system has many levers, you have to periodically throw in some wild changes, otherwise the program can become trapped in peaks or troughs.
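Point 3 can be shown with a toy example. This is only a sketch under invented assumptions: a single lever, a made-up two-peak benchmark called `bumpy` (small peak at 0, bigger peak at 5), and an arbitrary 10% chance of a "wild" change on each try.

```python
import random

def climb(score, start, step=0.1, iterations=2000,
          wild_rate=0.0, wild_span=10.0, seed=0):
    """Hill climb on one lever; with probability wild_rate, try a big random jump instead."""
    rng = random.Random(seed)
    x, best = start, score(start)
    for _ in range(iterations):
        if rng.random() < wild_rate:
            candidate = x + rng.uniform(-wild_span, wild_span)  # wild change
        else:
            candidate = x + rng.choice((-step, step))           # small increment
        s = score(candidate)
        if s > best:                     # keep only improvements
            x, best = candidate, s
    return x, best

def bumpy(x):
    # Hypothetical benchmark: local peak of height 1 at x=0, global peak of height 3 at x=5.
    return max(1 - x ** 2, 3 - (x - 5) ** 2)

stuck_x, stuck_score = climb(bumpy, 0.0)                # small steps only
free_x, free_score = climb(bumpy, 0.0, wild_rate=0.1)   # with occasional wild changes
print(stuck_score, free_score)
```

With small steps only, the program never leaves the little peak it was seeded on, because every small move looks like a loss. With the occasional wild change it finds the bigger peak.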
In the past, I'm sure Google has done all this in private and the end results simply appeared at the monthly dance. However, maybe Google are so confident of their system that they now plan to go public with this. On the face of it, that might seem stupid, but if a decision has been taken to measure user interaction and base benchmarks on that, then they have no choice. The simplest measure that they might use is to track the number of results that a user clicks for a search, together with the position of those results in the SERPs.
You've given a name ("learning algos") to something that I've had a hunch about for a little while now. My gut feeling is that they're already using learning algos on the public servers - or some of the public servers some of the time. I could be wrong though.
[edited by: bether2 at 4:59 pm (utc) on Mar. 18, 2004]
4 variables - you're looking at variable mass thermodynamics. Tricky
100 variables - you're looking at the Google algo.
You pull on one lever, and all the others move.
Experimentation on live animals (webmasters) really ought to be banned ::)
Yahoo on the other hand, does appear to track user behaviour from every search result page. Apparently though, if they are using learning algorithms, they haven't switched it to AUTOPILOT yet.
The same process you describe is what the SEO does to beat the Google algo.
If there was a program that did this work for us life would be easier
If Google is using learning algos, then in the near future there won't be any difference between SEOs and web designers: the Google algo will improve systematically and the only SEOing possible will be to do good webpages... isn't this the goal of any browser? Good work, G! ;)
I'm not worried because I work on improving the pages when I try to SEO them.
I bet you there are zillions more "bars" on the desktops of "joe six pack " than on the "internet pros"..
They are pitched at people who don't want to make the effort to search for themselves or even type in the "g" url ....
And they aren't gonna take the time to find "g"'s email and complain about the crap results for another year or so ......
how long did they take to make the switch off from the all singing all dancing alta vista .......?
rest my case
If Google is using learning algos, then ...... google algo will improve systematically
Unfortunately, this ain't necessarily so. Learning algos can quickly achieve dramatic improvements in benchmark tests (or performance indicators) but the selection of those tests is crucial.
Also, it is absolutely critical to understand that if the system does not behave in a smooth and consistent manner, then interpolation (prediction of results between benchmark tests) may range from being unreliable to complete garbage.
This apparent over-optimisation filter is a case in point. A learning algo can still achieve good benchmark results in the presence of such filters (esp when adjusting the filter levers); unfortunately, the benchmarks themselves can become worthless.
What you would end up with is a company convinced that it is providing better and better results (because the benchmarks say so) but a public that may, in large part, disagree, either because the benchmarks are not realistic or because behaviour between those benchmark tests is unpredictable - sometimes good, sometimes not so good.
Yeah, learning algos are a distinct possibility. As you say, there are huge issues such as local maxima/minima. In particular you say:
If there is an improvement, the change is kept otherwise it is discarded.
So how do Google define "improvement"? There's no accepted metric for judging SERPs. Is it better if the PhDs at the Plex like what they're seeing? If we at WW like it? If the BBC website writes more articles praising Google?
Learning algos without a clearly defined and justified metric for judging effectiveness would be the sort of thing that a PhD with no common sense would suggest.
You're probably right.
Perhaps Google should fire their highly paid PhDs and employ a panel of real people to judge SERPs.
PS (OT) : I vote for Jonathan Pryce
So how do Google define "improvement"? There's no accepted metric for judging SERPs. Is it better if the PhDs at the Plex like what they're seeing? If we at WW like it?
It is judged using the same criteria as manual algo changes - comparison between the machine generated results, and the hand selected order of results.
If Google's hand-selected order of the top five results for a keyword is:
Site1, Site2, Site3, Site4, Site5
And the current algo produces:
Site2, Site1, Site5, Site4, Site3
Then a new algo that produced:
Site1, Site2, Site4, Site3, Site5
would be an improvement.
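To put a number on "improvement" for the example above, one option is to count how many pairs of sites come out in the wrong order relative to the hand-selected list (a Kendall tau distance). This is just one possible yardstick chosen for illustration; nothing here says it is the one Google uses, and the function name is made up.

```python
from itertools import combinations

def kendall_tau_distance(ideal, produced):
    """Count pairs of sites that appear in the opposite order to the hand-selected list."""
    rank = {site: i for i, site in enumerate(ideal)}
    ranks = [rank[site] for site in produced]
    # A pair is "wrong" if the earlier result should have ranked below the later one.
    return sum(1 for a, b in combinations(ranks, 2) if a > b)

ideal = ["Site1", "Site2", "Site3", "Site4", "Site5"]
current_algo = ["Site2", "Site1", "Site5", "Site4", "Site3"]
new_algo = ["Site1", "Site2", "Site4", "Site3", "Site5"]

print(kendall_tau_distance(ideal, current_algo))  # 4 pairs out of order
print(kendall_tau_distance(ideal, new_algo))      # 1 pair out of order
```

Lower is better, so on this measure the new algo (1 wrong pair) beats the current one (4 wrong pairs).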
Naturally, this is a waaaaaay too trivial example though. Algos would be tested with hundreds (probably thousands) of keywords and phrases, and a lot deeper than 5 sites. Click-throughs, repeat similar searches by people, and many other metrics also determine the perceived quality of one algo over another.
It has been suggested that improvement could be measured by the time spent on a site. This is sent to Google through the Google toolbar.
If, for a given search, position 1 has an average time spent of 15 seconds and position 2 an average of 20 seconds, then an assumption can be made that position 2 was more relevant than position 1.
The problem with this is that webmasters will start building confusing navigation and surfer traps to make them spend more time on the site.
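The time-spent comparison described in this post could be sketched roughly as follows. The visit times are invented so that the averages match the 15 and 20 second figures above, and `suspect_pairs` is a hypothetical helper, not anything the toolbar is known to do.

```python
from statistics import mean

def suspect_pairs(dwell_by_position):
    """Return (upper, lower) position pairs where the lower result holds visitors longer on average."""
    averages = {pos: mean(times) for pos, times in dwell_by_position.items()}
    positions = sorted(averages)
    return [
        (upper, lower)
        for upper, lower in zip(positions, positions[1:])
        if averages[lower] > averages[upper]
    ]

# Seconds spent per visit, keyed by SERP position (made-up sample data).
dwell = {1: [10, 20, 15], 2: [25, 15, 20]}
suspects = suspect_pairs(dwell)
print(suspects)  # [(1, 2)]: position 2 averages 20s against 15s for position 1
```

Note that this only flags the pair; whether longer time really means "more relevant" is exactly the assumption being questioned.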
Nope. A learning algo would necessarily have to do so without hand selected results. A SE algo is an automated process.
I'm not saying that the algo changes with hand-made adjustments. The learning algo just needs a large number of "ideal" hand-ordered series of results to compare the results it generated to.
edited for clarification
And, what percent of Google users have the toolbar installed, and also have enabled the PR display? Some problems with that sample are:
#1) There is good reason to suspect that those people who have the toolbar installed, and also have enabled the PR display, are a very unrepresentative sample of the total population of Google users. Skewed toward webmasters, SEOs, computer geeks, etc. How many Grandmothers are checking PR of pages with the toolbar?
#2) The toolbar is only available for IE on Windows. What about Netscape, Opera, and Mozilla users? Or Mac or Linux users?
#3) At this moment I have no less than 7 instances of IE open on this box. Is that small page about the Cuban Missile Crisis I have had open for the last 3 hours, and ignored since then because I was using other IE instances, really *that* important? Those who use multiple open instances of IE will confound your way of measuring.
#4) Let's say I spent 5 minutes on the first Cuban Missile Crisis page I looked at, and not finding what I wanted clicked the back button on my browser, went to another page and spent just 15 seconds there because I immediately found what I was looking for. Your way of measuring is rewarding bad content, and not good content.
I could go on...
This does not allow for people keeping an interesting page open while they continue to search, but you can't have everything.
Is that small page about the Cuban Missile Crisis I have had open for the last 3 hours, and ignored since then because I was using other IE instances, really *that* important?
That's good thinking, but the toolbar could identify that the window was not active. If the toolbar were spying like this, it would simply count the active time of a window. This is very easy.
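As a rough idea of how counting active time might work, here is a sketch that assumes the toolbar could log "focus" and "blur" events with timestamps. The event names and the log format are invented for illustration; whether the toolbar records anything like this is pure speculation.

```python
def active_seconds(events):
    """Sum only the time between a window gaining focus and losing it again."""
    total = 0
    focused_since = None
    for timestamp, kind in sorted(events):
        if kind == "focus" and focused_since is None:
            focused_since = timestamp            # window became active
        elif kind == "blur" and focused_since is not None:
            total += timestamp - focused_since   # window went to the background
            focused_since = None
    return total

# A page left open for about 3 hours (10815 seconds) but only looked at twice.
events = [(0, "focus"), (30, "blur"), (10800, "focus"), (10815, "blur")]
print(active_seconds(events))  # 45
```

So the page that sat ignored for three hours would be credited with 45 seconds, not 10815.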
Why does everyone assume that the only people who use the toolbar are SEO experts ( or those of us who would like to be ...the real ones are making 10M per year elsewhere just for saying they are experts )...?
I've got a small sideline business clearing up minor security issues for people on their home computers ..I can tell you ..and did earlier in this thread ...
The vast majority of tool bars I've seen are on the desktops of your average surfing Joe /Jane ..not with us geeks and serp junkies ...
Who BTW do figure in my customers ..amazing how many "geeks" think *rton is a safe way to surf....
Back to Joe and Jane
They aren't worried about PR ..they're just sold on the idea that you can search without having to do anything other than type the word and hit go!
Makes as much sense as saying only nerds got flat screen...was true for about 6 months and now everybody's got one even if they got to stand behind each other in a straight line to see an image on the thing ...Best way to make families buy a monitor for each member that I ever saw guys ...neato!
Or only execs got laptops and so on ...
Of course the bar is integrated into the deal ...
Might even be another explanation other than the *mazon is "g"s buddy and the ultra high ranking of any "directory" in the serps ....
The pages of the damn things are so long and scrolly ..you got to stay there for ages scrolling up and down to find what you want ...and each time you want to compare you have to actively scroll back up etc ..
To any analysis system you would appear to be very very interested in such a site ..cos you're being so active on the page controls while you're there ....and in the "directories" that simply list the dmoz you have to keep hitting "back" to go from one to the other..
So to any toolbar "watching" you look positively addicted to the page in question ...
The problem is that it appears unable to distinguish between cursor movement due to frustration and due to satisfaction and interest ...
Read your stats .......95% plus of all arrivals are flavours of IE .........why should "g" care about any other browsers ...The other browsers are the geek tools ..
As said ...any instance open which doesn't show active use ..scrolling , click , whatever in a given time period is simply averaged out ....
Your Isp has been doing that for years while you were on dial up ....and cutting the contact when not in use ..
Windoze even has the checkbox to cut out after a timed period of inactivity on page ...
Those of us still on dial up ( doesn't come to where I live for another month yet ) ..know how we have to sit in front of the beast or use getright etc with "keep alive" during long downloads ...
OR they are anti-microsoft
OR they used something like Netscape from the get go and so they just keep using it.
95% is pretty high, but I would say that at least 75% are using I.E.
BTW ..my stats are actually at 98% IE ....I kid you not!
Once in a while firefox or somesuch...but then I'm not set up to attract geeks...
Posted this next elsewhere here (equally ineptly )..someone please explain to me how to cut and paste thread string refs here ...
But there is a thread here..
Which confirms something I said about 2 days ago ( and I thought I was being heavily ironic at the time )..
But apparently "g" has just got around to placing the adwords "on top" of the serps .......
And next out of the box will be?
[edited by: Leosghost at 1:42 pm (utc) on Mar. 19, 2004]