First off, 'algorithm' is not the appropriate word. Basically, the ordering of results is based on a set of data (some 200+ factors). The bots go out and collect the raw data, which then undergoes some basic calculations to give the 200 or so factors.
The ordering is then dependent on how each factor is weighted.
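To make "ordering depends on weighting" concrete, here is a minimal sketch. It assumes each page's score is a plain weighted sum of its factor values, which is only an assumption (nobody outside Google knows how the factors are combined), and the factor names and values are made up. The point is that the same factor data gives a different order when only the weights change.

```python
from typing import Dict, List

# Hypothetical types: factor name -> value, factor name -> weight.
Factors = Dict[str, float]
Weights = Dict[str, float]


def score(factors: Factors, weights: Weights) -> float:
    """Weighted sum of a page's factor values (assumed linear model)."""
    return sum(weights.get(name, 0.0) * value for name, value in factors.items())


def rank(pages: Dict[str, Factors], weights: Weights) -> List[str]:
    """Return page ids ordered best-first by weighted score."""
    return sorted(pages, key=lambda page: score(pages[page], weights), reverse=True)


# Made-up pages with two made-up factors each.
pages = {
    "a.example": {"links": 0.9, "title_match": 0.2},
    "b.example": {"links": 0.4, "title_match": 0.9},
}

print(rank(pages, {"links": 1.0, "title_match": 0.5}))  # ['a.example', 'b.example']
print(rank(pages, {"links": 0.5, "title_match": 1.0}))  # ['b.example', 'a.example']
```

Nothing about the pages changed between the two calls; only the weighting set did.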
So what of 'machine learning'? The common conception is that the engine simply records which results are clicked, which clicks are followed by a quick return to the results page and a different selection, and a few other similar signals. But what to do with this feedback? On its own it only tells you about individual sites, though it could be used by storing this additional 'in the field' data as an extra factor (or factors).
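As a rough illustration of turning that feedback into an extra per-page factor, the sketch below computes the fraction of each page's clicks that did not end in a quick return. The `Click` record and the 10-second `QUICK_RETURN_SECONDS` threshold are hypothetical choices for the example, not anything Google has described.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical threshold: coming back sooner than this counts as a "quick return".
QUICK_RETURN_SECONDS = 10.0


@dataclass
class Click:
    page: str
    dwell_seconds: float  # time before the user came back to the results page (inf if never)


def satisfied_click_rate(clicks: List[Click]) -> Dict[str, float]:
    """Fraction of each page's clicks that did NOT end in a quick return."""
    total: Dict[str, int] = defaultdict(int)
    satisfied: Dict[str, int] = defaultdict(int)
    for click in clicks:
        total[click.page] += 1
        if click.dwell_seconds >= QUICK_RETURN_SECONDS:
            satisfied[click.page] += 1
    return {page: satisfied[page] / total[page] for page in total}


clicks = [Click("a.example", 3.0), Click("a.example", 60.0), Click("b.example", 45.0)]
print(satisfied_click_rate(clicks))  # {'a.example': 0.5, 'b.example': 1.0}
```

Each page's rate could then be stored alongside its other factors (say, as a hypothetical `click_satisfaction` factor) and weighted like any of the others.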
But perhaps there is more. One speculation is that the order could be randomly jiggled a bit to 'try out' lower-ranked sites at a higher position. This would again give feedback on individual sites (more accurately, pages).
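One simple way such jiggling might work, purely as an assumption, is to swap adjacent results with a small probability so a page occasionally appears one slot higher than its score says. The `jiggle` function and its `swap_prob` parameter are hypothetical names for this sketch.

```python
import random
from typing import List, Optional


def jiggle(ranking: List[str], swap_prob: float = 0.1,
           rng: Optional[random.Random] = None) -> List[str]:
    """Return a copy of ranking where adjacent pairs are swapped with probability
    swap_prob, walking up from the bottom. Swapped pairs never overlap, so no page
    moves more than one slot."""
    rng = rng or random.Random()
    result = list(ranking)
    i = len(result) - 1
    while i > 0:
        if rng.random() < swap_prob:
            # Try the lower page one position higher.
            result[i - 1], result[i] = result[i], result[i - 1]
            i -= 2  # skip the pair just swapped
        else:
            i -= 1
    return result


print(jiggle(["a", "b", "c", "d"], swap_prob=0.0))  # ['a', 'b', 'c', 'd']
print(jiggle(["a", "b", "c", "d"], swap_prob=1.0))  # ['b', 'a', 'd', 'c']
```

Keeping each move to one slot limits how much a poor page can degrade the results while it is being tried out.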
What I would do is this: test out hundreds of different 'algos' to see which is best. Best could be judged by how often users select a higher-placed result and how 'successful' that selection is, for example the user not returning quickly to select another.
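Here is one possible way to turn "higher result selected and not quickly abandoned" into a single number per algo. It is a hypothetical metric: each search session scores 1/position for its first click that was not a quick return (0 if there was none), and a weighting set's score is the mean over the sessions it served. The names and the 10-second threshold are assumptions for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical threshold for a "quick return" to the results page.
QUICK_RETURN_SECONDS = 10.0


@dataclass
class SessionClick:
    position: int         # 1-based rank of the clicked result
    dwell_seconds: float  # time before returning to the results page


def session_score(clicks: Sequence[SessionClick]) -> float:
    """Reward the first satisfied click (clicks in time order) with 1/position,
    so a satisfying click higher up is worth more. No satisfied click scores 0."""
    for click in clicks:
        if click.dwell_seconds >= QUICK_RETURN_SECONDS:
            return 1.0 / click.position
    return 0.0


def weighting_set_score(sessions: List[Sequence[SessionClick]]) -> float:
    """Mean session score over all searches served with one weighting set."""
    if not sessions:
        return 0.0
    return sum(session_score(s) for s in sessions) / len(sessions)


sessions = [
    [SessionClick(1, 120.0)],                       # top result, user stayed: 1.0
    [SessionClick(1, 3.0), SessionClick(2, 90.0)],  # quick return, then #2 worked: 0.5
    [],                                             # no click at all: 0.0
]
print(weighting_set_score(sessions))  # 0.5
```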
So how to run hundreds of different algos?
Well, because the order depends only on the factor weightings, these can be modified in real time: different weighting sets are served, data is collected, and new weighting sets are created, somewhat randomised but favoring the direction of change (lower or higher for factor x, y, z, etc.) that worked better in the previous testing period. This is natural selection. Over time it would discover the best weighting for each factor, at least for the current factor set; because many factors are interdependent, these relative weightings would change when additional factors are added.
Create, say, a thousand new weighting sets with random 'mutations' (subtle ones, say +/- 20% of a factor's weighting, applied to one factor or to several at once). Once there is sufficient (statistically significant) data, take the best performers and create new mutations of these. Rinse and repeat, keeping a league table of the best ever performers.
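The mutate-and-select loop described above can be sketched as a simple evolutionary strategy. Several assumptions apply: weights are multiplied by a random factor in [0.8, 1.2] on one to three factors at a time, the best 20 children become the next parents, and `fitness` is any function scoring a weighting set. In practice it would be a live, noisy user metric, with each set needing enough traffic for a significant comparison. Here, for testing, it is a deterministic toy. The "direction of change" is favored only implicitly, because only the winners get mutated again; no explicit directional bias is modelled.

```python
import random
from typing import Callable, Dict, List, Tuple

Weights = Dict[str, float]


def mutate(parent: Weights, rng: random.Random,
           max_change: float = 0.2, max_genes: int = 3) -> Weights:
    """Copy parent and scale 1..max_genes randomly chosen weights by up to +/- max_change."""
    child = dict(parent)
    n = rng.randint(1, min(max_genes, len(child)))
    for name in rng.sample(sorted(child), n):
        child[name] *= 1.0 + rng.uniform(-max_change, max_change)
    return child


def evolve(start: Weights, fitness: Callable[[Weights], float],
           generations: int = 30, children_per_gen: int = 1000,
           survivors: int = 20, league_size: int = 10,
           seed: int = 0) -> List[Tuple[float, Weights]]:
    """Mutate-and-select loop. Returns the league table of best-ever
    (fitness, weights) pairs, best first."""
    rng = random.Random(seed)
    parents = [start]
    league: List[Tuple[float, Weights]] = [(fitness(start), start)]
    for _ in range(generations):
        children = [mutate(rng.choice(parents), rng) for _ in range(children_per_gen)]
        scored = sorted(((fitness(c), c) for c in children),
                        key=lambda fc: fc[0], reverse=True)
        parents = [c for _, c in scored[:survivors]]  # best performers breed next
        league = sorted(league + scored[:league_size],
                        key=lambda fc: fc[0], reverse=True)[:league_size]
    return league


if __name__ == "__main__":
    # Toy stand-in for user feedback: closeness to a hidden, made-up "ideal" weighting.
    ideal = {"links": 2.0, "title_match": 0.5, "freshness": 1.0}

    def toy_fitness(w: Weights) -> float:
        return -sum((w[k] - ideal[k]) ** 2 for k in ideal)

    start = {"links": 1.0, "title_match": 1.0, "freshness": 1.0}
    best_fit, best_weights = evolve(start, toy_fitness, children_per_gen=200)[0]
    print(round(best_fit, 4), {k: round(v, 2) for k, v in best_weights.items()})
```

Multiplicative mutations keep changes "subtle" relative to each weight's size, but a weight that starts at exactly zero can never move, so a real version would want a small additive term too.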
This is not trying out different pages in different positions; it is something much better: trying out different algos. Visitors to Google would also, imperceptibly, be scoring the algos (weighting sets): a 'scalable solution' :)
Since I heard mention of a 'breakthrough' at Google, I have been trying to think what it could be. If G does not currently do this, I commend the idea to them. The tweaking would be done by every user and would naturally tend toward the actual ideal.