Microsoft researchers and academic collaborators detailed an idea this week they call BrowseRank that seeks to bring more of a human touch ... Essentially, the researchers tested out a system that replaces PageRank's link graph --a mathematical model of the hyperlinked connections of the Internet --with what they call a user browsing graph that ranks Web pages by people's behavior.
Read CNET Story [news.cnet.com]
OK, an interesting idea. Now let's also get MSN/LIVE to properly crawl and index pages, then maybe we'll have something...
...................................
In our experiment, such datasource [research.microsoft.com]
was recorded and collected from an extremely large group of users
under legal agreements with them. Information which could be
used to recognize their identities was not included. By integrating
the data from hundreds of millions of web users, we can build a
user browsing graph
"under legal agreements with them"
... this makes me nervous.
Because I'm one of those people who skim over all the Terms and Conditions and click "I Agree" at the end. Was I unknowingly one of the users in that "extremely large group of users"?
It's not clear whether their "extremely large" group of users is the same as the "hundreds of millions of web users" mentioned in the following sentence. "Hundreds of millions" kind of implies "everyone", like maybe everyone using IE, Hotmail, Windows Live Messenger, etc.
I SMELL A SPYBOT AND I THINK ITS NAME IS INTERNET EXPLORER
I'd love to get some confirmation of this suspicion...
As for the spying, I suspected Google of doing this with their toolbar a couple of years ago, but I never found evidence. My reasoning was highly conspiratorial, in seven points:
1) it's possible
2) collectively, they are very smart
3) a smart person would figure this out
4) it would make their SERPs more relevant
5) they would benefit from it
6) they have the means to do it
7) no one would know
If my suspicions are correct, Microsoft has that IE browser doing their spying and sending session behaviour data back to their data centers, which gives them vastly more reach than the limited # of people running the Googlebar. (And significantly higher adoption than Alexa, Stumble, and other toolbars)
So where'd they get the data?
same source,
page 5:
We used a user behavior dataset, collected from the World Wide
Web by a commercial search engine in the experiments. All possible
privacy information was rigorously filtered out and the data was
sampled and cleaned to remove bias as much as possible. There
are in total over 3-billion records, and among them there are
950-million unique URLs.
page 6:
we also obtained a large dataset from the same search
engine, containing 8000 queries and their associated webpages.
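For anyone wondering what a "user browsing graph" might look like in practice, here is a minimal sketch. It assumes the records boil down to ordered lists of pages per session; that format is my assumption, since the quoted passages don't spell it out. The function names, the damping value and the sample sessions are all made up for illustration, and this is a plain random-walk ranking over observed transitions, not the paper's actual estimator.

```python
from collections import Counter, defaultdict


def build_browsing_graph(sessions):
    """Count observed page-to-page transitions and session start pages.

    `sessions` is a list of ordered page lists (a hypothetical format).
    """
    transitions = defaultdict(Counter)
    starts = Counter()
    for pages in sessions:
        if not pages:
            continue
        starts[pages[0]] += 1
        for src, dst in zip(pages, pages[1:]):
            transitions[src][dst] += 1
    return transitions, starts


def rank_pages(transitions, starts, damping=0.85, iterations=50):
    """Power iteration over the browsing graph.

    Transition probabilities come from observed click counts, and the
    random jump goes to pages in proportion to how often sessions
    started there (an assumption made for this sketch).
    """
    pages = set(starts) | set(transitions)
    for dsts in transitions.values():
        pages |= set(dsts)
    total_starts = sum(starts.values())
    reset = {p: starts[p] / total_starts for p in pages}
    score = dict(reset)
    for _ in range(iterations):
        nxt = {p: (1 - damping) * reset[p] for p in pages}
        for src in pages:
            out = transitions.get(src)
            out_total = sum(out.values()) if out else 0
            if out_total == 0:
                # Dead end: the surfer starts a new session.
                for p in pages:
                    nxt[p] += damping * score[src] * reset[p]
            else:
                for dst, n in out.items():
                    nxt[dst] += damping * score[src] * n / out_total
        score = nxt
    return score


# Made-up sessions, purely for illustration.
sessions = [["a", "b", "c"], ["a", "b"], ["d", "b"]]
transitions, starts = build_browsing_graph(sessions)
scores = rank_pages(transitions, starts)
```

The point is that a page's score depends on where people actually went, so a link nobody clicks contributes nothing even though it exists.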
The data they use seems to consist of session requests, sort of like server log files. But if they are using IE to spy on people, they can get more than merely a log of HTTP requests. Once you start snooping in and recording people's browsing sessions, why stop there? Surely you'd glean interesting data from other browser behaviour, such as:
1) time spent with the browser window or tab focused
2) keystrokes per page
3) on-page interaction, like interaction with Flash or Media players
4) mouseovers, mouseouts, focuses and blurs
5) pages people put in their Favourites or Bookmarks
6) words people enter into forms
7) pages that are open simultaneously in tabs
8) sites that people tend to keep open in a tab all day
9) pages that do a lot of AJAX async requests
10) pages hiding behind authentication
11) names of people you know
12) your address, phone number, shoe size, bank account balance, sexual fantasy preferences...
need I go on?
The BrowseRank algorithm is a thing of beauty, and their methods are brilliant.
Agreed.
Human behaviour analysis is, in my opinion, the right way for SEs to go. Whilst SEs have some extremely bright people doing incredibly clever things with automated processes, those processes just can't get close to real human behaviour.
Will be a very interesting one to watch.
I can see the privacy concerns, but I do find the technology fascinating at the same time. The engineer in me wants to see it in action, another part of me finds it all a little scary...
limited # of people running the Googlebar
...
Google Search History, Preferences, Accounts, Gmail cookies, Analytics, AdSense, DoubleClick data, plus stats from MySpace, YouTube, AOL, whoknowswhatelse... it might be a colorful patchwork, but their set of user behavior data is world class (#1 as of now). They're just not using it (yet / to its full potential) on organic search.
Haven't they been using it for AdWords quality and relevancy checks with success? Basically, apart from phrase-based filtering and regionalization, that's all they do to rank ads: watch what users do, analyze, react (unless your business model is unwanted, your bid price, placement, quality scores... all depend on historic user behavior data).
...
Of course the dataset is nowhere near as large or as 'interesting' as it would be if *everyone* using IE were contributing to the database.
I like that list above... would be interesting if *I* knew all this.
Not sure if I want MS to know *heh*...
Either way, as long as MS keeps spying only on things they could learn / use legally 'if they owned every website on the net'... it's fine, I guess. And for that purpose... not sure about this, but...
how much more data would they need apart from what their Phishing filters already phone home with...?
...
And why is that? Because human behavior may not -- in and of itself -- be the best gauge as to what is best. Two old quotes come to mind:
"No one ever went broke underestimating the intelligence of the American people." ~ H. L. Mencken
"People can easily be persuaded to accept the most inferior ideas or useless products." ~ Bartleby
We'll see how it plays out. Given how poorly MSN/LIVE has performed so far, they HAVE to do something, and this seems at first glance like a positive step in the right direction.
............................
And why is that? Because human behavior may not -- in and of itself -- be the best gauge as to what is best.
I 100% agree with that comment, Reno. This could never be used as the only method to power a top search engine. New content is a good example: how do you get that to fit into the results? Just because no one's been there yet doesn't mean it's not good.
Microsoft may be onto a good thing, and I'm looking forward to seeing if this does deliver better results, but they still need to improve their general relevancy and linguistic intelligence to compete with Google.
Regardless, BrowseRank sounds like a nice refinement as it would implicitly include measures of visibility/prominence, pertinence and attractiveness of links, besides just their existence.
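A toy illustration of that difference, assuming a plain link graph splits a page's weight evenly across its outgoing links while a browsing graph splits it by observed clicks. The page names and click counts are invented, and both helper functions are hypothetical:

```python
def link_graph_weights(outlinks):
    """Every outgoing link gets the same share: only existence counts."""
    return {dst: 1 / len(outlinks) for dst in outlinks}


def browsing_graph_weights(click_counts):
    """Each link's share is proportional to how often it was followed."""
    total = sum(click_counts.values())
    if total == 0:
        return {}
    return {dst: n / total for dst, n in click_counts.items()}


# Invented numbers: a prominent nav link, a sidebar link, a footer link.
clicks = {"/products": 90, "/blog": 9, "/legal": 1}

by_links = link_graph_weights(list(clicks))
by_clicks = browsing_graph_weights(clicks)
```

With these made-up counts the link graph treats all three links alike, while the click-weighted version gives most of the weight to the link people actually follow, and the buried footer link gets almost nothing.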
It's also something that other "would be" search engines would have trouble emulating, because not many organisations would be able to get enough link-walking data to do it.
I like it. Quite a lot.
Dixon.