httpwebwitch - 6:18 am on Jul 26, 2008 (gmt 0)
I finished reading through it - all of it. The BrowseRank algorithm is a thing of beauty, and their methods are brilliant. This may not rock the world, but it may finally give Microsoft a pretty decent search engine.

As for the spying, I suspected Google of doing this with their toolbar a couple of years ago, but I never found evidence. My reasoning was highly conspiratorial, in seven points:
1) it's possible
2) collectively, they are very smart
3) a smart person would figure this out
4) it would make their SERPs more relevant
5) they would benefit from it
6) they have the means to do it
7) no one would know

If my suspicions are correct, Microsoft has that IE browser doing their spying and sending session behaviour data back to their data centers, which gives them vastly more reach than the limited # of people running the Googlebar. (And significantly higher adoption than Alexa, Stumble, and other toolbars)

So where'd they get the data?

page 5:
We used a user behavior dataset, collected from the World Wide Web by a commercial search engine in the experiments. All possible privacy information was rigorously filtered out and the data was sampled and cleaned to remove bias as much as possible. There are in total over 3-billion records, and among them there are 950-million unique URLs.

same source, page 6:
we also obtained a large dataset from the same search engine, containing 8000 queries and their associated webpages.

The data they use seems to consist of session requests, sort of like server log files. But if they are using IE to spy on people, they can get more than merely a log of HTTP requests. Once you start snooping in and recording people's browsing sessions, why stop there? Surely you'd glean interesting data from other browser behaviour, such as:
1) time spent with the browser window or tab focused
2) keystrokes per page
3) on-page interaction, like interaction with Flash or Media players
4) mouseovers, mouseouts, focuses and blurs
5) pages people put in their Favourites or Bookmarks
6) words people enter into forms
7) pages that are open simultaneously in tabs
8) sites that people tend to keep open in a tab all day
9) pages that do a lot of AJAX async requests
10) pages hiding behind authentication
11) names of people you know
12) your address, phone number, shoe size, bank account balance, sexual fantasy preferences...
need I go on?