Organic Cyberspace: Microsoft Researcher Looks at Online Communities
"Who can you trust?"

 4:02 pm on Jul 15, 2004 (gmt 0)

Marc Smith, a researcher at MSR, is in the midst of a large project aimed at studying and quantifying the chaotic system that is the Usenet. His project, called Netscan, measures and maps social cyberspaces. So far, he feels that he's just peeled back the skin, but hasn't completely reached the core.

The article goes on to explain the project's goals and examination of the behavior of people in online communities.

Smith hopes his tools will help participants in online communities evaluate one another and help build trust among their members.




 12:16 am on Jul 16, 2004 (gmt 0)

The thinking is a little fuzzy behind the project.

I can't remember the name of the principle, but the theory is that by observing something, you change the nature of what is observed.

As soon as anyone tries to implement this, two things are going to happen:

"Honest" people will become aware that their writings in these threads are being monitored, and will change the way they post, either consciously or subconsciously. As it stands now, the responses aren't quantified. As soon as they become quantified, I would expect people to become much more formal in their approach to the monitored discussions, which will skew any initial algorithm heavily.

"Dishonest" people, or "Honest GamePlayers", will try to study the algorithm and place their messages so as to best benefit themselves within the algorithm's formula. This, in and of itself, will wipe out the validity of the algorithm.

The research will be valid up until the point it is published and becomes disseminated even among a small group. After that point, people's patterns will change and the research will only be valid in a historical context.


 6:14 pm on Jul 16, 2004 (gmt 0)

Interesting article, Marcia. You just don't find many statistical forum studies, thanks for posting!

I suspect Grelmar has a point about patterns changing should this technique become widely used. It's kind of like SEO - as soon as an algorithm becomes popular, behavior starts changing to increase performance.

We see that all the time on the micro scale. Forum members may post frequently to boost their post count, for instance, since many forums rank their members by the number of posts. If the rating favored fewer long posts, no doubt we'd find reputation-grubbers getting longer-winded and less prolific. :)


 1:11 am on Jul 22, 2004 (gmt 0)

Ok, this post stuck like a grain of sand in my shoe that just wouldn't go away until I went back and gave it some deeper thought.

So I went back and re-read the article, and thought about it some more. Then grabbed my GF to get her to read it, and give me an opinion just to make sure my train of thought wasn't going off the deep end.

And our basic conclusion was that this is an incredibly creepy and wrongheaded way of looking at things. Here's a rundown of some of the major issues we both ended up taking with the article:

Cross Behaviour Posting

Depending on the topic, I tend to reply/participate in different ways. On some subjects I'll drop a quick single "reply" and move on, maybe checking in to see how other people responded. Sometimes they respond en masse, sometimes they don't. Other times, I'll get fully involved in a discussion/debate, bouncing posts back and forth as the thread grows. According to the formula they use, this would make me either an "answer guy" (read: trustworthy) or a prolific questioner (read: untrustworthy). My rating would depend on which of my threads you sampled. Either rating would be essentially wrong. This could apply to a great many people.
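To make the sampling problem concrete, here's a rough sketch. The labelling rule, the threshold, and the posting history are all made up for illustration; this is not the formula from the article or from Netscan, just a stand-in for any rule that rates a poster from a handful of sampled threads.

```python
# Hypothetical labelling rule, NOT the Netscan formula: it only illustrates
# how a rating can flip depending on which threads get sampled.
def label_poster(posts_per_thread, threshold=2.0):
    """Label a poster from the mean number of posts they made per sampled thread."""
    mean_posts = sum(posts_per_thread) / len(posts_per_thread)
    return "answer guy" if mean_posts < threshold else "prolific questioner"


# One person's (invented) history: single drive-by replies in some threads,
# long back-and-forth debates in others.
history = [1, 1, 1, 1, 9, 12, 8, 1, 1, 10]

sample_a = history[:4]   # happens to catch only the quick replies
sample_b = history[4:7]  # happens to catch only the long debates

print(label_poster(sample_a))  # answer guy
print(label_poster(sample_b))  # prolific questioner
```

Same person, two opposite labels, and neither one describes how that person actually posts.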

The Sample the Researcher Selected is Too Specific

The MSR researcher is basing his data on Usenet postings, and that's a HUGE mistake. Even if I thought the formula he derived for the system was relevant (which I don't), it would be relevant only to Usenet users. Usenet has become a "fringe" of the net, and a very specific type of person participates in it. Assuming that the population at large would act in a similar way in other forums is unjustified. Certain groups of people gravitate to Usenet, certain types to BBSes, certain types blog, some do all three, and the majority of the population participates in none of the above. The old McLuhan adage that "The Medium is the Message" tends to be quite true, anecdotally at least. Radio commentators act differently from TV commentators, who act differently from print journalists. Making a formula that quantifies a radio commentator's trustworthiness and then trying to apply the same formula to newspaper columnists would be wrong on its face. They're different professions, talking to different audiences, using different methods.

The Formula Would Eliminate Too Much Valuable Data

Any statistical model used to determine relevancy is essentially a negative response algorithm. It is used to weed out irrelevant data. In many fields, this can prove quite helpful, such as in medical research, where you can use it to identify anomalous responses to medications and treatments. It lets you select out the "freak" occurrences.

The problem is that the greater the share of the sample a negative response algorithm eliminates, the less valid it becomes. If such an algorithm eliminates 50% of a statistical population, then there is obviously a very large segment that is simply unquantified, and you can't draw any real conclusions from the sample that remains.

This MSR research looks to be eliminating a far greater proportion of the data. They're trying to use it to base decisions on reading as few opinions as possible. At a guess, I'd say they were looking for less than 5 to 10% of the sample data to take back for further review. That way, they only have to read a small number of postings. This is eliminating 90%+ of the population as irrelevant.
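The arithmetic behind that guess, as a quick sketch. The total of 10,000 postings is an invented number, and the 5% and 10% keep rates are my own guesses from above, not figures from the article.

```python
def discarded_fraction(total_posts, kept_posts):
    """Fraction of the population that the filter throws away."""
    return 1 - kept_posts / total_posts


total = 10_000  # hypothetical number of postings in the sample

for keep_percent in (5, 10):  # guessed keep rates, not from the article
    kept = total * keep_percent // 100
    discarded = discarded_fraction(total, kept)
    print(f"keep {kept} of {total}: discard {discarded:.0%}")
```

Keeping 5% of the postings discards 95% of them, and keeping 10% discards 90%.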

I doubt if I need to explain how wrongheaded that concept is.

That Microsoft is using this research in their product development cycle tells us more about MS's corporate mentality than it does about how people post in discussions.

It tells us that MS is losing touch with the "human factor" in product development, and wants to reduce customer response, and development issues related to customer response, to a statistical analysis of issues. They don't want to actually read through massive amounts of complaints and suggestions; they want to make an algorithm that tells them that "Response A is statistically relevant, response B isn't, so we'll work on dealing with response A and ignore B."

Again, wrongheaded. It skews their development cycle towards appeasing a certain portion of the population that they have determined is statistically relevant, and ignores all else. If you respond in forums in certain ways, and are responded to in certain ways, then you're relevant. If you fall outside that narrow definition of relevancy, then obviously you aren't relevant and your opinion doesn't matter.

Good customer service is based on everyone's opinion being important. This doesn't mean you cater to everyone's opinion, it just means that you take as broad a sample of the population into consideration as possible.

MS doesn't want to deal with you as an individual. They want to eliminate 90% of our opinions as irrelevant, and work on the rest.


 9:54 pm on Jul 22, 2004 (gmt 0)

grel: I wasn't sure why the article gave me the ickies. You've hit the nail on the head though, I think....

