Monday, October 29, 2012

Avoiding the Tyranny of the Majority in Collaborative Filtering



One of the more annoying aspects of the modern Internet is crap comments.  For instance, it's improved in recent years, but for a while the typical comments on YouTube music videos were among the most idiotic examples of human "thought" and behavior I've ever seen…

A common solution to the problem is to have readers rate comments.  Then comments that are highly-rated by readers get ranked near the top of the list, and comments that are panned by readers get ranked near the bottom of the list.  This mechanism is used to good effect on general-purpose sites like Reddit, and specialized-community sites like Less Wrong.

Obviously this mechanism is very similar to the one used on Slashdot and Digg and other such sites, for collaborative rating of news items, web pages, and so forth.

There are many refinements of the methodology.  For instance, if an individual tends to make highly-rated comments, one can have the rating algorithm give extra weight to their ratings of others' comments.
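A minimal sketch of that refinement, under stated assumptions: votes are +1 or -1, and `rater_weight` is a hypothetical reputation score (for example, one derived from how well a rater's own comments have been rated). The names and numbers are illustrative only, not taken from any particular site.

```python
# Hypothetical data: each vote is (rater, value) with value +1 or -1.
# rater_weight is an assumed reputation score per rater.

def weighted_score(votes, rater_weight, default_weight=1.0):
    """Sum the votes, scaling each one by the weight of the rater who cast it."""
    return sum(value * rater_weight.get(rater, default_weight)
               for rater, value in votes)

votes = [("alice", +1), ("bob", -1), ("carol", -1)]
rater_weight = {"alice": 3.0}  # assumption: alice tends to write highly-rated comments

plain = weighted_score(votes, {})               # every rater counts equally
weighted = weighted_score(votes, rater_weight)  # alice's vote counts three times
```

With equal weights the comment ends up below zero; once alice's vote carries extra weight, it ends up above zero.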

Such algorithms are interesting and effective, but have some shortcomings as well, one of which is a tendency toward "dictatorship of the majority."  For instance, if you have content that's loved by a certain 20% of readers but hated by the other 80%, it will get badly down-voted.

I started wondering recently whether this problem could be interestingly solved via an appropriate application of basic graph theory and machine learning.

That is, suppose one is given a pool of texts (e.g. comments on some topic), a set of ratings for each text, and information on the ratings made by each rater across a variety of texts.

Then, one can analyze this data to discover *clusters of raters* and *networks of raters*.

A cluster of raters is a set of folks who tend to rate things roughly the same way.   Clusters might be defined in a context-specific way -- e.g. one could have a set of raters who form a cluster in the context of music video comments, determined via only looking at music video comments and ignoring all other texts.
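One simple way to find such clusters, sketched here under assumptions: ratings are +1 or -1, similarity is cosine similarity over the texts two raters have both rated, and raters are merged whenever their similarity passes a threshold (single-link clustering). The data, the `threshold` value and the function names are all hypothetical; a real system would likely use a more robust clustering method.

```python
from itertools import combinations
from math import sqrt

# Hypothetical ratings: rater -> {text_id: +1 or -1}
ratings = {
    "ann": {"t1": +1, "t2": +1, "t3": -1},
    "ben": {"t1": +1, "t2": +1, "t3": -1},
    "cat": {"t1": -1, "t2": -1, "t3": +1},
    "dan": {"t1": -1, "t2": -1, "t3": +1},
}

def similarity(a, b):
    """Cosine similarity over the texts both raters have rated."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm = sqrt(sum(a[t] ** 2 for t in shared)) * sqrt(sum(b[t] ** 2 for t in shared))
    return dot / norm if norm else 0.0

def cluster_raters(ratings, threshold=0.5):
    """Single-link clustering: join raters whose similarity is >= threshold."""
    parent = {r: r for r in ratings}

    def find(r):
        # Union-find lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in combinations(ratings, 2):
        if similarity(ratings[a], ratings[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for r in ratings:
        groups.setdefault(find(r), set()).add(r)
    return list(groups.values())

clusters = cluster_raters(ratings)  # two clusters: {ann, ben} and {cat, dan}
```

To get context-specific clusters, one would pass in only the ratings made on texts from that context (say, music video comments) and ignore the rest.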

A network of raters is a set of folks who tend to rate each other's texts highly, or who tend to write texts that are replies to each other's texts.
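A rough sketch of network discovery, assuming we know who wrote each text: draw a directed edge from a rater to an author when the rater's average vote on that author's texts is positive, keep only the edges that are reciprocated, and take connected components. The data and function names are hypothetical, and reply links could be added as a second kind of edge in the same way.

```python
from collections import defaultdict

# Hypothetical inputs: who wrote each text, and (rater, text_id, value) votes.
author_of = {"t1": "ann", "t2": "ben", "t3": "cat", "t4": "dan"}
votes = [
    ("ann", "t2", +1), ("ben", "t1", +1),  # ann and ben upvote each other
    ("cat", "t4", +1), ("dan", "t3", +1),  # cat and dan upvote each other
    ("ann", "t3", -1), ("cat", "t1", -1),  # ann and cat downvote each other
]

def endorsement_edges(votes, author_of):
    """Directed edges (rater, author) where the rater's mean vote on the author is positive."""
    per_pair = defaultdict(list)
    for rater, text, value in votes:
        author = author_of[text]
        if author != rater:  # ignore self-votes
            per_pair[(rater, author)].append(value)
    return {pair for pair, vals in per_pair.items() if sum(vals) / len(vals) > 0}

def mutual_networks(edges):
    """Connected components of the graph that keeps only reciprocated edges."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if (b, a) in edges:
            neighbours[a].add(b)
    seen, networks = set(), []
    for start in list(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        networks.append(component)
    return networks

networks = mutual_networks(endorsement_edges(votes, author_of))
```

Requiring reciprocated edges is a deliberately strict choice for this sketch; a looser definition might accept one-way endorsement above some strength.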

Given information on the clusters and networks of raters present in a community, one can then rank texts using this information.  One can rank a text highly if some reasonably definite cluster or network of raters tends to rank it highly.
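As a sketch of that ranking rule, assume the groups (clusters or networks) are already known, votes are +1 or -1, and a text's score is the best mean rating it gets from any group that cast at least `min_votes` votes on it. The `min_votes` cutoff stands in for "reasonably definite"; it, the data and the names are assumptions for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def subgroup_score(text_votes, groups, min_votes=2):
    """Best mean rating the text receives from any group with enough votes on it."""
    scores = []
    for group in groups:
        group_votes = [v for rater, v in text_votes.items() if rater in group]
        if len(group_votes) >= min_votes:
            scores.append(mean(group_votes))
    if scores:
        return max(scores)
    # No group qualifies: fall back to the overall mean (0.0 if there are no votes).
    return mean(list(text_votes.values())) if text_votes else 0.0

# Hypothetical community: a 20% group (f1, f2) and an 80% group (m1..m8).
groups = [{"f1", "f2"}, {f"m{i}" for i in range(1, 9)}]
votes = {
    # Loved by the 20%, hated by the 80%.
    "niche": {"f1": +1, "f2": +1, **{f"m{i}": -1 for i in range(1, 9)}},
    # Split evenly inside both groups.
    "meh": {"f1": +1, "f2": -1,
            **{f"m{i}": +1 for i in range(1, 5)},
            **{f"m{i}": -1 for i in range(5, 9)}},
}

by_majority = sorted(votes, key=lambda t: mean(list(votes[t].values())), reverse=True)
by_subgroup = sorted(votes, key=lambda t: subgroup_score(votes[t], groups), reverse=True)
```

Under plain averaging "meh" outranks "niche"; under the subgroup rule the order flips, because one group rates "niche" uniformly highly.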

This method would remove the "dictatorship of the majority" problem, and result in texts being highly rated if any "meaningful subgroup" of people liked them.

Novel methods of browsing content also pop to mind here.  For instance: instead of just a ranked list of texts, one could show a set of tabs, each giving a ranked list of texts according to some meaningful subgroup.
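The tabbed view could be as simple as one ranked list per subgroup. A minimal sketch, assuming groups are already labeled and votes are +1 or -1 (the labels, data and `build_tabs` name are hypothetical):

```python
def build_tabs(votes, groups):
    """Map each group label to text ids sorted by that group's mean rating."""
    tabs = {}
    for label, members in groups.items():
        scored = []
        for text, text_votes in votes.items():
            vals = [v for rater, v in text_votes.items() if rater in members]
            if vals:  # skip texts nobody in this group has rated
                scored.append((sum(vals) / len(vals), text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        tabs[label] = [text for _, text in scored]
    return tabs

groups = {"group A": {"a1", "a2"}, "group B": {"b1", "b2"}}
votes = {
    "x": {"a1": +1, "a2": +1, "b1": -1, "b2": -1},
    "y": {"a1": -1, "a2": -1, "b1": +1, "b2": +1},
    "z": {"a1": +1, "a2": -1, "b1": +1, "b2": -1},
}

tabs = build_tabs(votes, groups)
```

Each tab then shows the same pool of texts in a different order, one ordering per subgroup.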

Similar ideas could also be applied to the results of a search engine.  In this case, the role of "ratings of text X" would be played by links from other websites to site X.   The PageRank formula gives highest rank to sites that are linked to by other sites (with highest weight given to links from other sites with high PageRank, using a recursive algorithm).  Other graph centrality formulas work similarly.  As an alternative to this approach, one could give high rank to a site if there is some meaningful subgroup of other sites that links to it (where a meaningful subgroup is defined as a cluster of sites that link to similar pages, or a cluster of sites with similar content according to natural language analysis, or a network of richly inter-linking sites).   Instead of a single list of search results, one could give a set of tabs of results, each tab listing the results ranked according to a certain (automatically discovered) meaningful subgroup.
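To make the contrast concrete, here is a sketch with two scoring functions: a simplified PageRank-style power iteration (damping factor 0.85 assumed, sites with no outgoing links spread their rank evenly), and a hypothetical "subgroup link score" that gives a site the largest fraction of any one group's members linking to it. The link graph, the groups and the `subgroup_link_score` name are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: site -> set of sites it links to. Returns site -> rank (ranks sum to 1)."""
    sites = set(links) | {t for targets in links.values() for t in targets}
    n = len(sites)
    rank = {s: 1.0 / n for s in sites}
    for _ in range(iterations):
        new = {s: (1.0 - damping) / n for s in sites}
        for s in sites:
            targets = links.get(s) or sites  # no outgoing links: spread evenly
            share = damping * rank[s] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

def subgroup_link_score(site, links, groups):
    """Largest fraction of any group's members that link to the site."""
    return max(
        sum(site in links.get(member, ()) for member in group) / len(group)
        for group in groups
    )

# Hypothetical web: six sites link to "mainstream", two sites link to "niche".
links = {f"m{i}": {"mainstream"} for i in range(1, 7)}
links.update({"n1": {"niche"}, "n2": {"niche"}})
groups = [{f"m{i}" for i in range(1, 7)}, {"n1", "n2"}]

rank = pagerank(links)
niche_score = subgroup_link_score("niche", links, groups)
mainstream_score = subgroup_link_score("mainstream", links, groups)
```

The centrality measure ranks "mainstream" above "niche" simply because more sites link to it, while the subgroup score treats both as fully endorsed by some group.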

There are many ways to tune and extend this kind of methodology.   After writing the above, a moment's Googling found a couple of papers on related topics, such as:

http://iswc2004.semanticweb.org/demos/01/paper.pdf

http://www.citeulike.org/user/abellogin/article/2200728

But it doesn't seem that anyone has rolled out these sorts of ideas into the Web at large, which is unfortunate….

But the Web is famously fast-advancing, so there's reason to be optimistic about the future.  Some sort of technology like I've described here, deployed on a mass scale, is going to be important for the development of the Internet and its associated human community into an increasingly powerful "global brain" …

12 comments:

  1. What I find most interesting is how to explain the differences among the subgroups' opinions.

    Is there any software that addresses this issue among experts?

    What would be a good place to look? At the singularity summit this year, there was a presentation that compared the different branches of science in terms of respect and authority of prediction. The lowest rated category was AGI predictions.

    How do you resolve this in a timely manner?

  2. I think a bigger problem is the "Tyranny of the Targeted Advertisers" - prioritizing whatever inane, untrue, or inflammatory content is most likely to get advertising clicks.

    But collaborative filtering has many challenges. Spammers, trolls, phishers and others are very good at figuring out how to become "meaningful subgroups" in order to gain an advantage.

    The easier you make it for small minority groups to get their content up-voted - the easier it is for some shady spammer or SEO consultant to get their content to the front-page.

    Special interest groups could also have hundreds of members working together, voting in a coordinated manner. This happened on Digg a while back. Conservative political groups would tell their members which articles to down-vote and up-vote. As a result, articles with their point of view would get preferential placement on the front page.

    Also never underestimate the power of trolls to undermine the system for fun.

  3. Many ways of optimizing against "tyranny of the majority" come to mind. (I've spent my life dealing with this problem in meatspace. I could probably list a few hundred ways of optimizing for intelligent emergence. I also really like the jury system, with its [(% support level for a law)^12] method of setting free those who are innocent. That's a threshold-based feedback system, actually.) Here are a few additional ideas:

    (1) "Pricing" (the willingness to voluntarily trade cheap 'benevolence' karma points). In between posts, all posters are recharged with a certain small number of "Karma points" that they can deposit on a post (say 5 to 30 points). You can spend them, or you can keep them. You post again, and the number of Karma points you are assigned is based on the uprankings you receive from your latest post. This way, you're incentivized to post as much as you like, as long as you try hard to write meaningful posts. Display this ranking as a separate ranking above people's posts.

    (2) When someone is being snarky, or not putting much thought into a post or reply, it often shows, and they often know it themselves. Not all posts are intended to be "optimal." Allow posters to mark how serious they were about their own post, for example "Just showing I read it, here's my knee-jerk .02." Perhaps let them choose an option such as "This post isn't my best; let me keep my prior karma level. I know this will be judged harshly, and I know my karma will not increase or decrease from this post." Only let other people see the label after they've already ranked the comment, so there's no bias.

    (3) Also, anyone from anywhere on the Internet can +1 or -1 any post on the site, but that is kept separate from the other rankings on the site, to take advantage of the wisdom of crowds.

    (4) Try to set _meaningful thresholds_ for everything. Don't reorder or uprank immediately when rankings change, because that skews the ongoing process by letting the morons know which comments they should uprank (like the posts under music videos on YouTube). Allow people to reorder the rankings when they so desire.

    (5) Also, normalize the rankings for time, after a post has been up for 1 hour. Maybe even keep track of people's curves, and normalize based on "overall site traffic." (the person who posted at 1pm will get just as much 3-6pm 'after work' traffic as the person who posted at 2pm). Or, normalize based on Jeff Hawkins' "GROK" type neocortical product, or "Vitamin D" analysis of the curves. For instance, you could notice that when a comment is getting "thumbs up" after having been on the site for months and months, it usually has an initial "spike" of "upranking" right after it was posted (ex: maybe because Eliezer Yudkowsky is just sitting there, waiting for interesting new posts at lesswrong, and he always gives it an immediate thumbs up after reading it. LOL).

  4. (6) You know what kind of site you want to create. People only take ranking seriously when there's something in it for them. Think about that. This means several things that beg to be separated:
    6a) Someone at Less Wrong loves Eliezer's mathematics, but is a tyrant, and a determinist, not a libertarian (he believes the singularity will allow central planning to finally work. LOL). He's not being paid to rank things rationally, but sometimes his rankings are rational, since this person is technically skilled. Now, this person is motivated by a perverse desire to defeat some of Eliezer's values. So, for political reasons, he downgrades all comments that suggest Eliezer's libertarianism. OK, fine. ...But this contradicts Eliezer's stated purposes. So, Eliezer talks to the downgrader, and he realizes: this guy makes a lot of money, and comes here for fun, to try to defeat libertarian politics. His karma is high, and he's smart (at least narrowly-smart) but his purposes are against mine. If he were being paid, he might take the political comments more seriously and not snarkily downgrade/browbeat them with his karma, or, less optimally, he might refuse to be paid, and leave the site. In short, when the primate thrill of "vanquishing the enemy" is lost, and a sense of duty is added, that might incentivize better posts from SOME people. Others, it might drive away, or at least help identify.
    6b) Have two or more carefully-chosen thresholds' ratios define who has major karma. For instance, allow people to visit a person's page, and create an "admiration-ranking" definition of that person that is totally invisible to anyone except the viewer. For instance, have several buttons, labeled as the following. "I believe this person to be an expert in their discipline" "I really admired the following phrase, sentence, or paragraph the person wrote (can be cut and pasted, since all the person's posts appear automatically on the page, in a long scroll, below)" "I've read a lot of posts from this person, and I generally agree with them __% of the time" and "Words that you believe describe this person" --From the prior, when someone clicks on the "I believe this person to be an expert in their discipline" a closeable popup appears, and asks you to type in titles of books (selected "book" dropdown) and articles this person has written. For J. Storrs Hall, I might type in "What I want to be when I grow up is a cloud" and a few other of his works. This would indicate that I knew who he was, independent from anyone else, or that I took time to google search him. In any case, I'd also probably put in works that I thought were interesting, neat, novel, or otherwise described accurately MY brain's relationship to him and his ideas. When two people came very close to defining Hall the same way, he might be asked whether he agrees with some replies to his comments from the two similar describers. Those who wrote in "commie" or "science fictiony" might be downgraded in karma a lot (these things don't describe JoSH).

    Also, what if there were several PAID and highly-intelligent people who were hired and told "Please take your job as an upranker or downranker seriously. Rank as many comments as you can. Just sit and rank for 20 hours per week. Do not rank when you're tired. If you don't understand a post, flag it with a "?", or call admin and get it explained."

  5. When all 10 or 20 uprankers (who do not know each other) upranked say 9/10 of a person's comments, it might pass a threshold where the page assigns "expert" status to that user. (What are the statistical odds of, say, three grad-student-level uprankers all giving a person a 9/10 on a technical subject, and being wrong? Add a 4th and 5th, and one downgrades, then it's time to look at their profile for inconsistencies. Is it with the prior experts, or not? The American Founders put the jury threshold at a unanimous 12: if 10% of the public disagrees with a law, a randomly drawn jury is unanimous only about 28% of the time (0.9^12), so about 72% of juries would hang. Diminish the "downgrade rankers'" karma if the geniuses you've hand-selected upgrade what they've downgraded.)

    Another very interesting idea is to GIS a person's rough geographical location, and allow them to be judged by 'a jury of their peers.' or "Show the rankings of my geographically-nearest peers." (Where religious subgroups get mad at the admin, LOL. ...Unless the threshold is set low enough to minimize them all, unless they set the "order list by my peers only.")

    Do you let stupid subgroups flourish on your community? Sure. They get overwhelmed on some intelligent restructuring of the list. Let the stupid subgroups be vocal and stupid, ...so long as you find the few pearls of wisdom they have to offer.

    Also very good would be the ability of a user to display word clusters that describe their political beliefs and their religious beliefs. This way, those could be upranked both within an identified sub-community and within the whole community. It might be interesting to try out giving extra karma to people who displayed those "two things morons don't talk about in bars, for fear of the loss of their ability to control their fists" descriptors or "self-identifiers" (religion and politics). It might also be interesting to allow people to uprank or downrank those descriptors. (A person who had a lot of upranks, and a few horrible downranks on their comments might notice that all the comments where they made their religious beliefs known matched the downrank on their "Catholic, Authoritarian" descriptor. Of course, such people would probably rapidly hide or leave, if they were fairly intelligent. Or leave it blank, knowing the pant-hooting and piling-on of disdain that was likely to follow. Tom Woods, interestingly, has very intelligent views on political decentralization of power, but claims to be a Catholic. He is often roundly criticized by people who otherwise completely or mostly agree with him, just for this. ...Some people compartmentalize well.)


  6. Also, it might be good to have a "hiring form" that identifies people in far greater detail than has ever been done by the unphilosophical morons who usually build those types of things. For instance, why not offer an immense range of qualifiers and descriptors when it comes to philosophy, proceeding from very general to very specific? Why not let someone call themselves a "Voluntaryist+Libertarian+Anarchist" who in the "views on war" section is "Open System Objectivist _as pertains to_ war against radical Islam" (Similar: Kelley, Branden, Suprynowicz) "TOC Objectivist as pertains to WWII" (Similar: Kelley, Branden, Rand, Peikoff, Suprynowicz) "Rothbardian _as pertains to_ war against militias" (Similar: Branden, Kelley, Suprynowicz Dissimilar: Peikoff). This could go on, into extreme specifics in the generalities, and extreme minutiae to clarify unity and disunity with other belief systems. Then, when the experts in your system disagree, there is either a clear reason why, or there is the ability to further discuss things to "get on the same page."

    Such fora might well produce expert systems on several subject areas. What about molecular manufacturing? Is there even a MM forum open to the public? Perhaps one might write "Read and understood" next to relevant works on such a forum.

    I think the rules one establishes for communication, if they were unbelievably superior, might actually create a distributed "Manhattan Project" somewhat unintentionally. Also: Might not such a community "open source" access to an electron microscope and laboratory? Might not this allow interested users to ramp up in a way never before imagined? Who needs a university structure? Are all old electron microscopes unbelievably expensive? What about all parts necessary to construct one? And what about expert systems necessary to teach the construction of such complex equipment? Given modern shipping, 3-D printing, manufacturing, etc., might it not be possible to have actual results emerge from a forum like that? (There may already be such a forum, but not with such self-filtering rules that allow for high-level emergence. Keep in mind that, with people, you need the educated 1% of the educated 1% to become HEAVILY involved. Also important is getting their nearest competition working with them toward the goal of finishing a project.)

    Hopefully, you didn't consider this to be blather. I didn't edit it, and just wrote what I thought at the moment. I give these thoughts a 7/10 in "honest self-ranking." :)

    PS: I imagine the "self-description" feature at Less Wrong including "student of objectivism" (for someone who's drunk the Rand Kool-aid) or "objectivist" for someone who's a Kelley or Branden type of thinker. And I'd expect people to be able to uprank or downrank every statement (every sentence?) made by anyone on the site. The web allows for this kind of feedback-intense environment.

    So someone writes
    philosophy: objectivism +1,300 (comments) -900 (comments)
    politics: libertarianism +1,700 (comments) -55 (comments)
    projected future format of your politics: minarchy, "night watchman state" +1,200 (comments) -3,300 (comments)
    religion: anti-theist, atheist, adeist, n/a, none +1,300 (comments) -5,000 (comments)

    Well, OK, the anarchists would downgrade "minarchy" and claim inconsistency. So switch to "only view karma rankings over 30 points," or "only view karma rankings made by registered users" (religious downgrade drops off) or "only view karma rankings from universally-recognized innovators" (reveals that a few "objectivist" innovators are on the site, and several agorist innovators are on the site).



  7. These are just examples. The devil is in not getting rid of any information that is part of a groundswell, and not shaping the groundswell, unless you're recognizing that you're doing so willfully as a variable that can be interacted with, by allowing your shaping influence to be minimized or selectively hidden or unhidden. That way, you get the emergence of as many brains as possible, and you can set thresholds that truly do isolate the best brains.

    And maybe this is a part of what you wrote about in your article.
