I've been focused on generative models for processing crowdsourced data lately. These models take an item and an associated set of suggested labels from a set of workers and synthesize a posterior distribution over the true label. A worker can be considered an expert, and the algorithms provide a procedure for synthesizing the expert opinions into a final decision.
In the supervised setting there are ways to achieve this synthesis that come with much better theoretical guarantees. Therefore, even though the generative models can incorporate revealed ground truth, if ground truth were abundant other techniques would be preferred. For instance, imagine a bizarro world where one has a large pile of labeled data but is trying to assemble a system that leverages crowdsource workers to generalize to novel data. In this case the crowdsource workers would first examine the labeled set and provide their answers, and then a supervised machine learning formulation would be used to learn a decision system from the workers' output. Subsequently, novel instances would first be analyzed by crowdsource workers, and the final decision would then be made automatically based upon the workers' output.
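I don't have a particular supervised formulation in mind, but to make the bizarro-world setup concrete, here is a minimal sketch under some assumptions of my own: binary labels, each worker's answer encoded as +1/-1 (0 if the worker skipped the item) and used as a feature, and a plain logistic regression trained by batch gradient descent. The functions `train_logistic` and `predict` and the toy data are hypothetical illustrations, not a reference implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(votes, labels, epochs=500, lr=0.1):
    """Hypothetical helper: fit weights w and bias b so that sigmoid(b + w . x)
    approximates P(label = 1).

    votes[i][j] is +1 or -1 for worker j's answer on item i (0 if worker j skipped it);
    labels[i] is the revealed ground truth, 0 or 1.
    """
    n_items, n_workers = len(votes), len(votes[0])
    w, b = [0.0] * n_workers, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_workers, 0.0
        for x, y in zip(votes, labels):
            # Derivative of the log loss with respect to the logit.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        # Batch gradient descent step on the average log loss.
        b -= lr * grad_b / n_items
        w = [wj - lr * gj / n_items for wj, gj in zip(w, grad_w)]
    return w, b


def predict(w, b, x):
    """Posterior P(label = 1) for a novel item given the workers' answers x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical labeled set: worker 0 is always right, workers 1 and 2 are always wrong,
# so a simple majority vote would get every item wrong.
labels = [1, 0] * 10
votes = [[+1, -1, -1] if y == 1 else [-1, +1, +1] for y in labels]
w, b = train_logistic(votes, labels)
print(w, b)                          # worker 0 gets positive weight, workers 1 and 2 negative
print(predict(w, b, [+1, -1, -1]))   # above 0.5 even though worker 0 is outvoted
```

With ground truth available, the fit learns to trust worker 0 and to invert workers 1 and 2, which no consensus-based method could infer from the votes alone, since here the majority is always wrong.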
Alas, ground truth is usually not revealed for crowdsourced data; in machine learning, acquiring labeled data is often the very reason for engaging a crowdsourcing service. The generative models can proceed without labeled training data because they assume that the typical worker is usually correct. As a result, workers who tend to be in the majority are considered more accurate and contribute more strongly to the posterior than those who tend to be in the minority. If this underlying assumption does not hold, the generative models can make arbitrarily bad decisions, which is why other techniques would be preferred where applicable.
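To see how "the majority is presumed accurate" falls out of the math, here is a minimal sketch of one such generative model: a simplified binary "one-coin" variant of Dawid and Skene's EM approach, in which each worker has a single unknown accuracy. This is my illustrative choice rather than a specific model from this post; the function name `one_coin_em`, the 0.7 starting accuracy, the add-one smoothing, and the toy data are all assumptions. Starting every worker at the same better-than-chance accuracy encodes "the typical worker is usually correct", so the first E-step is just a majority vote, and the M-step then rewards workers who agree with that consensus.

```python
import math


def one_coin_em(votes, n_workers, iters=50, prior=0.5):
    """Hypothetical sketch of EM for a binary one-coin model (no ground truth used).

    votes: one dict per item mapping worker index -> label in {0, 1}.
    Returns (posterior P(true label = 1) per item, estimated accuracy per worker).
    """
    # Assumption: every worker starts at the same better-than-chance accuracy.
    acc = [0.7] * n_workers
    post = []
    for _ in range(iters):
        # E-step: posterior over the true label given the current accuracies.
        post = []
        for item in votes:
            log1, log0 = math.log(prior), math.log(1 - prior)
            for j, lab in item.items():
                log1 += math.log(acc[j] if lab == 1 else 1 - acc[j])
                log0 += math.log(acc[j] if lab == 0 else 1 - acc[j])
            m = max(log1, log0)  # subtract the max for numerical stability
            e1, e0 = math.exp(log1 - m), math.exp(log0 - m)
            post.append(e1 / (e1 + e0))
        # M-step: accuracy = expected fraction of a worker's labels that match the
        # (unknown) true label; add-one smoothing keeps estimates strictly inside (0, 1).
        correct, total = [1.0] * n_workers, [2.0] * n_workers
        for item, p1 in zip(votes, post):
            for j, lab in item.items():
                correct[j] += p1 if lab == 1 else 1 - p1
                total[j] += 1
        acc = [c / t for c, t in zip(correct, total)]
    return post, acc


# Hypothetical toy data: workers 0-2 report the truth, worker 3 always says 1,
# worker 4 always says the opposite. `truth` only builds the votes; EM never sees it.
truth = [1, 0, 1, 1, 0, 0, 1, 0]
votes = [{0: t, 1: t, 2: t, 3: 1, 4: 1 - t} for t in truth]
posteriors, accuracies = one_coin_em(votes, n_workers=5)
print([round(p, 3) for p in posteriors])
print([round(a, 3) for a in accuracies])
```

Workers 0 to 2 form a consistent bloc, so they end up with high estimated accuracy; worker 4, who always disagrees with them, is estimated to be worse than chance, and its votes are effectively inverted. Had workers 0 to 2 all answered 1 - t instead, the posteriors would track them just as confidently, which is the arbitrarily bad failure mode the model cannot detect without ground truth.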
What I've described is a potentially incorrect statistical assumption, forced upon a system by a deficit of information, that leads to a preference for consensus. In other words, a formal model for the herd mentality! I wonder if this has any implications, e.g., for behavioural finance. After all, when I think about my day-to-day experience, I certainly feel there is a plethora of opinions and a scarcity of fact.