Here's an example application to a data set where I asked Mechanical Turkers to estimate the age of the owner of a Twitter profile and select the best answer from a fixed set of age ranges.
pmineiro@ubuntu-67% ~/src/nincompoop/ordinalonlineextract/src/ordinalonlineextract --initial_t 10000 --n_worker_bits 16 --n_items 4203 --n_labels 6 --priorz 555,3846,7786,5424,1242,280 --model flass --data <(./multicat 80 =(sort -R agehit.ooe.in)) --eta 1 --rho 0.9
initial_t = 10000
eta = 1.000000
rho = 0.900000
n_items = 4203
n_labels = 6
n_workers = 65536
test_only = false
prediction file = (no output)
priorz = 0.029004,0.201002,0.406910,0.283449,0.064908,0.014633
cumul       since       example  current  current  current
avg q       last        counter  label    predict  ratings
-1.092649   -1.092649         2       -1        2        4
-1.045608   -1.017383         5       -1        2        5
-1.141637   -1.233824        10       -1        2        5
-1.230889   -1.330283        19       -1        2        5
-1.199410   -1.159306        36       -1        3        3
-1.177825   -1.155147        69       -1        2        4
-1.151384   -1.122146       134       -1        2        5
-1.153009   -1.154689       263       -1        1        5
-1.151538   -1.149990       520       -1        3        4
-1.146140   -1.140607      1033       -1        2        5
-1.124684   -1.103209      2058       -1        1        5
-1.107670   -1.090658      4107       -1        0        4
-1.080002   -1.052260      8204       -1        2        4
-1.051428   -1.022821     16397       -1        5        5
-1.023710   -0.995977     32782       -1        4        2
-0.998028   -0.972324     65551       -1        2        3
-0.976151   -0.954265    131088       -1        2        3
-0.958616   -0.941080    262161       -1        2        5
-0.953415   -0.935008    336240       -1        5       -1
applying deferred prior updates ... finished
kappa = 0.0423323
rho_lambda = 0.00791047
gamma = 0.4971 1.4993 2.5006 3.5035 4.5022
This is slower than I'd like: the above output takes 9 minutes to produce on my laptop. Hopefully I'll discover some additional optimizations in the near future (update: it now takes slightly under 4 minutes; another update: it now takes about 30 seconds).
The model produces a posterior distribution over the labels, which can be used directly to make a decision or to construct a cost vector for training a cost-sensitive classifier (a sketch of that construction appears after the example below). To show the nontrivial nature of the posterior, here's a neat example of two records that received the same number of each type of rating, but for which the model produces very different posterior distributions over the ground truth. First, the input:
KevinWihardjo|A1U4W67HW5V0FO:2 A1J8TVICSRC70W:1 A27UXXW0OEBA0:2 A2V3P1XE33NYC3:2 A1MX4AZU19PR92:1
taniazahrina|A3EO2GJAMSBATI:2 A2P0F978S0K4LF:2 AUI8BVP9IRQQJ:2 A2L54KVSIY1GOM:1 A1XXDKKNVQD4XE:1
Each profile has three Turkers saying ``2'' (20-24) and two Turkers saying ``1'' (15-19). Now the posterior distributions:
KevinWihardjo   -0.142590  0.000440  0.408528  0.590129  0.000903  0.000000  0.000000
taniazahrina     0.954630  0.000003  0.999001  0.000996  0.000000  0.000000  0.000000
The second column is the item difficulty ($\log \alpha$) and the remaining columns are the posterior distribution over the labels. For the first profile the posterior is spread between labels 1 and 2 with the mode at 2, whereas for the second profile the posterior is concentrated on label 1. There are many potential reasons for the model to do this, e.g., the raters who said ``2'' for taniazahrina might have a bias towards higher age responses across the entire data set. Honestly, with these profiles I don't have a good idea what their true ages are, so I don't know which posterior is ``better''. I do have data indicating that the ordinal label model is more accurate than the Olympic Judge heuristic (which discards the highest and lowest scores and averages the remaining ones).
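As a concrete illustration of the cost-vector idea mentioned above, here is a minimal sketch (not part of ordinalonlineextract) that parses a posterior line like the ones just shown and builds a cost vector using expected absolute error, which is one reasonable choice for ordinal labels; the parsing details, the absolute-error cost, and the helper names are my own assumptions, not something the tool prescribes.

# Sketch: turn a label posterior into a cost vector for cost-sensitive learning.
# Assumed line format, matching the output above:
#   <item> <log alpha> <p(label=0)> ... <p(label=5)>

def parse_posterior_line(line):
    fields = line.split()
    item = fields[0]
    log_alpha = float(fields[1])          # item difficulty
    posterior = [float(x) for x in fields[2:]]
    return item, log_alpha, posterior

def cost_vector(posterior):
    # cost of predicting label k = expected |k - true label| under the posterior
    n = len(posterior)
    return [sum(p * abs(k - j) for j, p in enumerate(posterior)) for k in range(n)]

line = ("KevinWihardjo -0.142590 0.000440 0.408528 0.590129 "
        "0.000903 0.000000 0.000000")
item, log_alpha, post = parse_posterior_line(line)
costs = cost_vector(post)
print(item, costs.index(min(costs)))      # minimum-cost label, here 2

Making a hard decision is then just picking the minimum-cost (or maximum-posterior) label, while a cost-sensitive classifier can consume the whole vector.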
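For reference, the Olympic Judge heuristic I'm comparing against is easy to state in code. Here is a minimal sketch, assuming the input format shown earlier (profile followed by worker:rating pairs); the parsing is my assumption about that format, not the exact evaluation code I used.

# Sketch of the Olympic Judge heuristic: drop one highest and one lowest
# rating, then average the rest.
# Assumed record format: "<profile>|<worker>:<rating> <worker>:<rating> ..."

def olympic_judge(line):
    profile, rest = line.split('|', 1)
    ratings = sorted(int(pair.split(':')[1]) for pair in rest.split())
    trimmed = ratings[1:-1] if len(ratings) > 2 else ratings
    return profile, sum(trimmed) / len(trimmed)

line = ("KevinWihardjo|A1U4W67HW5V0FO:2 A1J8TVICSRC70W:1 "
        "A27UXXW0OEBA0:2 A2V3P1XE33NYC3:2 A1MX4AZU19PR92:1")
print(olympic_judge(line))                # ('KevinWihardjo', 1.666...)

Note that this heuristic treats every rater identically and throws away information, which is part of why the model's per-worker, per-item posterior can do better.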
ordinalonlineextract is available from the nincompoop repository on Google Code.