So the classic example is a search engine results page where you get just one click, at position $p$, and your attentional model assumes the user considered every result up to that position plus one more. Therefore the partial preferences $\forall x \in [1, p + 1], x \neq p: r_p > r_x$ are revealed and added to the (ranking) training set.
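To make that concrete, here's a minimal sketch of extracting those pairwise preferences from a single click; the function and variable names are mine, not from any paper.

```python
def click_preferences(ranking, clicked_pos):
    """Pairwise preferences implied by one click, under the attentional
    model that the user examined everything up to the click plus one more.

    ranking: list of result ids, ranking[0] is position 1
    clicked_pos: 1-based position p of the clicked result
    Returns (preferred, dispreferred) pairs: the clicked result beats
    every other result at positions 1..p+1.
    """
    clicked = ranking[clicked_pos - 1]
    considered = ranking[:min(clicked_pos + 1, len(ranking))]
    return [(clicked, other) for other in considered if other != clicked]

# A click at position 3 of a 5-item ranking yields
# [('c', 'a'), ('c', 'b'), ('c', 'd')]: the clicked item beats the
# skipped items above it and the item immediately below it.
print(click_preferences(["a", "b", "c", "d", "e"], 3))
```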
Later in my career I began to appreciate stochastic contextual bandits, specifically the importance of debiasing the historical state-action density in order to get consistent estimates. That left me with an inconsistency: on the one hand, optimizing a search engine with click feedback is definitely Learning Through Exploration, since you only get information about the relative preferences of (a subset of) the items presented. On the other hand, I'm not attempting to debias the historical state-action density when I'm doing straight Joachims.
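By ``debiasing the historical state-action density'' I mean the usual inverse-propensity-scoring move; here's a minimal sketch under the standard logged-bandit-feedback setup (all names here are mine).

```python
import numpy as np

def ips_value_estimate(contexts, actions, rewards, logging_probs, target_policy):
    """Inverse-propensity-scored estimate of a target policy's value from
    logged (context, action, reward, logging propensity) tuples.

    target_policy(context, action) returns the probability the target
    policy assigns to `action` in `context`.  Dividing by the logging
    propensity is the debiasing step: without it the estimate is skewed
    toward whatever the logging policy tended to do.
    """
    weights = np.array([target_policy(x, a) / p
                        for x, a, p in zip(contexts, actions, logging_probs)])
    return float(np.mean(weights * np.asarray(rewards)))
```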
I was hoping this paper would resolve this difficulty for me. It did, but not in the way I expected; the contextual bandit literature is referred to only in the introduction, for comparative purposes. Instead the authors make the following assumptions (sketched a bit more formally after the list):
- User loss is convex in (linear) utility differences.
- Users only suggest improvements (i.e., user feedback always points ``downhill'').
- Users only suggest significant improvements (i.e., feedback states have a utility increment at least proportional to the increment to the optimum).
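Reading the last two bullets literally, and assuming a linear utility $U(x, y) = w^\top \phi(x, y)$ over objects $y$ in context $x$, with $y_t$ the presented object, $\bar{y}_t$ the user's feedback, and $y^*_t$ the optimum, I take them to mean something like

$$U(x_t, \bar{y}_t) \geq U(x_t, y_t), \qquad U(x_t, \bar{y}_t) - U(x_t, y_t) \geq \alpha \left( U(x_t, y^*_t) - U(x_t, y_t) \right)$$

for some $\alpha > 0$; that's my paraphrase of the bullets, not necessarily the paper's exact statement.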
I suspect the analysis does not look the way I anticipated because, subject to the conditions listed above, the user feedback can be chosen adversarially. Nonetheless it could be interesting to consider a ``contextual bandit style'' formulation, e.g., instead of learning the reward associated with the chosen arm, one learns the difference between the reward of the chosen arm and another arm. A good place to start would be the literature on contextual bandits with controlled side information, but a key difference here is that the user feedback is not under the control of the algorithm.
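Purely as a sketch of what such a formulation might look like (the names and the margin-style update are my own illustration, not anything from the paper): the learner only gets to compare the chosen arm against a user-supplied alternative, and updates a linear utility model from that pairwise signal.

```python
import numpy as np

def update_from_pairwise_feedback(w, phi_chosen, phi_feedback, lr=0.1):
    """One step of learning a linear utility w from pairwise feedback.

    Instead of observing the reward of the chosen arm, we only learn that
    the user-suggested arm (features phi_feedback) should score higher
    than the chosen arm (features phi_chosen).  A perceptron/hinge-style
    update nudges w in that direction.  Note the feedback arm is supplied
    by the user, not selected by the algorithm.
    """
    margin = w @ (phi_feedback - phi_chosen)
    if margin < 1.0:  # preference violated or satisfied with too small a margin
        w = w + lr * (phi_feedback - phi_chosen)
    return w
```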