This is a reduction from average constrained CSMC to importance-weighted binary classification. It is nearly identical to the filter tree reduction for unconstrained CSMC, with the addition that infeasible choices automatically lose any tournament they enter (if both entrants are infeasible, one is chosen arbitrarily). Because of this rule the underlying binary classifier is only invoked to distinguish between feasible choices, and the regret analysis is essentially the same.
Algorithm: Forfeit Filter Tree Train
Data: Constrained CSMC training data set $S$.
Input: Importance-weighted binary classification routine $\mbox{Learn}$.
Input: A binary tree $T$ over the labels with internal nodes $\Lambda (T)$.
Result: Trained classifiers $\{\Psi_n | n \in \Lambda (T) \}$.
- For each $n \in \Lambda (T)$ from leaves to root:
- $S_n = \emptyset$.
- For each example $(x, \omega, c) \in S$:
- Let $a$ and $b$ be the two classes input to $n$ (the predictions of the left and right subtrees on input $(x, \omega)$ respectively).
- If $a \in \omega$, predict $b$ for the purposes of constructing training input for parent node (``$a$ forfeits'');
- else if $b \in \omega$, predict $a$ for the purposes of constructing training input for parent node (``$b$ forfeits'');
- else (when $a \not \in \omega$ and $b \not \in \omega$), $S_n \leftarrow S_n \cup \{ (x, 1_{c_a < c_b}, | c_a - c_b | ) \}$;
- Let $\Psi_n = \mbox{Learn} (S_n)$.
- Return $\{\Psi_n | n \in \Lambda (T) \}$.
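Here's a minimal sketch of the training procedure in Python, under some assumed conventions that are not part of the original description: a tree is a nested tuple (a leaf is a label, an internal node is a (left, right) pair), $\omega$ is a set of infeasible labels, $c$ is a dict mapping label to cost, and `learn` is any importance-weighted binary learner taking a list of $(x, y, w)$ triples and returning a predictor.

```python
# Sketch of Forfeit Filter Tree training under assumed conventions:
# tree: nested tuple, leaf = label, internal node = (left_subtree, right_subtree);
# omega: set of infeasible labels; c: dict label -> cost;
# learn: list of (x, y, w) -> predictor f(x) in {0, 1} (1 means "left input wins").

def train_forfeit_filter_tree(tree, data, learn):
    classifiers = {}

    def subtree_prediction(node, x, omega):
        # Label advanced by an already-trained subtree on input (x, omega).
        if not isinstance(node, tuple):      # leaf: its label advances
            return node
        left, right = node
        a = subtree_prediction(left, x, omega)
        b = subtree_prediction(right, x, omega)
        if a in omega:                       # a is infeasible: a forfeits
            return b
        if b in omega:                       # b is infeasible: b forfeits
            return a
        return a if classifiers[node](x) == 1 else b

    def train_node(node):
        if not isinstance(node, tuple):
            return
        left, right = node
        train_node(left)                     # leaves-to-root order
        train_node(right)
        S_n = []
        for x, omega, c in data:
            a = subtree_prediction(left, x, omega)
            b = subtree_prediction(right, x, omega)
            if a in omega or b in omega:     # forfeiture: no training example
                continue
            S_n.append((x, 1 if c[a] < c[b] else 0, abs(c[a] - c[b])))
        classifiers[node] = learn(S_n)

    train_node(tree)
    return classifiers
```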
Algorithm: Forfeit Filter Tree Test
Input: A binary tree $T$ over the labels with internal nodes $\Lambda (T)$.
Input: Trained classifiers $\{\Psi_n | n \in \Lambda (T) \}$.
Input: Instance realization $(x, \omega)$.
Result: Predicted label $k$.
- Let $n$ be the root node.
- Repeat until $n$ is a leaf node:
- If all the labels of the leaves in the left subtree of $n$ are in $\omega$, traverse to the right child;
- else if all the labels of the leaves in the right subtree of $n$ are in $\omega$, traverse to the left child;
- else if $\Psi_n (x) = 1$, traverse to the left child;
- else (when $\Psi_n (x) = 0$ and at least one label in each subtree is not in $\omega$), traverse to the right child.
- Return leaf label $k$.
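And here's the corresponding test-time routing, again a sketch under the same assumed tree encoding; `classifiers` is the dictionary produced by the training sketch above.

```python
# Sketch of Forfeit Filter Tree test-time prediction under the same assumed
# encoding (leaf = label, internal node = (left, right)); classifiers maps
# internal nodes to predictors f(x) in {0, 1}.

def leaves(node):
    # Set of labels at or below this node.
    if not isinstance(node, tuple):
        return {node}
    left, right = node
    return leaves(left) | leaves(right)

def predict_forfeit_filter_tree(tree, classifiers, x, omega):
    node = tree
    while isinstance(node, tuple):           # descend until a leaf is reached
        left, right = node
        if leaves(left) <= set(omega):       # entire left subtree infeasible
            node = right
        elif leaves(right) <= set(omega):    # entire right subtree infeasible
            node = left
        elif classifiers[node](x) == 1:      # classifier prefers the left input
            node = left
        else:
            node = right
    return node                              # leaf label k
```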
Regret Analysis
The regret analysis for the forfeit filter tree is very similar to the regret analysis for the filter tree, with additional arguments for the forfeiture cases.

The average constrained CSMC problem is characterized by a distribution $D = D_x \times D_{\omega|x} \times D_{c|\omega,x}$, where $c: K \to \mathbf{R}$ takes values in the extended reals $\mathbf{R} = \mathbb{R} \cup \{ \infty \}$, and the components of $c$ which are $\infty$-valued for a particular instance are revealed as part of the problem instance via $\omega \in \mathcal{P} (K)$ (i.e., $\omega \subseteq K$ is the set of infeasible labels for the instance). The regret of a particular classifier $h: X \times \mathcal{P} (K) \to K$ is given by \[ r_{av} (h) = E_{(x, \omega) \sim D_x \times D_{\omega|x}} \left[ E_{c \sim D_{c|\omega,x}} \left[ c (h (x, \omega)) \right] - \min_{k \in K}\, E_{c \sim D_{c|\omega,x}} \left[ c (k) \right] \right]. \] Let $\Psi = (T, \{\Psi_n | n \in \Lambda (T) \})$ denote a particular forfeit filter tree (i.e., a choice of binary tree and a particular set of node classifiers), and let $h^\Psi$ denote the classifier that results from the forfeit filter tree.
The regret analysis leverages an induced importance-weighted binary distribution $D^\prime (\Psi)$ over triples $(x^\prime, y, w)$ defined as follows:
- Draw $(x, \omega, c)$ from $D$.
- Draw $n$ uniform over the internal nodes of the binary tree.
- Let $x^\prime = (x, n)$.
- Let $a$ and $b$ be the two classes input to $n$ (the predictions of the left and right subtrees on input $x$ respectively).
- If $a \in \omega$, create importance-weighted binary example $(x^\prime, 0, 0)$;
- else if $b \in \omega$, create importance-weighted binary example $(x^\prime, 1, 0)$;
- else (when $a \not \in \omega$ and $b \not \in \omega$), create importance-weighted binary example $(x^\prime, 1_{c_a < c_b}, | c_a - c_b |)$.
In other words, there is no importance-weighted regret at node $n$ if either the left or the right subtree at $n$ entirely consists of labels that are infeasible for this instance, or if the classifier makes the correct decision.
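As a concrete illustration, here's a small sketch of drawing one triple from $D^\prime (\Psi)$ given a draw $(x, \omega, c)$ from $D$. The tree encoding and the `classifiers` dictionary are the assumed conventions of the earlier sketches, not part of the original analysis.

```python
# Sketch of drawing one example from the induced importance-weighted binary
# distribution D'(Psi), given (x, omega, c) drawn from D and a trained
# `classifiers` dict under the assumed tree encoding (leaf = label,
# internal node = (left, right)).

import random

def internal_nodes(node):
    if not isinstance(node, tuple):
        return []
    left, right = node
    return [node] + internal_nodes(left) + internal_nodes(right)

def subtree_prediction(node, classifiers, x, omega):
    # Label advanced by the subtree rooted at `node` on input (x, omega).
    if not isinstance(node, tuple):
        return node
    left, right = node
    a = subtree_prediction(left, classifiers, x, omega)
    b = subtree_prediction(right, classifiers, x, omega)
    if a in omega:
        return b
    if b in omega:
        return a
    return a if classifiers[node](x) == 1 else b

def induced_example(tree, classifiers, x, omega, c):
    n = random.choice(internal_nodes(tree))     # uniform over internal nodes
    left, right = n
    a = subtree_prediction(left, classifiers, x, omega)
    b = subtree_prediction(right, classifiers, x, omega)
    x_prime = (x, n)
    if a in omega:
        return (x_prime, 0, 0)                  # a forfeits: zero importance
    if b in omega:
        return (x_prime, 1, 0)                  # b forfeits: zero importance
    return (x_prime, 1 if c[a] < c[b] else 0, abs(c[a] - c[b]))
```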
Theorem: Regret Bound
For all average constrained CSMC distributions $D$ and all forfeit filter trees $\Psi$, \[ r_{av} (h^\Psi) \leq (|K| - 1) q (\Psi), \] where $q (\Psi)$ is the importance-weighted binary regret on the induced subproblem.
Proof: See Appendix.
Minimax Counterexample
To show the forfeit trick won't work on minimax constrained CSMC, consider a null feature space problem with 3 classes and deterministic cost vector $c = (x, y, z)$ with $x < z < y$. Further suppose our binary tree first plays 1 vs. 2, then plays (winner of 1 vs. 2) vs. 3. Training with $\omega = \emptyset$, the forfeit filter tree will learn to take both left branches from the root and choose 1, leading to zero CSMC regret. If an adversary then comes along and sets $\omega = \{ 1 \}$ at test time, the forfeit filter tree will choose 2, incurring regret since 3 is the best choice once 1 is infeasible.

If this sounds totally unfair, keep in mind that a regression reduction with zero regret can handle arbitrarily imposed constraints at test time without incurring additional regret. This is because regression determines not only the best class, but actually the order of all the classes by cost. (This doesn't mean reduction to regression is better; quite the converse, it shows regression is solving a harder problem than is typically necessary.)
This also suggests that minimax constrained CSMC, unlike average constrained CSMC, is harder than unconstrained CSMC.
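To make the counterexample concrete, here's a small check reusing the training and test sketches from above, with the assumed costs $c_1 = 1$, $c_2 = 3$, $c_3 = 2$ (so $x < z < y$) and a trivial weighted-majority "learner", which suffices for the null feature space.

```python
# Concrete check of the counterexample, reusing train_forfeit_filter_tree and
# predict_forfeit_filter_tree from the sketches above. Costs c_1 < c_3 < c_2.

tree = ((1, 2), 3)                 # first play 1 vs. 2, then the winner vs. 3
c = {1: 1.0, 2: 3.0, 3: 2.0}

def learn(S):
    # Null feature space: an importance-weighted majority vote suffices.
    pos = sum(w for _, y, w in S if y == 1)
    neg = sum(w for _, y, w in S if y == 0)
    label = 1 if pos >= neg else 0
    return lambda x: label

data = [(None, set(), c)]          # train with omega = {} (unconstrained)
classifiers = train_forfeit_filter_tree(tree, data, learn)

print(predict_forfeit_filter_tree(tree, classifiers, None, set()))  # 1: zero regret
print(predict_forfeit_filter_tree(tree, classifiers, None, {1}))    # 2, but 3 is best feasible
```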