So I've been trying to come up with something along those lines. I don't have anything I'm super happy with, but here are some thoughts. First, here's a theorem which says that the conditional SSP regret is bounded by the conditional cost-sensitive regrets summed across the subproblems encountered on the SSP regret-minimizing path.
Theorem: Optimal Path Regret Bound
For fixed $x$, let $p^* (x)$ denote the set of $(k, v)$ pairs encountered on the SSP regret-minimizing path of length $n$ from source vertex $s$ to target vertex $t$. Then for all SSP distributions and all cost-sensitive classifiers $\Psi$, \[ r_{spath} (\Psi) \leq E_{x \sim D_x} \left[ \sum_{(k, v) \in p^* (x)} q_{kv} (\Psi | x) \right], \] with the definition $\forall v, q_{n-1,v} (\Psi | x) = q_{nv} (\Psi | x) = 0$.
Proof: Let $\Upsilon^*_{kv} (x)$ be the set of $(k^\prime, v^\prime)$ pairs encountered on the SSP regret-minimizing path of length $n - k$ from vertex $v$ at depth $k$ to target vertex $t$. The proof is by backward induction on $k$, with the inductive hypothesis $r_{kv} (\Psi | x) \leq \sum_{(k^\prime,v^\prime) \in \Upsilon^*_{kv} (x)} q_{k^\prime,v^\prime} (\Psi | x)$.
When $k = n - 2$, only the single cost-sensitive decision at $(n - 2, v)$ remains (the subsequent regrets $q_{n-1,\cdot}$ and $q_{n,\cdot}$ are zero by definition), and it is easily verified directly that $r_{n-2,v} (\Psi | x) = q_{n-2,v} (\Psi | x)$.
For $k \in [1, n - 2)$, \[ \begin{aligned} r_{kv} (\Psi | x) &\leq q_{kv} (\Psi | x) + r_{k+1,w^*} (\Psi | x) \\ &\leq q_{kv} (\Psi | x) + \sum_{(k^\prime, v^\prime) \in \Upsilon^*_{k+1,w^*} (x)} q_{k^\prime,v^\prime} (\Psi | x) \\ &= \sum_{(k^\prime, v^\prime) \in \Upsilon^*_{kv} (x)} q_{k^\prime, v^\prime} (\Psi | x), \end{aligned} \] where the first step is from the next step bound lemma, the second step leverages the induction hypothesis, and the third step is from the definition of $\Upsilon^*_{kv} (x)$. Noting that $\Upsilon^*_{1s} (x) = p^* (x)$ yields
\[ r_{spath} (\Psi | x) = r_{1s} (\Psi | x) \leq \sum_{(k, v) \in p^* (x)} q_{kv} (\Psi | x). \]
Taking expectation with respect to $D_x$ completes the proof.
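To make the right-hand side of the theorem concrete, here's a minimal sketch for a single fixed $x$: run backward dynamic programming over the layered graph to recover the regret-minimizing path, then sum the per-subproblem cost-sensitive regrets encountered along it. Everything here is my own made-up notation (cost, q, rhs_bound, the dictionary layout); in particular it assumes the conditional expected edge costs and the per-$(k, v)$ regrets are simply handed to us, which of course in practice are exactly the quantities being estimated.

def rhs_bound(cost, q, n, s='s'):
    """Sum of cost-sensitive regrets along the regret-minimizing path, for one fixed x.

    cost[(k, v)] maps successor vertices w to conditional expected edge costs,
    for depths k = 1, ..., n-1 (the last layer's edges land on the target t);
    q[(k, v)] is the conditional cost-sensitive regret of subproblem (k, v).
    """
    # Backward DP: value[(k, v)] = cheapest expected cost-to-go from vertex v at depth k.
    value, succ = {}, {}
    for k in range(n - 1, 0, -1):
        for (k2, v), edges in cost.items():
            if k2 != k:
                continue
            # cost of stepping to w and then continuing optimally; the final
            # layer has no continuation, hence the 0.0 default
            togo = {w: c + value.get((k + 1, w), 0.0) for w, c in edges.items()}
            succ[(k, v)] = min(togo, key=togo.get)
            value[(k, v)] = togo[succ[(k, v)]]

    # Walk the regret-minimizing path p*(x), summing the subproblem regrets on it.
    total, v = 0.0, s
    for k in range(1, n + 1):
        total += q.get((k, v), 0.0)   # q vanishes at depths n-1 and n by definition
        v = succ.get((k, v), v)
    return total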
Returning to the question of a better bound, it's a short hop from the above theorem to the statement \[ r_{spath} (\Psi) \leq (n - 2) E_{x \sim D_x} \left[ \max_{kv}\; q_{kv} (\Psi | x) \right], \] which has the right flavor. For instance, if I reduce the individual CSMC subproblems to regression, I'd pick up an extra $\sqrt{2n}$ leading factor and a $\sqrt{\epsilon_{kv}}$ dependence on the regression regret for each subproblem. That would give \[ \begin{aligned} r_{spath} (\Psi) &\leq (n - 2) E_{x \sim D_x} \left[ \max_{kv}\; q_{kv} (\Psi | x) \right] \\ &\leq (n - 2) \sqrt{2 n}\, E_{x \sim D_x} \left[ \max_{kv}\; \sqrt{\epsilon_{kv} (\Psi | x)} \right] \\ &\leq (n - 2) \sqrt{2 n} \sqrt{ E_{x \sim D_x} \left[ \max_{kv}\; \epsilon_{kv} (\Psi | x) \right]}, \end{aligned} \] where the identity $\max_a\; \sqrt{f_a} = \sqrt{\max_a\; f_a}$ and Jensen's inequality combine in the last step. This looks somewhat like the $O (n^{3/2}) \sqrt{\epsilon}$ bound I get from a direct reduction to regression. However I'm not satisfied, because I don't have a good intuition about how the expectation of a single regression subproblem relates to an ``$L^{\infty}$ style'' expectation over the maximum of a set of regression subproblems.
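For what it's worth, here's a toy numeric sketch (purely synthetic draws, not tied to any actual regression subproblem) of the two quantities I'm trying to relate: the ``$L^{\infty}$ style'' expectation of the per-instance maximum versus the worst single subproblem's expectation.

import random

# Toy illustration with synthetic numbers (nothing here comes from a real
# regression subproblem): compare E_x[max_k eps_k(x)], the "L-infinity style"
# quantity in the bound above, with max_k E_x[eps_k(x)], the worst single
# subproblem's expected regret.
random.seed(0)
n_subproblems, n_draws = 20, 10000
samples = [[random.expovariate(10.0) for _ in range(n_subproblems)]   # eps_k(x) >= 0
           for _ in range(n_draws)]

e_of_max = sum(max(row) for row in samples) / n_draws
max_of_e = max(sum(row[k] for row in samples) / n_draws
               for k in range(n_subproblems))

print(e_of_max, max_of_e)   # the first is noticeably larger than the second

For these exponential draws the first quantity exceeds the second by roughly a logarithmic factor in the number of subproblems, which is about where my intuition for the general case runs out.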