I was recently inspired to summarize progress in applying
PAC-style algorithm design to interactive settings, and ended up with this small table.
| Setting | Probably | Approximately Correct |
| --- | --- | --- |
| Passive Batch Learning | wrt draws of the data set | IID concentration |
| Passive Online Learning | wrt draws of the data set | Martingale ("online to batch" conversion) |
| Active Online Learning | wrt draws of the data set and the sequence of label inspection decisions | Martingale (custom?) |
| Contextual Bandits | wrt draws of the data set and the sequence of actions chosen | Martingale (Freedman-style) |
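To make the rightmost column concrete, here are the shapes of the concentration arguments involved. These are standard statements (constants vary by source, so treat this as a sketch): once decisions depend on the past, the summands are no longer independent, but if each term has zero mean conditioned on the history they form a martingale difference sequence and Azuma-Hoeffding applies; Freedman-style inequalities sharpen this when the conditional variance is small.

```latex
% IID: Hoeffding's inequality. For independent X_i \in [0,1],
\Pr\left( \left| \tfrac{1}{n}\textstyle\sum_{i=1}^{n} X_i - \mathbb{E}[X_1] \right| \geq \epsilon \right)
  \leq 2 e^{-2 n \epsilon^2} .

% Interactive: Azuma-Hoeffding. For a martingale difference sequence Z_t,
% i.e., \mathbb{E}[Z_t \mid \mathcal{F}_{t-1}] = 0, with |Z_t| \leq c,
\Pr\left( \left| \textstyle\sum_{t=1}^{T} Z_t \right| \geq \epsilon \right)
  \leq 2 e^{-\epsilon^2 / (2 T c^2)} .

% Freedman-style: exploits small conditional variance. With Z_t \leq R and
% predictable variance V_T = \sum_t \mathbb{E}[Z_t^2 \mid \mathcal{F}_{t-1}],
\Pr\left( \textstyle\sum_{t=1}^{T} Z_t \geq \epsilon \ \wedge \ V_T \leq \sigma^2 \right)
  \leq \exp\left( \frac{-\epsilon^2}{2\sigma^2 + 2 R \epsilon / 3} \right) .
```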
Two things jump out. The first is the versatility of
martingales for taming interactive learning. How much further can this style of reasoning go? It'd be foolish to prognosticate with any certainty, but it feels like there is still some room to run. Ultimately there are the limits of the stochastic setting, of course: not all environments are well approximated as oblivious. So eventually we'll all be doing game theory, unless something
like this has killed everybody. Before then, I worry about practically insurmountable sample complexity issues in tackling credit assignment problems in reinforcement learning. One can already see hints of this: e.g., the worst-case instantaneous regret for agnostic stochastic contextual bandits indicates that data requirements scale linearly with the number of possible actions (the arithmetic is sketched below). On the other hand, data volumes are scaling even faster than computation. If the relative growth continues indefinitely, then computational complexity will trump sample complexity as an algorithmic design issue.
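To unpack the linear-in-actions claim, here is a back-of-the-envelope computation using the generic agnostic contextual bandit guarantee (EXP4-style; any particular paper's constants will differ):

```latex
% With K actions and a policy class of size N, the agnostic regret bound is
\mathrm{Regret}(T) = O\left( \sqrt{K \, T \, \log N} \right),
% so the average per-round regret is
\frac{\mathrm{Regret}(T)}{T} = O\left( \sqrt{\frac{K \log N}{T}} \right),
% and driving it below \epsilon requires
T = \Omega\left( \frac{K \log N}{\epsilon^2} \right),
% i.e., a data requirement growing linearly with the number of actions K.
```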
The second is the fundamental nature of online learning. In the 1990s you could be forgiven if (like me!) you thought of online learning as an implementation detail, a type of optimization strategy particularly well suited to the objective functions that arise in machine learning (I was totally unaware of prediction of individual sequences). With the benefit of hindsight, it is clear that online learning lends itself better to thinking about how information is revealed to (or collected by!) an algorithm over time, and that passive batch learning is more like a special case with particular implementation possibilities.
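Since the table leans on "online to batch" conversion, here is a minimal sketch of the recipe in Python. The learner (online gradient descent on squared loss), step size, and averaging scheme are illustrative choices of mine, not a canonical implementation; the point is that the averaged iterate inherits a risk bound from the online regret guarantee via a martingale argument.

```python
import numpy as np

def online_to_batch(stream, dim, lr=0.1):
    """Run online gradient descent over an IID stream of (x, y) pairs
    and return the averaged iterate as a batch predictor.

    The online regret bound plus an Azuma-style martingale argument
    transfers to a generalization bound for the averaged weights.
    """
    w = np.zeros(dim)        # current online iterate
    w_sum = np.zeros(dim)    # running sum of iterates for averaging
    t = 0
    for x, y in stream:
        grad = (w @ x - y) * x                 # squared-loss gradient
        w = w - (lr / np.sqrt(t + 1)) * grad   # standard 1/sqrt(t) step size
        w_sum += w
        t += 1
    return w_sum / max(t, 1)  # averaged predictor: the "batch" output

# Illustrative usage with synthetic IID data (an assumed setup):
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
stream = ((x, x @ w_true + 0.1 * rng.normal())
          for x in rng.normal(size=(1000, 5)))
w_hat = online_to_batch(stream, dim=5)
```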
Ran into your blog. You might be interested in another application of a Freedman-style inequality (http://arxiv.org/abs/1002.4058). Here the data can be adversarial (i.e., not necessarily IID), but the algorithm can act randomly and still do well. This doesn't get into game theory, however, because of how regret is defined; this may be a limitation of the model.
Thanks for pointing that out, I'll put that on my reading list.
Re: game theory, I may have been too hasty in declaring the future to be all game theory :) Proposition 2 of http://arxiv.org/abs/1104.5070 indicates that for OCO-style problems there is always a minimax strategy for the adversary which is oblivious (but nonstationary). So maybe statistical reasoning will be sufficient to master these problems.
Well, I clearly need to chew on your paper for a while, but the overall point is immediately apparent: you can use martingales in an adversarial setting to prove results about your own randomizations.
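For concreteness, my understanding of the basic mechanism (a generic importance-weighting sketch, not necessarily the construction in the paper): the algorithm's own randomization makes importance-weighted loss estimates conditionally unbiased, so their cumulative deviation is a martingale even though the losses are adversarial.

```latex
% Adversarial losses \ell_t; the algorithm samples a_t \sim p_t (its own randomization).
% Importance-weighted estimate of the loss of action a:
\hat{\ell}_t(a) = \frac{\ell_t(a) \, \mathbf{1}\{a_t = a\}}{p_t(a)},
\qquad
\mathbb{E}\left[ \hat{\ell}_t(a) \mid \mathcal{F}_{t-1} \right] = \ell_t(a).
% Hence Z_t = \hat{\ell}_t(a) - \ell_t(a) is a martingale difference sequence
% with respect to the algorithm's internal randomness, even for adversarial
% \ell_t, and a Freedman-style inequality controls \sum_t Z_t.
```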
Viva la martingale!
definitely!