There is, of course, no shortage of research directions that would make plausible competition subjects. The challenge, so to speak, is how to organize the challenge. With supervised learning, the game is clear: here's a labeled training set, here's an unlabeled test set, submit your answers. There's some sophistication possible in running the leaderboard, but mostly the supervised learning competition is a straightforward setup. Additional complexity, however, would require some innovation. Here are some examples.
- Nonstationary environments. In real life the environment is changing, either obliviously or adversarially. A competition could explore this, but presumably can't release the test set if it wants to simulate the “fog of war”. So submissions need to be executable, a protocol for scoring answers has to be defined, etc. Somebody would have to do some infrastructure work to make all that happen (a sketch of such a submission interface appears after this list).
- Automated training. In this case, the competition wouldn't even release a training set! Instead, submissions would be algorithms capable of taking a training set and producing a model which could then be evaluated on a test set. Clearly infrastructure work is required to facilitate this (see the train-then-score sketch after this list).
- Computational constraints. Unholy ensembles don't win in real life because nobody would deploy such a model. Real models are subject to both space and time constraints. Soeren Sonnenburg organized a large scale learning challenge several years ago which tried to assess performance under computational and sample complexity constraints. It was an admirable first effort, but there were some problems. A big one: it's difficult to come up with a ranking function (also true in real life: you can usually negotiate for a bit more server memory and/or latency if you can demonstrate a lift in performance, but the tradeoffs aren't clear). There were other minor infelicities, e.g., participants had to time their own algorithms. Furthermore, the competition didn't address the space complexity of the resulting model, which in my experience is very important: models that are too large don't fit on production machines (given everything else going on) and/or take too long to update. So in this area there's definitely room to innovate in competition design (one debatable ranking function is sketched after this list).
- Partial feedback. Call it contextual bandits, reinforcement learning, … heck, call it banana. With almost every problem I've worked on, there was a closed loop where the actions of the algorithm determined which data was collected. A competition could release a partially observed history to initialize a policy, but a real test should involve online operation where actions generate feedback which updates the model, etc. (a minimal closed-loop sketch appears after this list).
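To make the nonstationary-environment idea concrete, here's a minimal sketch of what an executable-submission interface and a test-then-train scoring loop might look like. The names (`Submission`, `evaluate_on_hidden_stream`) are my own invention, not an existing harness; the point is only that the drifting test stream stays on the organizers' side.

```python
# Hypothetical harness: the organizers never release the test stream; an
# executable submission exposes predict/update hooks and is scored sequentially.
from typing import Iterable, Protocol, Tuple


class Submission(Protocol):
    def predict(self, x) -> int: ...
    def update(self, x, y) -> None: ...


def evaluate_on_hidden_stream(sub: Submission,
                              stream: Iterable[Tuple[object, int]]) -> float:
    """Progressive (test-then-train) error on a stream only the organizers can see."""
    mistakes, total = 0, 0
    for x, y in stream:
        yhat = sub.predict(x)      # score the current model first...
        mistakes += int(yhat != y)
        sub.update(x, y)           # ...then let it adapt to whatever drift is present
        total += 1
    return mistakes / max(total, 1)
```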
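For automated training, the submission itself would be a training procedure rather than a set of predictions. A toy sketch of that contract, again with made-up names (`AutoLearner`, `run_submission`) and scikit-learn standing in for whatever the submission actually does internally:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


class AutoLearner:
    """A trivial baseline submission: fixed learner, no tuning."""

    def train(self, X_train: np.ndarray, y_train: np.ndarray):
        model = LogisticRegression(max_iter=1000)
        model.fit(X_train, y_train)
        return model


def run_submission(learner, X_train, y_train, X_test, y_test) -> float:
    # The organizers keep both splits; the submission only ever sees the
    # training data handed to it here.
    model = learner.train(X_train, y_train)
    return float((model.predict(X_test) == y_test).mean())
```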
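For computational constraints, the hard part is the ranking function. Here's one debatable possibility, reusing the hypothetical `train` interface above: fold measured training time and serialized model size into the score, with weights (`lambda_time`, `lambda_space`) that I've chosen arbitrarily, since picking them defensibly is precisely the open problem.

```python
import pickle
import time


def constrained_score(learner, X_train, y_train, X_test, y_test,
                      lambda_time: float = 0.01,
                      lambda_space: float = 1e-7) -> float:
    start = time.perf_counter()
    model = learner.train(X_train, y_train)
    train_seconds = time.perf_counter() - start   # timed by the harness, not the participant

    error = float((model.predict(X_test) != y_test).mean())
    model_bytes = len(pickle.dumps(model))        # crude proxy for the deployed model's size

    # Lower is better. The weights are exactly the part that's hard to defend.
    return error + lambda_time * train_seconds + lambda_space * model_bytes
```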
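And for partial feedback, the essential ingredient is the closed loop. A bare-bones epsilon-greedy round might look like the following, where `policy.greedy_action` and `policy.update` are assumed interfaces and `reward_fn` stands in for the live system; logging each action's propensity is what would let organizers score submissions off-policy afterwards.

```python
import random


def bandit_round(policy, context, actions, reward_fn, epsilon: float = 0.1):
    """One step of the loop: act, observe only the chosen action's reward, learn."""
    greedy = policy.greedy_action(context, actions)
    if random.random() < epsilon:
        action = random.choice(actions)           # occasional exploration...
    else:
        action = greedy
    # ...so that the logged propensity supports off-policy evaluation later
    prob = epsilon / len(actions) + (1.0 - epsilon if action == greedy else 0.0)

    reward = reward_fn(context, action)           # feedback exists only for the chosen action
    policy.update(context, action, reward, prob)
    return (context, action, reward, prob)        # a competition could release tuples like this
```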