Deep learning was by far the most popular conference track, to the point that the room for it overflowed well past standing room only; I missed several talks I wanted to attend because it was physically impossible to get in. This despite the fact that many deep learning luminaries and their grad students were at CVPR. Fortunately Yoshua Bengio chose ICML, and across several talks provided enough insight into deep learning to merit another blog post. Overall the theme is: having conquered computer vision, deep learning researchers are now turning their attention to natural language text, with some notable early successes, e.g., paragraph vector. And of course the brand is riding high, which explains some of the paper title choices, e.g., “deep boosting”. There was also a conference track titled “Neural Theory and Spectral Methods” ... interesting bedfellows!
ADMM suddenly became popular (about 18 months ago, given the latency between idea, conference submission, and presentation). By this I don't mean using ADMM for distributed optimization, although there was a bit of that. Rather, there were several papers using ADMM to solve constrained optimization problems that would otherwise be vexing. The take-home lesson is: before coming up with a customized solver for whatever constrained optimization problem confronts you, try ADMM.
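To make the take-home concrete, here is a minimal sketch of the pattern, not taken from any of the ICML papers: nonnegative least squares serves as a stand-in constrained problem, the splitting and the penalty parameter rho are generic defaults, and the constraint is handled entirely by a projection in the z-update.

```python
import numpy as np

def admm_nnls(A, b, rho=1.0, iters=200):
    """ADMM sketch for: minimize (1/2) ||A x - b||^2 subject to x >= 0,
    split as f(x) = (1/2) ||A x - b||^2, g(z) = indicator(z >= 0), x = z,
    with scaled dual variable u."""
    n = A.shape[1]
    Atb = A.T @ b
    # factor once: every x-update is the same ridge-like linear solve
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    for _ in range(iters):
        # x-update: unconstrained quadratic, (A'A + rho I) x = A'b + rho (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        # z-update: proximal step for g, here just projection onto x >= 0
        z = np.maximum(0.0, x + u)
        # dual update
        u = u + x - z
    return z
```

The only problem-specific pieces are the two proximal steps: a different objective changes the x-update, and a different constraint changes the projection in the z-update, which is what makes ADMM a reasonable first thing to try.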
Now for the laundry list of papers (also note the papers described above):
- Input Warping for Bayesian Optimization of Non-stationary Functions. Better hyperparameter tuning translates directly into better reported numbers, and if you want to get the community's attention, you have to hit the numbers, so don't bring a knife to a gunfight.
- Nuclear Norm Minimization via Active Subspace Selection. The inimitable Cho-Jui Hsieh has done it again, this time applying ideas from active variable methods to nuclear norm regularization.
- Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. A significant improvement in the computational complexity required for agnostic contextual bandits.
- Efficient programmable learning to search. Additional improvements to the imperative programming approach since NIPS. If you are doing structured prediction, especially in industrial settings where you need to put things into production, you'll want to investigate this methodology. First, it eases the burden of specifying a complicated structured prediction task. Second, it reduces the difference between training and evaluation, which not only means faster deployment, but also fewer defects introduced between the experimental and production systems.
- Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels. It is good to have confirmation that quasi-random numbers can work better than i.i.d. draws for randomized feature maps (a rough sketch of the idea follows this list).
- A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data. I'll need to spend some quality time with this paper.
- Multiresolution Matrix Factorization. Nikos and I have had good luck learning discriminative representations using classical matrix decompositions. I'm hoping this new decomposition technique can be analogously adapted.
- Sample-based Approximate Regularization. I find data-dependent regularization promising (e.g., dropout on least-squares is equivalent to a scale-free L2 regularizer; see the quick calculation after this list), so this paper caught my attention.
- Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm. No experiments in the paper, so maybe this is a “pure theory win”, but it looks interesting.
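For the quasi-Monte Carlo feature maps, here is a rough sketch of the flavor of the idea, assuming the standard random Fourier features construction for the RBF kernel; the only change from the usual recipe is that the frequencies come from a scrambled Halton sequence pushed through the Gaussian inverse CDF rather than from i.i.d. Gaussian draws. This is just one simple variant, not necessarily the paper's exact construction.

```python
import numpy as np
from scipy.stats import norm, qmc

def qmc_rff(X, n_features=256, gamma=1.0, seed=0):
    """Quasi-random Fourier features approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2); Z @ Z.T approximates the kernel matrix."""
    d = X.shape[1]
    # low-discrepancy points in (0, 1)^d instead of i.i.d. uniforms
    U = qmc.Halton(d=d, scramble=True, seed=seed).random(n_features)
    # push through the Gaussian inverse CDF to get (quasi-)Gaussian frequencies
    W = norm.ppf(np.clip(U, 1e-12, 1 - 1e-12)) * np.sqrt(2.0 * gamma)
    b = 2.0 * np.pi * np.random.default_rng(seed).random(n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W.T + b)
```

Swapping the Halton sequence for plain `np.random.randn` recovers the usual random Fourier features, which makes a side-by-side comparison easy.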
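And for the parenthetical about dropout on least-squares, the back-of-the-envelope calculation, assuming each feature is kept independently with probability $p$ and rescaled by $1/p$, is

$$
\mathbb{E}_{\xi}\!\left[\sum_i \Big(y_i - \sum_j \tfrac{\xi_{ij}}{p}\, x_{ij}\, w_j\Big)^2\right]
= \|y - X w\|^2 + \frac{1-p}{p}\, w^\top \mathrm{diag}(X^\top X)\, w,
\qquad \xi_{ij} \sim \mathrm{Bernoulli}(p).
$$

The penalty weights coordinate $j$ by $\sum_i x_{ij}^2$, so rescaling a feature (and inversely rescaling its weight) leaves the regularizer unchanged, which is the sense in which it is scale-free.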