The keynotes were consistently excellent. Some standouts for me were:
- Pedro Domingos presented his latest take on sum-product networks as a class of nonconvex functions for which finding a global maximum is tractable. Machine learning was (is?) obsessed with convex functions because they form a large class for which finding the global optimum is tractable. Lately the deep learning community has convincingly argued that convexity is too limiting, and as a result we are all getting more comfortable with ``finicky'' optimization procedures. Perhaps what we need is a different function class?
- Hendrik Blockeel talked about declarative machine learning. I work in a combined systems/ML group and I can tell you systems people love this idea: they all learned how relational algebra ushered in a declarative revolution in databases via SQL, and they see the current state of affairs in machine learning as a pre-SQL mess.
- Jure Leskovec made an unannounced change of topic and delivered a fabulous keynote which can be paraphrased as: ``hey, you machine learning people could have a lot of impact on public policy, but first you need to understand the principles and pitfalls of counterfactual estimation.'' I couldn't agree more, cf. Gelman. (Jure also gave the test-of-time paper talk, about Kronecker graphs.)
- Natasa Milic-Frayling detailed (with some disdain) the myriad of techniques that digital web and mobile advertising firms use to track and profile users. It was all very familiar because I worked in computational advertising for years, but the juxtaposition of the gung-ho attitude of ad networks with the elevated European respect for privacy was intriguing from a sociological perspective.
- Half-Space Mass: A Maximally Robust and Efficient Data Depth Method. Statisticians have been ruminating for years on how to extend the concept of median to multidimensional data sets. Half-Space Mass is a slight tweak of Tukey's half-space depth that retains many of the desirable properties and yet is easy to estimate via sampling (see the first sketch after this list). This is potentially relevant to multiple unsupervised scenarios, e.g., anomaly detection.
- Superset Learning Based on Generalized Loss Minimization. This is a different way of thinking about label uncertainty: each example carries a set of candidate labels rather than a single precise one, and the loss is generalized accordingly. That framing recovers several familiar techniques, but also yields some new ones (see the second sketch after this list).
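
To make the Half-Space Mass remark about ``easy to estimate via sampling'' concrete, here is a minimal sketch in the spirit of the paper: sample random directions and random split points, record how much data lands on each side, and score a point by the average mass on its side. Function names, defaults, and the range-expansion parameter `lam` are my own illustrative choices, not the authors'.

```python
import numpy as np

def fit_half_space_mass(X, n_halfspaces=1000, subsample=256, lam=0.5, rng=None):
    """Sample random half-spaces and record the fraction of data on each side."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    model = []
    for _ in range(n_halfspaces):
        idx = rng.choice(n, size=min(subsample, n), replace=False)
        w = rng.normal(size=d)
        w /= np.linalg.norm(w)                      # random direction
        proj = X[idx] @ w
        lo, hi = proj.min(), proj.max()
        mid, half = (lo + hi) / 2, (hi - lo) / 2
        s = rng.uniform(mid - (1 + lam) * half,     # random split point in an expanded range
                        mid + (1 + lam) * half)
        mass_left = np.mean(proj < s)               # fraction of the subsample on each side
        model.append((w, s, mass_left, 1.0 - mass_left))
    return model

def score_half_space_mass(model, X):
    """Average, over sampled half-spaces, the mass on the same side as each point."""
    scores = np.zeros(len(X))
    for w, s, mass_left, mass_right in model:
        proj = X @ w
        scores += np.where(proj < s, mass_left, mass_right)
    return scores / len(model)  # low scores ~ shallow points, i.e. anomaly candidates
```

Points deep inside the data cloud end up with scores near 0.5 from most sampled half-spaces, while isolated points keep landing on the light side, which is what makes this usable for anomaly detection.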
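
And for superset learning, the core of generalized loss minimization (as I understand it) is an ``optimistic'' loss that charges the smallest loss over the candidate label set, which the learner then minimizes as usual. A toy sketch; the squared base loss and the names are mine, purely illustrative.

```python
def optimistic_loss(prediction, candidate_labels, base_loss):
    """Generalized loss: the smallest base loss over the set of candidate labels."""
    return min(base_loss(prediction, y) for y in candidate_labels)

def total_superset_loss(predictions, candidate_sets, base_loss):
    """Objective minimized by the learner: sum of optimistic losses over examples."""
    return sum(optimistic_loss(p, S, base_loss)
               for p, S in zip(predictions, candidate_sets))

# Example with squared loss: a precisely labeled example is just a singleton set,
# while {0, 1} lets the second prediction match label 1 "for free".
sq = lambda p, y: (p - y) ** 2
print(total_superset_loss([0.2, 0.9], [{0}, {0, 1}], sq))  # 0.04 + 0.01 = 0.05
```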