7 Outstanding Papers at ICLR 2022
The International Conference on Learning Representations (ICLR) has announced the ICLR 2022 Outstanding Paper Awards. The selection committee consisted of Andreas Krause (ETH Zurich), Atlas Wang (UT Austin), Been Kim (Google Brain), Bo Li (University of Illinois Urbana-Champaign), Bohyung Han (Seoul National University), He He (New York University), and Zaid Harchaoui (University of Washington).
The outstanding papers are listed below:
Analytic-DPM
Authors
- Bo Zhang, Department of Computer Science & Technology, Institute for AI, Tsinghua-Huawei Joint Center for AI
- Fan Bao, Department of Computer Science & Technology, Institute for AI, Tsinghua-Huawei Joint Center for AI
- Chongxuan Li, Gaoling School of Artificial Intelligence, Renmin University of China, Beijing
- Jun Zhu, Department of Computer Science & Technology, Institute for AI, Tsinghua-Huawei Joint Center for AI
Diffusion probabilistic models (DPMs), first proposed by Sohl-Dickstein et al. (2015), are a class of generative models. Inference in DPMs is expensive, however: it requires iterating over thousands of timesteps, and at each timestep the variance of the reverse process must be estimated. To date, most work has used a handcrafted value for all timesteps (Nichol & Dhariwal, 2021).
In the paper titled Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models, the researchers propose Analytic-DPM, a “training-free inference framework that estimates the analytic forms of the variance and Kullback–Leibler divergence (KL divergence) using the Monte Carlo method and a pretrained score-based model.” They note that Analytic-DPM applies to a variety of DPMs (Ho et al., 2020; Song et al., 2020a; Nichol & Dhariwal, 2021) in a plug-and-play manner. Analytic-DPM improves the log-likelihood of these DPMs, produces high-quality samples, and delivers a 20× to 80× speedup.
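The model-dependent ingredient of that estimate is the average squared norm of the pretrained score model over forward-process samples. A minimal sketch of such a Monte Carlo estimate follows; here, score_fn and sample_xn are hypothetical placeholders standing in for a pretrained model and the forward process, not the paper's actual API:

```python
def mean_squared_score(score_fn, sample_xn, n_mc=64):
    """Monte Carlo estimate of E[||s(x_n)||^2] / d at one timestep n,
    using only forward-process samples and a pretrained score model.
    Schematic: score_fn and sample_xn are assumed placeholders."""
    total = 0.0
    for _ in range(n_mc):
        x = sample_xn()      # draw x_n from the forward process q(x_n)
        s = score_fn(x)      # pretrained score model evaluated at x_n
        total += float((s ** 2).sum()) / s.size
    return total / n_mc
```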
Hyperparameter Tuning with Renyi Differential Privacy
Authors
- Nicolas Papernot, Google Research, Brain Team
- Thomas Steinke, Google Research, Brain Team
Differential privacy is a system for publicly sharing information about a dataset by disclosing patterns in the groups but not revealing information about individual entities in the dataset.
Noisy stochastic gradient descent, known as DP-SGD, is a popular method for training models with differential privacy (Song et al., 2013; Bassily et al., 2014; Abadi et al., 2016). In the paper titled Hyperparameter Tuning with Renyi Differential Privacy, the researchers point out three ways in which DP-SGD differs from standard stochastic gradient descent:
- Gradients are computed on a per-example basis.
- Each individual gradient is clipped so that its 2-norm is bounded.
- Gaussian noise is added to the summed gradients.
Together, these changes bound the sensitivity of each update so that the added noise ensures differential privacy. The researchers showed how setting hyperparameters based on non-private training runs can leak private information. The team also provided privacy guarantees for hyperparameter search procedures within the framework of Renyi differential privacy. The results improve on and extend the work of Liu and Talwar (STOC 2019).
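To make these three steps concrete, here is a minimal NumPy sketch of a single DP-SGD update; the function name, defaults, and parameters are illustrative, not the paper's:

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.1):
    """One schematic DP-SGD update (names and defaults are illustrative).

    per_example_grads: shape (batch, dim), one gradient per example.
    clip_norm:  bound C on each per-example gradient's 2-norm.
    noise_mult: Gaussian noise scale as a multiple of C.
    """
    # 1. Clip each example's gradient so its 2-norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # 2. Add Gaussian noise calibrated to the clipping bound, which caps
    #    the influence of any single example on the summed gradient.
    noisy_sum = clipped.sum(axis=0) + np.random.normal(
        scale=noise_mult * clip_norm, size=params.shape)
    # 3. Average over the batch and take a standard gradient step.
    return params - lr * noisy_sum / len(per_example_grads)
```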
Learning Strides in Convolutional Neural Networks
Authors
- Olivier Teboul, Google Research
- David Grangier, Google Research
- Neil Zeghidour, Google Research
CNNs are widely used in image and text classification, speech recognition, translation and more. The paper, Learning Strides in Convolutional Neural Networks, addresses a persistent issue in using CNNs: setting the strides in a principled way rather than by trial and error.
What is a stride?
DeepAI defines “stride” as a neural network’s filter parameter that modifies the amount of movement over the image or video.
Inspired by the work titled Spectral Representations for Convolutional Neural Networks, the researchers proposed “DiffStride, the first downsampling layer with learnable strides”. Instead of cropping feature maps with a fixed bounding box controlled by an integer striding hyperparameter, DiffStride learns the size of its cropping box in the spectral domain through backpropagation, which effectively allows fractional strides.
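As a quick illustration of why the stride is an awkward hyperparameter to hand-tune, the standard output-size formula shows that small integer changes cause large jumps in resolution, which is exactly what DiffStride's learnable, fractional strides sidestep:

```python
def conv_output_size(n, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one spatial dimension."""
    return (n + 2 * padding - kernel) // stride + 1

# The stride only takes integer values, and each step halves or worse:
for s in (1, 2, 3):
    print(s, conv_output_size(224, stride=s))  # 224 -> 224, 112, 75
```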
Expressiveness and Approximation Properties of Graph Neural Networks
Authors
- Floris Geerts, Department of Computer Science, University of Antwerp, Belgium
- Juan L. Reutter, School of Engineering, Pontificia Universidad Catolica de Chile, Chile & IMFD
The separation power of GNN architectures is commonly characterised in terms of graph algorithms such as color refinement (CR) and the k-dimensional Weisfeiler-Leman tests (k-WL): graphs that these tests cannot distinguish also look identical to the corresponding GNNs. However, understanding the separation power of a given GNN architecture has so far required complex proofs tailored to the specifics of that architecture.
In the paper titled, Expressiveness and Approximation Properties of Graph Neural Networks, the researchers proposed a tensor language-based technique to analyse the separation power of general GNNs. The approach also provides a toolbox with which GNN architecture designers can analyse the separation power of their GNNs without the need to figure out the intricacies of the WL-tests.
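For intuition, here is a minimal sketch of color refinement (1-WL), the weakest of these tests, together with a classic pair of graphs that it, and hence any standard message-passing GNN, cannot tell apart:

```python
def color_refinement(adj, rounds=3):
    """1-WL / color refinement: iteratively hash each node's color
    together with the multiset of its neighbours' colors.

    adj: dict mapping node -> list of neighbours.
    Two graphs whose final color histograms differ are guaranteed
    to be non-isomorphic; equal histograms are inconclusive.
    """
    colors = {v: 0 for v in adj}  # start with a uniform coloring
    for _ in range(rounds):
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Relabel signatures with small integers to get the new coloring.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return colors

# A 6-cycle and two disjoint triangles: every node has degree 2, so the
# coloring never refines, and color refinement cannot separate them.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
             3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(sorted(color_refinement(cycle6).values()))     # [0, 0, 0, 0, 0, 0]
print(sorted(color_refinement(triangles).values()))  # [0, 0, 0, 0, 0, 0]
```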
Comparing Distributions by Measuring Differences that Affect Decision Making
Authors
- Shengjia Zhao, Department of Computer Science, Stanford University
- Abhishek Sinha, Department of Computer Science, Stanford University
- Yutong He, Department of Computer Science, Stanford University
- Aidan Perreault, Department of Computer Science, Stanford University
- Jiaming Song, Department of Computer Science, Stanford University
- Stefano Ermon, Department of Computer Science, Stanford University
Quantifying the discrepancy between two probability distributions is a central problem in machine learning. The paper, Comparing Distributions by Measuring Differences that Affect Decision Making, introduces a new class of discrepancies based on the optimal loss for a decision task. By suitably choosing the decision task, this family generalises both the Jensen-Shannon divergence and the maximum mean discrepancy (MMD) family.
By applying this approach to two-sample tests and various benchmarks, the team has achieved superior test power compared to competing methods.
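For reference, the MMD member of this family already admits a very short estimator; below is a minimal NumPy sketch with an RBF kernel, where the bandwidth and sample sizes are arbitrary choices, not the paper's:

```python
import numpy as np

def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between samples x and y
    under an RBF kernel (one member of the family the paper generalises)."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
same = rbf_mmd2(rng.normal(size=(500, 2)), rng.normal(size=(500, 2)))
diff = rbf_mmd2(rng.normal(size=(500, 2)), rng.normal(1.0, size=(500, 2)))
print(same, diff)  # the mean-shifted pair yields a much larger discrepancy
```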
Neural Collapse Under MSE Loss
Authors
- David L. Donoho, Stanford University
- X.Y. Han, Cornell University
- Vardan Papyan, University of Toronto
The paper titled Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path shows that during neural collapse, last-layer features collapse to their class means, the class means collapse to the same Simplex Equiangular Tight Frame (ETF), and the classifier's behaviour collapses to the nearest-class-mean decision rule.
The paper also proposes a new theoretical construct, the “central path”, along which the linear classifier remains MSE-optimal for the feature activations throughout the dynamics.
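As a rough illustration of the first of these phenomena (variability collapse), one can track the ratio of within-class to between-class variation of the last-layer features; the sketch below uses an illustrative statistic, not the paper's exact one:

```python
import numpy as np

def variability_collapse_ratio(features, labels):
    """Within-class vs. between-class variation of last-layer features.
    Neural collapse predicts this ratio shrinks toward zero as training
    continues past zero classification error. Illustrative statistic;
    the paper's exact trace-based measure differs."""
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    within = np.mean([((features[labels == c] - class_means[i]) ** 2).sum(axis=1).mean()
                      for i, c in enumerate(classes)])
    between = ((class_means - global_mean) ** 2).sum(axis=1).mean()
    return within / between
```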
Bootstrapped Meta-Learning
Authors
- Sebastian Flennerhag, Research Scientist at DeepMind
- Yannick Schroecker, Research Scientist at DeepMind
- Tom Zahavy, Senior Research Scientist at DeepMind
- Hado van Hasselt, Senior Staff Research Scientist at DeepMind
- Satinder Singh Baveja, Research Scientist at DeepMind
- David Silver, Principal Research Scientist at DeepMind
Meta-learning essentially means ‘learning to learn’. The paper titled Bootstrapped Meta-Learning outlines a few challenges that crop up in meta-learning: an update rule must first be applied before it can be evaluated, which carries high computational costs, and several difficulties in meta-optimisation degrade performance. The researchers propose an algorithm that lets the meta-learner teach itself.
The algorithm first bootstraps a target from the meta-learner and then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric. The bootstrapping mechanism can extend the effective meta-learning horizon without requiring backpropagation through all updates. The researchers achieved a new state of the art for model-free agents on the Atari ALE benchmark and showed that the algorithm yields both performance and efficiency gains in multi-task meta-learning.
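The following toy sketch conveys the idea on a one-dimensional problem, meta-learning a step size by chasing a bootstrapped target; it illustrates the mechanism only and is not the paper's algorithm:

```python
# Toy bootstrapped meta-learning (illustrative, not the paper's method):
# meta-learn the step size eta for gradient descent on f(x) = x^2 / 2,
# whose gradient is simply x.
x, eta = 5.0, 0.01
for _ in range(200):
    x1 = x - eta * x                  # inner update, differentiable in eta
    target = x1
    for _ in range(5):                # bootstrap: 5 extra inner updates...
        target = target - eta * target
    # ...then treat the target as a constant (stop-gradient) and pull the
    # one-step update toward it: meta-loss = (x1 - target)^2,
    # with d(x1)/d(eta) = -x computed by hand.
    meta_grad = 2.0 * (x1 - target) * (-x)
    eta -= 0.01 * meta_grad           # meta-gradient step on eta
    x = x1                            # the agent follows the inner update
print(eta, x)  # eta grows toward faster descent while x approaches 0
```

Note that the meta-gradient only needs to flow through the single inner update, not the five bootstrap steps, which is how the mechanism extends the horizon without backpropagating through all updates.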




