Variational Inference for Infinitely Deep Neural Networks

Authors: Achille Nazaret, David Blei

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study the UDN on real and synthetic data. We find that (i) on synthetic data, the UDN achieves higher accuracy than finite neural networks of similar architecture; (ii) on real data, the UDN outperforms finite neural networks and other models of infinite neural networks; (iii) for both types of data, the inference adapts the UDN posterior to the data complexity by exploring distinct sets of truncations.
Researcher Affiliation | Academia | Department of Computer Science, Columbia University, New York, USA; Department of Statistics, Columbia University, New York, USA.
Pseudocode | Yes | Algorithm 1: Dynamic variational inference for the UDN (a hedged training-loop sketch in this spirit appears after the table).
Open Source Code | Yes | The code is available on GitHub.
Open Datasets | Yes | We study the performance of the UDN on the CIFAR-10 dataset (Krizhevsky et al., 2009). We run additional experiments on tabular data, performing regression with the UDN on nine datasets from the UCI repository (Dua & Graff, 2017): Boston Housing (boston), Concrete Strength (concrete), Energy Efficiency (energy), Kin8nm (kin8nm), Naval Propulsion (naval), Power Plant (power), Protein Structure (protein), Wine Quality (wine), and Yacht Hydrodynamics (yacht).
Dataset Splits | Yes | For each ω, we independently sample a train, a validation, and a test dataset, each of 1024 samples.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud-computing instance types used for running the experiments.
Software Dependencies | Yes | In libraries like TensorFlow 1.0 (Abadi et al., 2015), the computational graph is defined and compiled in advance. In contrast, a library like PyTorch (Paszke et al., 2019) uses a dynamic graph (see the sketch after the table).
Experiment Setup | Yes | For each ω, we generate a dataset D(ω) on which we train the models for 4000 epochs. First configuration: prior on the neural-network weights θ ∼ N(0, 1); prior on the truncation ℓ ∼ 1 + Poisson(0.5); optimizer Adam (Kingma & Ba, 2015); learning rate 0.005; learning rate for λ set to 1/10th of the learning rate of the neural-network weights; initialization of the variational truncated Poisson family λ = 1.0; 4000 epochs. Second configuration: optimizer SGD with momentum 0.9 and weight decay 1e-4; 500 epochs; learning-rate schedule [0.01]*5 + [0.1]*195 + [0.01]*100 + [0.001]*100; the same learning rate for λ as for the weights; initialization of the variational truncated Poisson family λ0 = 5.0. (A hedged optimizer and schedule sketch appears after the table.)
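
The pseudocode and software-dependency rows point at the same mechanism: Algorithm 1 grows the network during training, which is convenient precisely because PyTorch builds its computation graph dynamically. The snippet below is a minimal sketch of that idea under stated assumptions, not the authors' released implementation; the names (GrowingMLP, grow_to, lam, active_depth), the heuristic for how many layers to keep active, and the softmax-normalized Poisson weighting of per-depth losses are illustrative choices.

```python
# Sketch only: a network that can grow between optimization steps,
# in the spirit of Algorithm 1 (dynamic variational inference for the UDN).
# GrowingMLP, grow_to, lam, and active_depth are assumed names, not the paper's API.
import torch
import torch.nn as nn


class GrowingMLP(nn.Module):
    """An MLP whose depth can be extended at any point during training."""

    def __init__(self, dim_in, dim_hidden, dim_out):
        super().__init__()
        self.input = nn.Linear(dim_in, dim_hidden)
        self.hidden = nn.ModuleList()    # hidden layers, grows over time
        self.outputs = nn.ModuleList()   # one output head per truncation depth
        self.dim_hidden, self.dim_out = dim_hidden, dim_out
        self.grow_to(1)                  # start with a single hidden layer

    def grow_to(self, depth):
        # Append layers (and matching output heads) until `depth` layers exist.
        while len(self.hidden) < depth:
            self.hidden.append(nn.Linear(self.dim_hidden, self.dim_hidden))
            self.outputs.append(nn.Linear(self.dim_hidden, self.dim_out))

    def forward(self, x, depth):
        # Only the first `depth` layers enter the computation graph.
        h = torch.relu(self.input(x))
        logits = []
        for layer, head in zip(self.hidden[:depth], self.outputs[:depth]):
            h = torch.relu(layer(h))
            logits.append(head(h))       # prediction from each truncation
        return logits


# Toy usage with a variational Poisson parameter for the truncation depth.
model = GrowingMLP(dim_in=2, dim_hidden=16, dim_out=2)
lam = torch.tensor(1.0, requires_grad=True)       # truncated-Poisson parameter
x, y = torch.randn(32, 2), torch.randint(0, 2, (32,))

# Heuristic (an assumption): keep a few layers beyond the current value of λ.
active_depth = max(1, int(lam.item()) + 2)
model.grow_to(active_depth)                        # new layers added on the fly
losses = [nn.functional.cross_entropy(l, y) for l in model(x, depth=active_depth)]

# Weight the per-depth losses by a normalized Poisson q(ℓ; λ) over the active
# depths, as a stand-in for the ELBO's expectation over truncations.
depths = torch.arange(1, active_depth + 1, dtype=torch.float32)
log_q = depths * torch.log(lam) - lam - torch.lgamma(depths + 1)
q = torch.softmax(log_q, dim=0)
loss = (q * torch.stack(losses)).sum()
loss.backward()                                    # gradients reach both the weights and λ
```

Because new nn.Linear modules can be appended between steps, no graph needs to be recompiled when the variational posterior over truncations shifts toward deeper networks, which is the property the software-dependency row highlights.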
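
The experiment-setup row lists two hyperparameter configurations. The sketch below shows one way they might be wired up in PyTorch; the parameter grouping, the reuse of `model` from the previous sketch, the application of weight decay to λ, and the reuse of the last schedule value beyond the 400 epochs it covers are assumptions, not the authors' code.

```python
# Hedged sketch of the reported optimizer settings; `model` is the GrowingMLP
# from the previous sketch and the exact parameter grouping is an assumption.
import torch

# Configuration 1 (as reported): Adam, lr 0.005 for the weights and 1/10th of
# that for λ, with λ initialized to 1.0.
lam = torch.tensor(1.0, requires_grad=True)
opt_config1 = torch.optim.Adam(
    [
        {"params": model.parameters(), "lr": 0.005},
        {"params": [lam], "lr": 0.005 / 10},
    ]
)

# Configuration 2 (as reported): SGD with momentum 0.9 and weight decay 1e-4,
# 500 epochs, λ initialized to 5.0 and trained with the same learning rate as
# the weights. Whether weight decay also applies to λ is an assumption here.
lam0 = torch.tensor(5.0, requires_grad=True)
opt_config2 = torch.optim.SGD(
    list(model.parameters()) + [lam0], lr=1.0, momentum=0.9, weight_decay=1e-4
)

# The reported schedule [0.01]*5 + [0.1]*195 + [0.01]*100 + [0.001]*100 covers
# 400 epochs; reusing its last value for the remaining epochs is an assumption.
schedule = [0.01] * 5 + [0.1] * 195 + [0.01] * 100 + [0.001] * 100
scheduler = torch.optim.lr_scheduler.LambdaLR(
    opt_config2, lr_lambda=lambda epoch: schedule[min(epoch, len(schedule) - 1)]
)
```

In this setup one would call opt_config2.step() every batch and scheduler.step() once per epoch, so the absolute learning rate tracks the listed schedule (the base lr of 1.0 is only a multiplier for LambdaLR).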