Variational Inference for Infinitely Deep Neural Networks
Authors: Achille Nazaret, David Blei
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the UDN on real and synthetic data. We find that (i) on synthetic data, the UDN achieves higher accuracy than finite neural networks of similar architecture; (ii) on real data, the UDN outperforms finite neural networks and other models of infinite neural networks; (iii) for both types of data, the inference adapts the UDN posterior to the data complexity by exploring distinct sets of truncations. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Columbia University, New York, USA 2Department of Statistics, Columbia University, New York, USA. |
| Pseudocode | Yes | Algorithm 1: Dynamic variational inference for the UDN (a hedged sketch follows the table). |
| Open Source Code | Yes | The code is available on GitHub. |
| Open Datasets | Yes | We study the performance of the UDN on the CIFAR-10 dataset (Krizhevsky et al., 2009). We run additional experiments on tabular datasets. We perform regression with the UDN for nine regression datasets from the UCI repository (Dua & Graff, 2017): Boston Housing (boston), Concrete Strength (concrete), Energy Efficiency (energy), Kin8nm (kin8nm), Naval Propulsion (naval), Power Plant (power), Protein Structure (protein), Wine Quality (wine), and Yacht Hydrodynamics (yacht). |
| Dataset Splits | Yes | For each ω, we independently sample a train, a validation, and a test dataset of 1024 samples each (a sketch of these independent draws follows the table). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud computing instance types used for running the experiments. |
| Software Dependencies | Yes | In libraries like TensorFlow 1.0 (Abadi et al., 2015), the computational graph is defined and compiled in advance. In contrast, a library like PyTorch (Paszke et al., 2019) uses a dynamic graph. |
| Experiment Setup | Yes | Synthetic data: for each ω, we generate a dataset D(ω) on which we train the models for 4000 epochs. Prior on the neural network weights: θ ∼ N(0, 1). Prior on the truncation ℓ: ℓ ∼ 1 + Poisson(0.5). Optimizer: Adam (Kingma & Ba, 2015). Learning rate: 0.005. Learning rate for λ: 1/10th of the learning rate of the neural network weights. Initialization of the variational truncated Poisson family: λ = 1.0. Number of epochs: 4000. CIFAR-10: Optimizer: SGD with momentum = 0.9, weight decay = 1e-4. Number of epochs: 500. Learning rate schedule: [0.01]*5 + [0.1]*195 + [0.01]*100 + [0.001]*100. Learning rate for λ: the same learning rate as for the weights. Initialization of the variational truncated Poisson family: λ = 5.0. Optimizer sketches follow the table. |
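
The Pseudocode row above cites Algorithm 1, the dynamic variational inference procedure for the UDN. The sketch below is a minimal, hedged illustration of that idea in PyTorch, not the authors' implementation: it assumes a plain fully connected architecture with one output head per depth, a point-mass variational family on the weights, and a truncated Poisson variational family over the depth whose support grows with λ. The class and function names (`TruncatedPoisson`, `GrowingMLP`, `elbo_step`) and the exact truncation rule for the support are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TruncatedPoisson(nn.Module):
    """Variational family q_lambda(l) over the depth l: a Poisson(lambda)
    renormalized on a finite support {1, ..., L}. The rule
    L = ceil(lambda) + extra is an assumption for this sketch."""
    def __init__(self, init_lambda=1.0, extra=2):
        super().__init__()
        self.log_lambda = nn.Parameter(torch.log(torch.tensor(float(init_lambda))))
        self.extra = extra

    def support(self):
        lam = self.log_lambda.exp().item()
        return list(range(1, math.ceil(lam) + self.extra + 1))

    def probs(self):
        lam = self.log_lambda.exp()
        ells = torch.tensor(self.support(), dtype=torch.float32)
        log_pmf = ells * torch.log(lam) - lam - torch.lgamma(ells + 1)
        return torch.softmax(log_pmf, dim=0)  # renormalize on the truncated support

class GrowingMLP(nn.Module):
    """Unbounded-depth MLP whose hidden layers are instantiated lazily.
    Every depth l has its own output head, so predictions can be read
    off after any number of hidden layers."""
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        self.d_in, self.d_hidden, self.d_out = d_in, d_hidden, d_out
        self.hidden = nn.ModuleList()
        self.heads = nn.ModuleList()

    def ensure_depth(self, depth):
        # Create layers on demand; PyTorch's dynamic graph allows this
        # without recompiling anything.
        while len(self.hidden) < depth:
            d_prev = self.d_in if not self.hidden else self.d_hidden
            self.hidden.append(nn.Linear(d_prev, self.d_hidden))
            self.heads.append(nn.Linear(self.d_hidden, self.d_out))

    def forward(self, x, depths):
        """Return logits for every depth in `depths` (ascending list)."""
        self.ensure_depth(max(depths))
        logits, h = {}, x
        for l, layer in enumerate(self.hidden, start=1):
            h = torch.relu(layer(h))
            if l in depths:
                logits[l] = self.heads[l - 1](h)
            if l == max(depths):
                break
        return logits

def elbo_step(model, q_depth, x, y, prior_rate=0.5):
    """One stochastic ELBO evaluation (up to additive constants).
    The N(0, 1) prior on the weights would add an L2 penalty, omitted here."""
    depths = q_depth.support()
    q = q_depth.probs()
    logits = model(x, depths)
    # Expected log-likelihood: each truncation's loss weighted by q(l).
    exp_loglik = sum(q[i] * (-F.cross_entropy(logits[l], y))
                     for i, l in enumerate(depths))
    # KL(q(l) || p(l)) with the prior l ~ 1 + Poisson(prior_rate),
    # i.e. log p(l) = (l - 1) log(rate) - rate - log((l - 1)!).
    ells = torch.tensor(depths, dtype=torch.float32)
    log_prior = (ells - 1) * math.log(prior_rate) - prior_rate - torch.lgamma(ells)
    kl = (q * (q.log() - log_prior)).sum()
    return exp_loglik - kl
```

Maximizing this ELBO trains both the weights and λ. Because `ensure_depth` can create new layers mid-training, their parameters must also be registered with the optimizer (e.g. via `optimizer.add_param_group`); this lazy growth is where the dynamic computational graph noted in the Software Dependencies row matters, and it is how the posterior over depth can keep expanding to match the data.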
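
The Dataset Splits row notes that, for each ω, the train, validation, and test sets of 1024 samples are sampled independently rather than obtained by partitioning one larger sample. A minimal sketch, assuming a hypothetical generator `sample_dataset(omega, n)` standing in for the paper's synthetic process D(ω):

```python
def make_splits(sample_dataset, omega, n=1024):
    # Three independent draws from the same generative process D(omega);
    # `sample_dataset` is a placeholder, not a function from the paper's code.
    train = sample_dataset(omega, n)
    valid = sample_dataset(omega, n)
    test = sample_dataset(omega, n)
    return train, valid, test
```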
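
The Experiment Setup row lists two optimizer configurations. The snippet below sketches how the reported hyperparameters could be wired up in PyTorch: a parameter group giving λ one tenth of the weights' learning rate in the synthetic-data setting, and SGD with the listed piecewise-constant schedule for CIFAR-10. The grouping and the use of `LambdaLR` for the schedule are assumptions, not details extracted from the paper.

```python
import torch

def make_adam(model, q_depth, lr=0.005):
    # Synthetic-data setting as reported: Adam, lr = 0.005 for the network
    # weights and lr / 10 for the variational parameter lambda.
    return torch.optim.Adam([
        {"params": model.parameters(), "lr": lr},
        {"params": q_depth.parameters(), "lr": lr / 10.0},
    ])

def make_sgd_with_schedule(model, q_depth):
    # CIFAR-10 setting as reported: SGD, momentum 0.9, weight decay 1e-4,
    # per-epoch schedule [0.01]*5 + [0.1]*195 + [0.01]*100 + [0.001]*100;
    # lambda shares the weights' learning rate.
    lrs = [0.01] * 5 + [0.1] * 195 + [0.01] * 100 + [0.001] * 100
    optimizer = torch.optim.SGD(
        list(model.parameters()) + list(q_depth.parameters()),
        lr=lrs[0], momentum=0.9, weight_decay=1e-4,
    )
    # LambdaLR scales the base lr, so divide by lrs[0] to reproduce the
    # absolute values; epochs past the schedule keep its last value.
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda epoch: lrs[min(epoch, len(lrs) - 1)] / lrs[0]
    )
    return optimizer, scheduler
```

Calling `scheduler.step()` once per epoch advances the schedule.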