Variational Inference with Tail-adaptive f-Divergence

Authors: Dilin Wang, Hao Liu, Qiang Liu

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate our adaptive f-divergence with different models. We use reparameterization gradients as default since they have smaller variances (Kingma & Welling, 2014) and normally yield better performance than score function gradients. Our code is available at https://github.com/dilinwang820/adaptive-f-divergence. 6.1 Gaussian Mixture Toy Example: We first illustrate the approximation quality of our proposed adaptive f-divergence on Gaussian mixture models. ... 6.2 Bayesian Neural Network: We evaluate our approach on Bayesian neural network regression tasks. The datasets are collected from the UCI dataset repository. ... 6.3 Application in Reinforcement Learning: We now demonstrate an application of our method in reinforcement learning...
Researcher Affiliation | Academia | Dilin Wang (UT Austin, dilin@cs.utexas.edu); Hao Liu (UESTC, uestcliuhao@gmail.com); Qiang Liu (UT Austin, lqiang@cs.utexas.edu)
Pseudocode | Yes | Algorithm 1: Variational Inference with Tail-adaptive f-Divergence (with Reparameterization Gradient). An illustrative sketch of this step appears below the table.
Open Source Code | Yes | Our code is available at https://github.com/dilinwang820/adaptive-f-divergence.
Open Datasets | Yes | The datasets are collected from the UCI dataset repository. ... We use a minibatch of size 256 to approximate the gradient in each iteration. We train the model for 10,000 iterations. ... All datasets are randomly partitioned into 90% for training and 10% for testing. (A minimal split sketch appears below the table.)
Dataset Splits | No | The paper mentions "90% for training and 10% for testing" but does not specify a separate validation split or explicit use of a validation set.
Hardware Specification | No | The paper mentions support from "Google Cloud" but does not provide specific details on the hardware (e.g., GPU/CPU models, memory) used for the experiments.
Software Dependencies | No | The paper mentions the use of specific optimizers like "Adagrad" and "Adam optimizer (Kingma & Ba, 2015)", but does not provide version numbers for these or other software libraries/environments (e.g., Python, PyTorch, TensorFlow versions) used in the implementation.
Experiment Setup | Yes | The model is optimized using Adagrad with a constant learning rate of 0.05. We use a minibatch of size 256 to approximate the gradient in each iteration. We train the model for 10,000 iterations. To learn the component weights, we apply the Gumbel-Softmax trick (Jang et al., 2017; Maddison et al., 2017) with a temperature of 0.1. ... We use the Adam optimizer (Kingma & Ba, 2015) with a constant learning rate of 0.001. The gradient is approximated by n = 100 draws of x_i ∼ q_θ and a minibatch of size 32 from the training data points. ... The policy π, the value function V(s), and the Q-function Q(s, a) are neural networks with two fully-connected layers of 128 hidden units each. We use Adam (Kingma & Ba, 2015) with a constant learning rate of 0.0003 for optimization. The size of the replay buffer for Half Cheetah is 10^7, and we fix the size to 10^6 on other environments... A consolidated view of these hyperparameters also appears below the table.
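
As an illustration of the pseudocode row, here is a minimal PyTorch sketch of a rank-based reweighting step combined with reparameterized sampling from a diagonal Gaussian. It is a paraphrase, not the authors' Algorithm 1 or repository code: the function names, the `beta` exponent, the diagonal-Gaussian variational family, and the weighted log-ratio surrogate are assumptions made here for illustration; the exact adaptive weighting rule should be taken from the paper.

```python
# Hedged sketch of tail-adaptive reweighting with a reparameterization gradient.
# Not the authors' implementation; names and the surrogate objective are assumptions.
import torch

def tail_adaptive_weights(log_w, beta=-1.0):
    """Rank-based reweighting of importance ratios w_i = p(x_i)/q(x_i).

    Each sample's weight depends on the empirical tail probability (i.e. the rank)
    of w_i rather than on w_i itself, which keeps the weights bounded when the raw
    ratios are heavy-tailed. The exponent `beta` is a hypothetical knob.
    """
    w = torch.exp(log_w - log_w.max())                      # stabilized ratios
    # fraction of samples j with w_j >= w_i, for each i
    tail_prob = (w.unsqueeze(0) >= w.unsqueeze(1)).float().mean(dim=1)
    rho = tail_prob ** beta
    return (rho / rho.sum()).detach()                       # normalize, stop gradient

def surrogate_loss(mu, log_sigma, log_p, n=100):
    """Weighted reparameterization surrogate: -sum_i rho_i * (log p - log q)(x_i)."""
    xi = torch.randn(n, mu.shape[-1])
    x = mu + torch.exp(log_sigma) * xi                      # reparameterized draws
    q = torch.distributions.Normal(mu, torch.exp(log_sigma))
    log_q = q.log_prob(x).sum(-1)
    log_w = (log_p(x) - log_q).detach()
    rho = tail_adaptive_weights(log_w)
    return -(rho * (log_p(x) - log_q)).sum()
```

With `beta = -1.0`, the sample with the largest importance ratio receives the largest (but still bounded) weight, which mimics the mass-covering behavior of large-α divergences without the variance blow-up of raw ratios.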
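
For the dataset-splits row, a minimal sketch of the 90%/10% random partition quoted above (no validation split is mentioned); the function name and fixed seed are our own choices, not from the paper.

```python
# Hypothetical sketch of the 90%/10% random train/test partition.
import numpy as np

def random_train_test_split(n_samples, train_frac=0.9, seed=0):
    """Randomly partition sample indices into train and test sets (no validation set)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(train_frac * n_samples)
    return idx[:n_train], idx[n_train:]
```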
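
The hyperparameters quoted in the experiment-setup row can be collected in one place as follows; the dictionary layout and key names are hypothetical and only mirror the reported values.

```python
# Hypothetical consolidation of the quoted hyperparameters; keys and structure are our own.
EXPERIMENT_SETUP = {
    "gaussian_mixture": {
        "optimizer": "Adagrad",
        "learning_rate": 0.05,
        "minibatch_size": 256,
        "iterations": 10_000,
        "gumbel_softmax_temperature": 0.1,   # for learning component weights
    },
    "bayesian_neural_network": {
        "optimizer": "Adam",
        "learning_rate": 0.001,
        "num_posterior_draws": 100,          # n = 100 samples x_i ~ q_theta
        "minibatch_size": 32,
    },
    "reinforcement_learning": {
        "optimizer": "Adam",
        "learning_rate": 0.0003,
        "hidden_layers": [128, 128],         # policy, V(s), and Q(s, a) networks
        "replay_buffer_size": {"HalfCheetah": 10**7, "default": 10**6},
    },
}
```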