Estimating Gradients for Discrete Random Variables by Sampling without Replacement

Authors: Wouter Kool, Herke van Hoof, Max Welling

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments with a toy problem, a categorical Variational Auto-Encoder and a structured prediction problem show that our estimator is the only estimator that is consistently among the best estimators in both high and low entropy settings."
Researcher Affiliation | Collaboration | Wouter Kool (University of Amsterdam, ORTEC) w.w.m.kool@uva.nl; Herke van Hoof (University of Amsterdam) h.c.vanhoof@uva.nl; Max Welling (University of Amsterdam, CIFAR) m.welling@uva.nl
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code available at https://github.com/wouterkool/estimating-gradients-without-replacement."
Open Datasets | Yes | "The dataset is MNIST, statically binarized by thresholding at 0.5 (although we include results using the standard binarized dataset by Salakhutdinov & Murray (2008); Larochelle & Murray (2011) in Section G.2)." (A binarization sketch follows the table.)
Dataset Splits | Yes | "Figure 5 shows the -ELBO evaluated during training on the validation set."
Hardware Specification | No | The paper does not provide specific hardware details such as the GPU or CPU models used for running its experiments.
Software Dependencies | No | The paper mentions 'PyTorch' and 'Adam' but does not specify version numbers for these or other key software components.
Experiment Setup | Yes | "We optimize the ELBO using the analytic KL for 1000 epochs using the Adam (Kingma & Ba, 2015) optimizer. We use a learning rate of 10⁻³ for all estimators except Gumbel-Softmax and RELAX, which use a learning rate of 10⁻⁴ as we found they diverged with a higher learning rate." and "We did not do any hyperparameter optimization and used the exact same training details, using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 10⁻⁴ (no decay) for 100 epochs for all estimators. For the baselines, we used the same batch size of 512, but for estimators that use k = 4 samples, we used a batch size of 512 / 4 = 128 to compensate for the additional samples." (A configuration sketch follows the table.)
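
As a reading aid for the Open Datasets row, here is a minimal sketch of statically binarizing MNIST by thresholding at 0.5 in PyTorch. This is not the authors' code (their implementation is in the linked repository); the data root, loader settings, and transform composition are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): statically binarize MNIST by
# thresholding pixel intensities at 0.5, as quoted in the Open Datasets row.
# The data root and use of torchvision transforms are assumptions.
import torch
from torchvision import datasets, transforms

binarize = transforms.Compose([
    transforms.ToTensor(),                            # scale pixels to [0, 1]
    transforms.Lambda(lambda x: (x > 0.5).float()),   # static threshold at 0.5
])

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST(root="./data", train=True, download=True, transform=binarize),
    batch_size=512,
    shuffle=True,
)
```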
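
The Experiment Setup row quotes concrete optimizer and batch-size choices. The sketch below restates them as code under stated assumptions: `model`, `estimator_name`, and `num_samples_k` are hypothetical placeholders, and the branching between the VAE and structured-prediction settings is an interpretation of the two quoted passages, not the authors' training loop (which is in the linked repository).

```python
# Minimal sketch of the reported training configuration (not the authors' code).
# `model`, `estimator_name`, and `num_samples_k` are hypothetical placeholders.
import torch


def make_optimizer(model, estimator_name, structured_prediction=False):
    """Adam with the learning rates quoted in the Experiment Setup row."""
    if structured_prediction:
        lr = 1e-4   # structured prediction: 1e-4, no decay, 100 epochs
    elif estimator_name in {"gumbel_softmax", "relax"}:
        lr = 1e-4   # reported to diverge with a higher learning rate
    else:
        lr = 1e-3   # categorical VAE: 1e-3 for the remaining estimators
    return torch.optim.Adam(model.parameters(), lr=lr)


def batch_size_for(num_samples_k, base_batch_size=512):
    """512 for single-sample baselines; 512 / k otherwise."""
    return base_batch_size // num_samples_k
```

For example, `batch_size_for(4)` returns 128, matching the quoted compensation for estimators that draw k = 4 samples.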