ARMS: Antithetic-REINFORCE-Multi-Sample Gradient for Binary Variables

Authors: Aleksandar Dimitriev, Mingyuan Zhou

Venue: ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate ARMS on several datasets for training generative models, and our experimental results show that it outperforms competing methods. Our experimental setup follows the one in Yin & Zhou (2019) and Dong et al. (2020), and all VAE experiments are built on top of the available DisARM code."
Researcher Affiliation | Academia | "McCombs School of Business, The University of Texas at Austin, Austin, Texas 78712, USA. Correspondence to: Alek Dimitriev <alekdimi@utexas.edu>, Mingyuan Zhou <mingyuan.zhou@mccombs.utexas.edu>."
Pseudocode | Yes | "Algorithm 1: Antithetic Dirichlet copula sampling. Algorithm 2: Antithetic Gaussian copula sampling." (See the sampling sketch below the table.)
Open Source Code | Yes | "The code is publicly available." (Footnote 1: https://github.com/alekdimi/arms)
Open Datasets | Yes | "The comparison is done on three different benchmark datasets: dynamically binarized MNIST, Fashion-MNIST, and Omniglot, with each dataset split into the training, validation, and test sets."
Dataset Splits | No | The paper states that each dataset is split into training, validation, and test sets, but it does not provide specific percentages, sample counts, or citations to predefined splits for these datasets; it only confirms that such splits are used.
Hardware Specification | Yes | "All the models were trained on a K40 Nvidia GPU and Intel Xeon E5-2680 processor."
Software Dependencies | No | The paper mentions 'Adam' as an optimizer and 'Leaky ReLU' activations but does not specify version numbers for any software dependencies or libraries (e.g., PyTorch or TensorFlow).
Experiment Setup | Yes | "The nonlinear network has two hidden layers of 200 units each, using Leaky ReLU (Maas et al., 2013) activations with a coefficient of 0.3. Adam (Kingma & Ba, 2015) with a learning rate of 1e-4 is used to optimize the network parameters, and SGD with learning rate 1e-2 for the prior distribution logits. The optimization is run for 10^6 steps with mini-batches of size 50. For RELAX, the scaling factor is initialized to 1, the temperature to 0.1, and the control variate is a neural network with one hidden layer of 137 units using Leaky ReLU activations. The only data preprocessing involves subtracting the global mean of the dataset from each image before it is input to the encoder." (See the configuration sketch below the table.)
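Below is a minimal NumPy sketch of the antithetic Gaussian copula sampling named in the Pseudocode row (the paper's Algorithm 2): draw jointly Gaussian variables with a common negative pairwise correlation, map them to uniforms with the standard normal CDF, and threshold against the Bernoulli probabilities. The function name sample_arms_gaussian, the rho argument, and the default correlation of -1/(K-1) are illustrative assumptions, not the released API; the authors' implementation is at https://github.com/alekdimi/arms.

```python
# Illustrative sketch only; names and defaults are assumptions, not the ARMS API.
import numpy as np
from scipy.special import ndtr  # standard normal CDF

def sample_arms_gaussian(p, num_samples, rho=None, rng=None):
    """Draw num_samples negatively correlated Bernoulli(p) vectors via a Gaussian copula.

    p           : array of shape (d,) with Bernoulli probabilities (e.g. sigmoid of logits).
    num_samples : K, the number of antithetic samples.
    rho         : pairwise correlation of the latent Gaussians; the most negative
                  exchangeable value is -1/(K - 1), used as the default here.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = num_samples
    if rho is None:
        rho = -1.0 / (k - 1)
    d = p.shape[0]
    # Equicorrelated covariance: 1 on the diagonal, rho off the diagonal.
    cov = (1.0 - rho) * np.eye(k) + rho * np.ones((k, k))
    # One correlated K-vector of standard Gaussians per coordinate of p.
    z = rng.multivariate_normal(np.zeros(k), cov, size=d, check_valid="ignore")  # (d, K)
    u = ndtr(z)                                # marginally Uniform(0, 1), negatively correlated
    b = (u < p[:, None]).astype(np.float64)    # each column is one Bernoulli(p) draw
    return b.T                                 # shape (K, d)

# Example: four antithetic samples of a 3-dimensional Bernoulli vector.
samples = sample_arms_gaussian(np.array([0.2, 0.5, 0.9]), num_samples=4)
```

The paper's Algorithm 1 uses a Dirichlet copula instead to produce the negatively correlated uniforms, but the overall pattern (correlated uniforms fed through the Bernoulli inverse CDF) is the same.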
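The Experiment Setup row pins down the training configuration; the following PyTorch sketch mirrors those numbers (two 200-unit hidden layers with LeakyReLU(0.3), Adam at 1e-4 for the network weights, SGD at 1e-2 for the prior logits, 10^6 steps, batch size 50). It is only an illustration under assumed input and latent sizes of 784 and 200; the released code instead builds on the available DisARM codebase, as quoted above.

```python
# Illustrative configuration sketch; the data and latent dimensions are assumptions.
import torch
from torch import nn

obs_dim, latent_dim = 784, 200   # assumed: flattened 28x28 images, 200 binary latents

encoder = nn.Sequential(
    nn.Linear(obs_dim, 200), nn.LeakyReLU(0.3),
    nn.Linear(200, 200), nn.LeakyReLU(0.3),
    nn.Linear(200, latent_dim),              # Bernoulli logits of the latent code
)
decoder = nn.Sequential(
    nn.Linear(latent_dim, 200), nn.LeakyReLU(0.3),
    nn.Linear(200, 200), nn.LeakyReLU(0.3),
    nn.Linear(200, obs_dim),                 # Bernoulli logits of the reconstruction
)
prior_logits = nn.Parameter(torch.zeros(latent_dim))

# Adam (lr 1e-4) for the encoder/decoder weights, SGD (lr 1e-2) for the prior logits.
net_opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)
prior_opt = torch.optim.SGD([prior_logits], lr=1e-2)

num_steps, batch_size = 10**6, 50
# Preprocessing, per the quote: subtract the dataset's global mean image from
# each input before it is fed to the encoder.
```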