Gradient Estimation with Discrete Stein Operators

Authors: Jiaxin Shi, Yuhao Zhou, Jessica Hwang, Michalis Titsias, Lester Mackey

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate RODEO on 15 benchmark tasks, including training binary variational autoencoders (VAEs) with one or more stochastic layers. In most cases and with the same number of function evaluations, RODEO delivers lower variance and better training objectives than the state-of-the-art gradient estimators DisARM [14, 69], ARMS [13], Double CV [60], and RELAX [20]." and "Table 2: Training binary latent VAEs with K = 2, 3 (except for RELAX, which uses 3 evaluations) on MNIST, Fashion-MNIST, and Omniglot. We report the average ELBO (±1 standard error) on the training set after 1M steps over 5 independent runs."
Researcher Affiliation | Collaboration | Jiaxin Shi (Stanford University, jiaxins@stanford.edu); Yuhao Zhou (Tsinghua University, yuhaoz.cs@gmail.com); Jessica Hwang (Stanford University, jjhwang@stanford.edu); Michalis K. Titsias (DeepMind, mtitsias@google.com); Lester Mackey (Microsoft Research New England, lmackey@microsoft.com)
Pseudocode | Yes | "Algorithm 1: Optimizing E_q[f(x)] with RODEO gradients" (a generic sketch of this kind of objective appears below the table)
Open Source Code | Yes | "Python code replicating all experiments can be found at https://github.com/thjashin/rodeo."
Open Datasets | Yes | "We consider the MNIST [33], Fashion-MNIST [66] and Omniglot [32] datasets using their standard train, validation, and test splits."
Dataset Splits | Yes | "We consider the MNIST [33], Fashion-MNIST [66] and Omniglot [32] datasets using their standard train, validation, and test splits."
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions 'Python code' but does not provide specific version numbers for Python or any key software libraries and dependencies used in the experiments.
Experiment Setup | Yes | "The VAE architecture and training experimental setup follows Titsias and Shi [60], and details are given in Appendix D." and "The functions H (13) and H (14) share a neural network architecture with two output units and a single hidden layer with 100 units." and "We report the average ELBO (±1 standard error) on the training set after 1M steps over 5 independent runs." (a sketch of this shared network appears below the table)
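
The "Pseudocode" row above cites Algorithm 1, which optimizes E_q[f(x)] using RODEO gradients. As a point of reference only, the following is a minimal PyTorch sketch of a plain score-function (REINFORCE) estimator with a leave-one-out baseline for the same kind of objective. It is not the paper's RODEO estimator, whose control variate is built from a discrete Stein operator (see the paper and https://github.com/thjashin/rodeo), and all names in the snippet are illustrative.

```python
# Minimal sketch, NOT the paper's Algorithm 1: a score-function (REINFORCE)
# estimator for the gradient of E_q[f(x)] over Bernoulli latents, using a
# leave-one-out baseline across K function evaluations as a simple control
# variate. RODEO instead derives its control variate from a discrete Stein
# operator; see the released code for the actual method.
import torch


def score_function_grad(f, logits, K=3):
    """Estimate d/d(logits) of E_{q(x)}[f(x)], where q = Bernoulli(logits)."""
    q = torch.distributions.Bernoulli(logits=logits)
    x = q.sample((K,))                        # K binary samples, shape (K, D)
    fx = torch.stack([f(xk) for xk in x])     # K function evaluations
    baseline = (fx.sum() - fx) / (K - 1)      # leave-one-out baseline
    log_prob = q.log_prob(x).sum(-1)          # log q(x_k), shape (K,)
    surrogate = ((fx - baseline) * log_prob).mean()
    return torch.autograd.grad(surrogate, logits)[0]


# Toy usage: 20 Bernoulli latents and a quadratic target f.
logits = torch.zeros(20, requires_grad=True)
f = lambda x: ((x - 0.45) ** 2).sum()
grad = score_function_grad(f, logits, K=3)
```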
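
The "Experiment Setup" row quotes a shared network with two output units and a single hidden layer of 100 units for the two functions defined in Eqs. (13) and (14) of the paper. A minimal PyTorch sketch of such an architecture follows; the input dimension, activation, and class name are assumptions for illustration, not taken from the released code.

```python
# Minimal sketch of the shared network described in the experiment-setup
# quote: one hidden layer with 100 units and two output units, one per
# function in Eqs. (13) and (14). Input dimension, activation, and naming
# are assumptions, not taken from the paper's released code.
import torch.nn as nn


class ControlVariateNet(nn.Module):
    def __init__(self, input_dim=200):        # latent dimension is an assumption
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(input_dim, 100),         # single hidden layer, 100 units
            nn.Tanh(),                         # activation choice is an assumption
            nn.Linear(100, 2),                 # two output units
        )

    def forward(self, x):
        out = self.body(x)
        return out[..., 0], out[..., 1]        # one output per function
```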