Gradient Estimation with Discrete Stein Operators
Authors: Jiaxin Shi, Yuhao Zhou, Jessica Hwang, Michalis Titsias, Lester Mackey
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate RODEO on 15 benchmark tasks, including training binary variational autoencoders (VAEs) with one or more stochastic layers. In most cases and with the same number of function evaluations, RODEO delivers lower variance and better training objectives than the state-of-the-art gradient estimators DisARM [14, 69], ARMS [13], Double CV [60], and RELAX [20]." and "Table 2: Training binary latent VAEs with K = 2, 3 (except for RELAX which uses 3 evaluations) on MNIST, Fashion-MNIST, and Omniglot. We report the average ELBO (±1 standard error) on the training set after 1M steps over 5 independent runs." |
| Researcher Affiliation | Collaboration | Jiaxin Shi, Stanford University (jiaxins@stanford.edu); Yuhao Zhou, Tsinghua University (yuhaoz.cs@gmail.com); Jessica Hwang, Stanford University (jjhwang@stanford.edu); Michalis K. Titsias, DeepMind (mtitsias@google.com); Lester Mackey, Microsoft Research New England (lmackey@microsoft.com) |
| Pseudocode | Yes | Algorithm 1: Optimizing E_q[f(x)] with RODEO gradients (a generic sketch of this objective follows the table) |
| Open Source Code | Yes | Python code replicating all experiments can be found at https://github.com/thjashin/rodeo. |
| Open Datasets | Yes | We consider the MNIST [33], Fashion-MNIST [66] and Omniglot [32] datasets using their standard train, validation, and test splits. |
| Dataset Splits | Yes | We consider the MNIST [33], Fashion-MNIST [66] and Omniglot [32] datasets using their standard train, validation, and test splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Python code' but does not provide specific version numbers for Python or any key software libraries and dependencies used in the experiments. |
| Experiment Setup | Yes | "The VAE architecture and training experimental setup follows Titsias and Shi [60], and details are given in Appendix D." and "The functions H (13) and H (14) share a neural network architecture with two output units and a single hidden layer with 100 units." and "We report the average ELBO (±1 standard error) on the training set after 1M steps over 5 independent runs." (a minimal architecture sketch appears after the table) |
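
For context on the objective named in the Pseudocode row, below is a minimal sketch of optimizing E_q[f(x)] over factorized Bernoulli latents with a score-function (REINFORCE) gradient and a leave-one-out baseline. This is **not** the RODEO estimator from the paper, which builds its control variate from a discrete Stein operator; the objective `f`, the latent dimension, and the default K = 2 samples here are illustrative assumptions.

```python
import torch

def reinforce_loo_gradient(f, logits, num_samples=2):
    # Generic score-function estimator with a leave-one-out baseline.
    # NOT the RODEO estimator; this only illustrates the structure of
    # estimating the gradient of E_q[f(x)] for binary latents x ~ q.
    q = torch.distributions.Bernoulli(logits=logits)
    x = q.sample((num_samples,))                   # K binary samples, shape (K, D)
    fx = torch.stack([f(xi) for xi in x])          # function evaluations, shape (K,)
    # Leave-one-out baseline: mean of the other K-1 evaluations (needs K >= 2)
    baseline = (fx.sum() - fx) / (num_samples - 1)
    log_q = q.log_prob(x).sum(-1)                  # log q(x_k), shape (K,)
    surrogate = ((fx - baseline).detach() * log_q).mean()
    return torch.autograd.grad(surrogate, logits)[0]

# Illustrative usage with a toy objective (all names here are assumptions).
logits = torch.zeros(20, requires_grad=True)
grad_estimate = reinforce_loo_gradient(lambda x: ((x - 0.3) ** 2).sum(), logits)
```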
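
The Experiment Setup row quotes a shared control-variate network with two output units and a single hidden layer of 100 units. A minimal sketch of such a module is given below; the input width, activation function, and class name are assumptions, since the paper defers those details to Appendix D.

```python
import torch.nn as nn

class SharedControlVariateMLP(nn.Module):
    # Sketch of the quoted architecture: one hidden layer of 100 units and
    # two output units. Input width and activation are assumptions.
    def __init__(self, input_dim=200, hidden_dim=100, num_outputs=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.Tanh(),  # activation choice is an assumption
            nn.Linear(hidden_dim, num_outputs),
        )

    def forward(self, z):
        return self.net(z)
```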