Implicit Reparameterization Gradients
Authors: Mikhail Figurnov, Shakir Mohamed, Andriy Mnih
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the proposed approach is faster and more accurate than the existing gradient estimators for these distributions. |
| Researcher Affiliation | Industry | Michael Figurnov, Shakir Mohamed, Andriy Mnih — DeepMind, London, UK {mfigurnov,shakir,amnih}@google.com |
| Pseudocode | Yes | Table 1: Comparison of the two reparameterization types. While they provide the same result, the implicit version is easier to implement for distributions such as Gamma because it does not require inverting the standardization function S_φ(z). Explicit: forward pass — sample ε ~ q(ε), set z = S_φ⁻¹(ε); backward pass — set ∇_φ z = ∇_φ S_φ⁻¹(ε). Implicit: forward pass — sample z ~ q_φ(z); backward pass — set ∇_φ z = −(∇_z S_φ(z))⁻¹ ∇_φ S_φ(z). Both then set ∇_φ f(z) = ∇_z f(z) ∇_φ z. |
| Open Source Code | Yes | Implicit reparameterization for Gamma, Student's t, Beta, Dirichlet and von Mises distributions is available in TensorFlow Probability [11]. |
| Open Datasets | Yes | We use the 20 Newsgroups (11,200 documents, 2,000-word vocabulary) and RCV1 [29] (800,000 documents, 10,000-word vocabulary) datasets with the same preprocessing as in [47]. |
| Dataset Splits | No | The paper mentions using 20 Newsgroups, RCV1, and MNIST datasets but does not explicitly provide training, validation, or test dataset splits (e.g., percentages or counts) within its text. |
| Hardware Specification | No | No specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments are provided. The paper only mentions using 'TensorFlow [1] for our experiments'. |
| Software Dependencies | No | The paper mentions software like TensorFlow, TensorFlow Probability, C++, and PyTorch but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | For Gamma, we use a sparse Gamma(0.3, 0.3) prior and a bell-shaped prior Gamma(10, 10). For Beta and von Mises, instead of a sparse prior we choose a uniform prior over the corresponding domain. |
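The pseudocode extracted above can be made concrete. Below is a minimal sketch (function names are our own, not from the paper) that checks the implicit backward-pass formula ∇_φ z = −(∇_z S_φ(z))⁻¹ ∇_φ S_φ(z) against the explicit reparameterization for a Normal distribution, where S_φ(z) = (z − μ)/σ and both gradient forms are available in closed form. The partials of S are approximated by central finite differences for simplicity.

```python
def standardization(z, mu, sigma):
    # S_phi(z) = (z - mu) / sigma maps a N(mu, sigma^2) sample to N(0, 1).
    return (z - mu) / sigma

def implicit_grads(z, mu, sigma, eps=1e-6):
    # Implicit reparameterization: grad_phi z = -(dS/dz)^(-1) * dS/dphi.
    # Partials of S are estimated with central finite differences.
    dS_dz = (standardization(z + eps, mu, sigma)
             - standardization(z - eps, mu, sigma)) / (2 * eps)
    dS_dmu = (standardization(z, mu + eps, sigma)
              - standardization(z, mu - eps, sigma)) / (2 * eps)
    dS_dsigma = (standardization(z, mu, sigma + eps)
                 - standardization(z, mu, sigma - eps)) / (2 * eps)
    return -dS_dmu / dS_dz, -dS_dsigma / dS_dz

mu, sigma, z = 1.5, 2.0, 4.1
dz_dmu, dz_dsigma = implicit_grads(z, mu, sigma)
# Explicit reparameterization z = mu + sigma * eps gives
# dz/dmu = 1 and dz/dsigma = (z - mu) / sigma; both should agree.
```

For the Normal, inverting S_φ is trivial, so explicit reparameterization works equally well; the point of the implicit form (and of its TensorFlow Probability implementations for Gamma, Beta, Dirichlet and von Mises) is that only S_φ and its derivatives are needed, never S_φ⁻¹.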