Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Advances in Black-Box VI: Normalizing Flows, Importance Weighting, and Optimization
Authors: Abhinav Agrawal, Daniel R. Sheldon, Justin Domke
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate components relating to optimization, flows, and Monte-Carlo methods on a benchmark of 30 models from the Stan model library. |
| Researcher Affiliation | Academia | ¹College of Information and Computer Sciences, University of Massachusetts Amherst; ²Department of Computer Science, Mount Holyoke College |
| Pseudocode | No | The paper describes algorithms and procedures in text but does not include any formally labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using existing tools like Autograd and Stan but does not provide a link or explicit statement about releasing the source code for their own methodology or implementation. |
| Open Datasets | Yes | We evaluate each method using a benchmark of 30 models from the Stan Model library [35, 36]. |
| Dataset Splits | No | The paper discusses evaluation using 10,000 fresh samples and mentions ADVI's step-size selection based on ELBO after 200 iterations, but it does not specify a distinct validation set with percentages or counts for hyperparameter tuning or model selection in a general sense separate from the final evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions software like "Autograd, a Python automatic differentiation library [22]" and "Stan, a state-of-the-art probabilistic programming framework [4]" but does not specify their version numbers or other required software dependencies with versions. |
| Experiment Setup | Yes | During optimization, all methods have the same computational budget, measured as 100 "oracle evaluations" of the log p per iteration, and are optimized for 30,000 iterations. |
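
To make the budget in the Experiment Setup row concrete, the sketch below works out the total number of log p evaluations implied by the two quoted figures (100 oracle evaluations per iteration, 30,000 iterations). The function names are hypothetical, and the second helper assumes that an estimator drawing several samples per estimate divides the per-iteration budget evenly among estimates; that split is an illustration, not a detail confirmed by the table.

```python
# Figures quoted in the Experiment Setup row; everything else is illustrative.
ORACLE_EVALS_PER_ITERATION = 100
ITERATIONS = 30_000


def total_oracle_evaluations(evals_per_iteration: int, iterations: int) -> int:
    """Total number of log p evaluations used by one optimization run."""
    return evals_per_iteration * iterations


def estimates_per_iteration(evals_per_iteration: int, samples_per_estimate: int) -> int:
    """Hypothetical helper: number of estimates that fit in one iteration's
    budget when each estimate costs `samples_per_estimate` evaluations.

    Assumes the budget is split evenly, which the table does not state.
    """
    if samples_per_estimate <= 0:
        raise ValueError("samples_per_estimate must be positive")
    if evals_per_iteration % samples_per_estimate != 0:
        raise ValueError("budget is not evenly divisible by samples_per_estimate")
    return evals_per_iteration // samples_per_estimate


if __name__ == "__main__":
    total = total_oracle_evaluations(ORACLE_EVALS_PER_ITERATION, ITERATIONS)
    print(f"Total log p evaluations per run: {total:,}")
```

Under these figures, a single run consumes 3,000,000 evaluations of log p, which gives a rough sense of the compute needed to repeat the experiment on each of the 30 benchmark models.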