Assessing Generative Models via Precision and Recall

Authors: Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, Sylvain Gelly

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate the practical utility of the proposed approach we perform an empirical study on several variants of Generative Adversarial Networks and Variational Autoencoders. In an extensive set of experiments we show that the proposed metric is able to disentangle the quality of generated samples from the coverage of the target distribution.
Researcher Affiliation | Collaboration | Mehdi S. M. Sajjadi (MPI for Intelligent Systems; Max Planck ETH Center for Learning Systems), Olivier Bachem (Google Brain), Mario Lucic (Google Brain), Olivier Bousquet (Google Brain), Sylvain Gelly (Google Brain). This work was done during an internship at Google Brain.
Pseudocode | No | The paper describes the algorithm mathematically and verbally, but does not provide a formal pseudocode block or algorithm listing. (A code sketch of the PRD computation follows the table.)
Open Source Code | Yes | An implementation of the algorithm is available at https://github.com/msmsajjadi/precision-recall-distributions.
Open Datasets | Yes | We consider three data sets commonly used in the GAN literature: MNIST [15], Fashion-MNIST [25], and CIFAR-10 [13]... we use the MultiNLI corpus... Following [6], we embed these sentences using a BiLSTM with 2048 cells in each direction and max pooling, leading to a 4096-dimensional embedding [7].
Dataset Splits | Yes | Then, for a fixed i = 1, ..., 10, we generate a set Q̂_i, which consists of samples from the first i classes from the training set. (A sketch of this split protocol follows the table.)
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., CPU or GPU models, memory); it only implies that computations ran on standard machines.
Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | We then cluster the union of P̂ and Q̂ in this feature space using mini-batch k-means with k = 20 [21]. As the clustering algorithm is randomized, we run the procedure several times and average over the PRD curves... Following [6], we embed these sentences using a BiLSTM with 2048 cells in each direction and max pooling, leading to a 4096-dimensional embedding [7]. (A sketch of the clustering step follows the table.)
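
Since the paper states the PRD curve only as formulas, here is a minimal NumPy sketch of that computation, following the paper's definitions α(λ) = Σ_ω min(λ·P(ω), Q(ω)) and β(λ) = α(λ)/λ, with λ = tan(θ) swept over θ ∈ (0, π/2). The function name and signature are illustrative; the authors' repository linked above contains the reference implementation.

```python
import numpy as np

def prd_curve(eval_dist, ref_dist, num_angles=1001, epsilon=1e-10):
    """Precision/recall (PRD) pairs between two discrete distributions.

    eval_dist: cluster histogram of generated samples (Q-hat), shape (k,)
    ref_dist:  cluster histogram of real samples (P-hat), shape (k,)
    """
    # Sweep lambda = tan(theta) over (0, inf) via angles in (0, pi/2).
    angles = np.linspace(epsilon, np.pi / 2 - epsilon, num=num_angles)
    slopes = np.tan(angles)
    # alpha(lambda) = sum_i min(lambda * p_i, q_i), broadcast over all angles.
    precision = np.minimum(slopes[:, None] * ref_dist[None, :],
                           eval_dist[None, :]).sum(axis=1)
    # beta(lambda) = alpha(lambda) / lambda.
    recall = precision / slopes
    return precision, recall
```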
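
The class-coverage split quoted in the Dataset Splits row reduces to a simple label filter. A sketch with hypothetical array names, assuming integer labels 0–9:

```python
def first_i_classes(samples, labels, i):
    """Build Q-hat_i as described in the paper: keep only training
    samples whose label falls in the first i classes (0, ..., i-1)."""
    return samples[labels < i]

# One set per i = 1, ..., 10:
# splits = [first_i_classes(x_train, y_train, i) for i in range(1, 11)]
```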
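
Finally, a sketch of the quoted clustering step, assuming scikit-learn's MiniBatchKMeans as the mini-batch k-means implementation (the paper cites the algorithm [21] but names no library; the helper name is hypothetical): pool the real and generated feature vectors, cluster them jointly with k = 20, and read off the two normalized cluster histograms that feed prd_curve above. Because the clustering is randomized, the authors repeat it several times and average the resulting PRD curves.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def cluster_histograms(real_feats, gen_feats, k=20, seed=0):
    """Jointly cluster real and generated features; return the normalized
    cluster histograms (Q-hat, P-hat) consumed by prd_curve()."""
    pooled = np.vstack([real_feats, gen_feats])
    labels = MiniBatchKMeans(n_clusters=k, random_state=seed,
                             n_init=10).fit_predict(pooled)
    real_labels = labels[:len(real_feats)]
    gen_labels = labels[len(real_feats):]
    ref_dist = np.bincount(real_labels, minlength=k) / len(real_labels)
    eval_dist = np.bincount(gen_labels, minlength=k) / len(gen_labels)
    return eval_dist, ref_dist

# Average PRD curves over several randomized clusterings, as in the paper:
# curves = [prd_curve(*cluster_histograms(real, fake, seed=s)) for s in range(10)]
# precision = np.mean([p for p, _ in curves], axis=0)
# recall = np.mean([r for _, r in curves], axis=0)
```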