Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

GMValuator: Similarity-based Data Valuation for Generative Models

Authors: Jiaxi Yang, Wenlong Deng, Benlin Liu, Yangsibo Huang, James Y Zou, Xiaoxiao Li

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	GMValuator is extensively evaluated on benchmark and high-resolution datasets and various mainstream generative architectures to demonstrate its effectiveness.
Researcher Affiliation	Academia	(1) University of British Columbia; (2) University of Washington; (3) Princeton University; (4) Stanford University; (5) Vector Institute
Pseudocode	Yes	A concise summary of key notations and Algorithm 1, detailing the pipeline of GMValuator in Sec. A and Sec. B.
Open Source Code	Yes	Our code is available at: https://github.com/ubc-tea/GMValuator.
Open Datasets	Yes	The generation tasks are conducted on benchmark datasets (i.e., MNIST LeCun et al. (1998) and CIFAR Krizhevsky et al. (2009)), face recognition dataset (i.e., CelebA Liu et al. (2018)), high-resolution image dataset with size 512 × 512 and 1024 × 1024 (i.e., AFHQ Choi et al. (2020), FFHQ Karras et al. (2019)), the large-scale dataset with 1,000 classes and 14,197,122 images (i.e., ImageNet Deng et al. (2009)), and text-to-image dataset (i.e., Naruto Cervenka (2022)).
Dataset Splits	Yes	We support this by partitioning a class of CIFAR-10 (the class is plane here) into two non-overlapped subsets, denoted as Xv1 and Xv2. Next, we keep Xv1 as non-training data and use Xv2 as training data to train a BigGAN Brock et al. (2018) and generate dataset X̂. If our assumption holds, the generated data will be more similar to the training data Xv2.
Hardware Specification	Yes	GPU: One RTX 3080 (10GB); CPU: 12 vCPU Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz
Software Dependencies	No	The paper mentions using specific tools/libraries like CLIP, MANIQA, LPIPS, DreamSim, and Product Quantization, but does not provide specific version numbers for these or for the underlying programming languages/frameworks.
Experiment Setup	Yes	We report the averaged ρ over the generated datasets (the data size m=100) on different choices of k in Table 2.