Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Fair and Diverse DPP-Based Data Summarization

Authors: Elisa Celis, Vijay Keswani, Damian Straszak, Amit Deshpande, Tarun Kathuria, Nisheeth Vishnoi

ICML 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our empirical results on both real-world and synthetic datasets show that the diversity of the samples produced by adding fairness constraints is not too far from the unconstrained case.
Researcher Affiliation	Collaboration	1 École Polytechnique Fédérale de Lausanne (EPFL), Switzerland; 2 Microsoft Research, India; 3 UC Berkeley.
Pseudocode	Yes	Algorithm 1 Sample-And-Project
Open Source Code	No	The paper does not provide a link or explicit statement about the availability of the source code for the described methodology.
Open Datasets	Yes	We gathered a collection of images curated using Google image search as follows: Four search terms were used: (a) "Scientist Male", (b) "Scientist Female", (c) "Painter Male", and (d) "Painter Female" (Imagedataset). ... The Adult income dataset (Blake & Merz, 1998) consists of roughly 45000 records of subjects... Data downloaded from https://archive.ics.uci.edu/ml/datasets/adult.
Dataset Splits	No	The paper does not explicitly provide details about training, validation, and test splits for the datasets used in the experiments. It mentions sampling 'k samples' or 'sets of size 400' but not specific dataset splits.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory).
Software Dependencies	No	The paper mentions 'vlfeat toolbox' and 'k-means algorithm' but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup	Yes	We sample 40 images from each biased dataset... We conduct 200 repetitions. ... In preprocessing the data we filter out incomplete entries, and from the remaining ones we pick a random subset of 5000 records for our simulations. ... Sets of size 400 were selected, and 100 samples were taken for each.