Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Experimental Design under the Bradley-Terry Model
Authors: Yuan Guo, Peng Tian, Jayashree Kalpathy-Cramer, Susan Ostmo, J. Peter Campbell, Michael F. Chiang, Deniz Erdogmus, Jennifer Dy, Stratis Ioannidis
IJCAI 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally evaluate the performance of these methods over synthetic and real-life datasets. |
| Researcher Affiliation | Academia | 1 ECE Department, Northeastern University, Boston, MA, USA. 2 Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA. 3 Dept of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, OR, USA. |
| Pseudocode | Yes | Algorithm 1 Greedy Algorithm |
| Open Source Code | Yes | We make our code publicly available. (Footnote 2: https://github.com/neu-spiral/Experimental_Design) |
| Open Datasets | Yes | ROP Dataset. Our first dataset consists of 100 images of retinas, labeled by experts w.r.t. the presence of a disease called Retinopathy of Prematurity (ROP) [Kalpathy-Cramer et al., 2016]. SUSHI Dataset. The SUSHI Preference dataset [Kamishima et al., 2009] consists of rankings of N = 100 sushi food items by 5000 customers. |
| Dataset Splits | Yes | In each experiment, we partition the dataset N into three datasets: a training set Ntrn, a test set Ntst, and a validation set Nval. ... For each dataset, we perform 3-fold cross validation, repeating the partition to training and test datasets keeping the validation set fixed. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Each of the four algorithms listed above have a hyperparameter that needs to be tuned: σ0 for MI, c for Cov, λe for Ent, and λf for Fisher. We tune these parameters on a validation set, as described in Section 5.3. We run all algorithm with K ranging from 0 to 100, with the exception of MI, that is the most computation intensive: we execute this for K = 0 to 15. |
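
The partitioning quoted in the Dataset Splits row can be illustrated with a minimal sketch. It is not the authors' code: the function name `fixed_validation_kfold`, the validation fraction, the seed, and the choice to build folds by striding over the shuffled non-validation items are all assumptions made for illustration. The quoted text only states that a validation set is kept fixed while the partition into training and test sets is repeated for 3-fold cross validation.

```python
import random


def fixed_validation_kfold(items, val_fraction=0.2, n_folds=3, seed=0):
    """Hold out one fixed validation set, then build k train/test splits.

    All parameter defaults are illustrative assumptions; the paper's
    quoted text does not give the split sizes.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)

    # The validation set is chosen once and reused in every fold.
    n_val = int(len(shuffled) * val_fraction)
    val = shuffled[:n_val]
    rest = shuffled[n_val:]

    # Split the remaining items into n_folds disjoint folds.
    folds = [rest[i::n_folds] for i in range(n_folds)]

    splits = []
    for k in range(n_folds):
        test = folds[k]
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        splits.append((train, test, val))
    return splits


if __name__ == "__main__":
    # Hypothetical usage with 100 item indices.
    for train, test, val in fixed_validation_kfold(range(100)):
        print(len(train), len(test), len(val))
```

Each returned tuple shares the same validation list, so hyperparameters tuned on it are comparable across folds, while every non-validation item appears in exactly one test fold.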