reproducibilityindex.ai

Experimental Design under the Bradley-Terry Model

Authors: Yuan Guo, Peng Tian, Jayashree Kalpathy-Cramer, Susan Ostmo, J. Peter Campbell, Michael F. Chiang, Deniz Erdogmus, Jennifer Dy, Stratis Ioannidis

IJCAI 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We experimentally evaluate the performance of these methods over synthetic and real-life datasets.
Researcher Affiliation	Academia	1 ECE Department, Northeastern University, Boston, MA, USA. 2 Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA. 3 Dept of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, OR, USA.
Pseudocode	Yes	Algorithm 1 Greedy Algorithm
Open Source Code	Yes	We make our code publicly available: https://github.com/neu-spiral/Experimental_Design
Open Datasets	Yes	ROP Dataset. Our first dataset consists of 100 images of retinas, labeled by experts w.r.t. the presence of a disease called Retinopathy of Prematurity (ROP) [Kalpathy-Cramer et al., 2016]. SUSHI Dataset. The SUSHI Preference dataset [Kamishima et al., 2009] consists of rankings of N = 100 sushi food items by 5000 customers.
Dataset Splits	Yes	In each experiment, we partition the dataset N into three datasets: a training set Ntrn, a test set Ntst, and a validation set Nval. ... For each dataset, we perform 3-fold cross validation, repeating the partition to training and test datasets keeping the validation set fixed.
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup	Yes	Each of the four algorithms listed above has a hyperparameter that needs to be tuned: σ0 for MI, c for Cov, λe for Ent, and λf for Fisher. We tune these parameters on a validation set, as described in Section 5.3. We run all algorithms with K ranging from 0 to 100, with the exception of MI, which is the most computationally intensive: we execute it for K = 0 to 15.
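
The split protocol quoted under Dataset Splits can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The function name `fixed_validation_kfold`, the validation fraction, and the random seed are hypothetical. The excerpt only states that the validation set is kept fixed while the training/test partition is repeated for 3-fold cross validation.

```python
import random


def fixed_validation_kfold(items, val_fraction=0.2, n_folds=3, seed=0):
    """Hold out one fixed validation set, then k-fold the rest into train/test.

    val_fraction and seed are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)

    # The validation set is chosen once and reused in every fold.
    n_val = int(len(shuffled) * val_fraction)
    validation = shuffled[:n_val]
    rest = shuffled[n_val:]

    # Assign the remaining items to n_folds folds round-robin.
    folds = [rest[i::n_folds] for i in range(n_folds)]

    splits = []
    for k in range(n_folds):
        test = folds[k]
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        splits.append({"train": train, "test": test, "validation": validation})
    return splits


# Example: 100 item indices, matching the size of the ROP and SUSHI item sets.
splits = fixed_validation_kfold(range(100))
```

Each entry of `splits` shares the same validation list, so a hyperparameter tuned on it is never selected on data that later appears in a test fold.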