Experimental Design under the Bradley-Terry Model

Authors: Yuan Guo, Peng Tian, Jayashree Kalpathy-Cramer, Susan Ostmo, J. Peter Campbell, Michael F. Chiang, Deniz Erdogmus, Jennifer Dy, Stratis Ioannidis

IJCAI 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We experimentally evaluate the performance of these methods over synthetic and real-life datasets." |
| Researcher Affiliation | Academia | 1 ECE Department, Northeastern University, Boston, MA, USA. 2 Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA. 3 Dept. of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, OR, USA. |
| Pseudocode | Yes | Algorithm 1: Greedy Algorithm |
| Open Source Code | Yes | "We make our code publicly available." https://github.com/neu-spiral/Experimental_Design |
| Open Datasets | Yes | ROP Dataset: "Our first dataset consists of 100 images of retinas, labeled by experts w.r.t. the presence of a disease called Retinopathy of Prematurity (ROP) [Kalpathy-Cramer et al., 2016]." SUSHI Dataset: "The SUSHI Preference dataset [Kamishima et al., 2009] consists of rankings of N = 100 sushi food items by 5000 customers." |
| Dataset Splits | Yes | "In each experiment, we partition the dataset N into three datasets: a training set Ntrn, a test set Ntst, and a validation set Nval. ... For each dataset, we perform 3-fold cross validation, repeating the partition into training and test datasets while keeping the validation set fixed." |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments. |
| Experiment Setup | Yes | "Each of the four algorithms listed above has a hyperparameter that needs to be tuned: σ0 for MI, c for Cov, λe for Ent, and λf for Fisher. We tune these parameters on a validation set, as described in Section 5.3. We run all algorithms with K ranging from 0 to 100, with the exception of MI, which is the most computation-intensive: we execute it for K = 0 to 15." |
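As background for the paper's title, the Bradley-Terry model assigns each item $i$ a latent score $\theta_i$ and models the probability that $i$ is preferred over $j$ in a pairwise comparison as $e^{\theta_i}/(e^{\theta_i}+e^{\theta_j})$. A minimal sketch of this probability (the function name and parameterization here are illustrative, not taken from the paper's code):

```python
import math

def bt_prob(theta_i: float, theta_j: float) -> float:
    """Bradley-Terry probability that item i beats item j,
    given latent scores theta_i and theta_j."""
    # Equivalent to a logistic function of the score difference.
    return math.exp(theta_i) / (math.exp(theta_i) + math.exp(theta_j))
```

Note that `bt_prob(a, b) + bt_prob(b, a) == 1` and equal scores give 0.5, which is what makes the model a well-defined pairwise-comparison likelihood.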
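The Dataset Splits row describes a scheme where the validation set is held fixed while the remaining items are rotated through 3-fold train/test partitions. A sketch of that splitting procedure, assuming a 20% validation fraction (the fraction, function name, and seed are illustrative assumptions, not values from the paper):

```python
import random

def fixed_val_kfold(indices, k=3, val_frac=0.2, seed=0):
    """Hold out a fixed validation set, then split the remainder
    into k folds; each fold serves once as the test set while the
    other folds form the training set."""
    rng = random.Random(seed)
    idx = list(indices)
    rng.shuffle(idx)
    n_val = int(len(idx) * val_frac)
    val = idx[:n_val]          # fixed across all folds
    rest = idx[n_val:]
    folds = [rest[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        splits.append((train, test, val))
    return splits
```

For example, `fixed_val_kfold(range(100))` yields three `(train, test, val)` triples sharing the same 20-item validation set, mirroring the N = 100 items of the ROP and SUSHI datasets.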