Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Linear Distance Metric Learning with Noisy Labels

Authors: Meysam Alishahi, Anna Little, Jeff M. Phillips

JMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Several experimental observations on synthetic and real data sets support and inform our theoretical results. ... Finally, in Section 5 we verify our theory on a variety of synthetic data experiments and demonstrate the utility of this linear DML framework on two real data problems that benefit from a learned Mahalanobis distance.
Researcher Affiliation | Academia | Meysam Alishahi, Kahlert School of Computing, University of Utah, Salt Lake City, UT 84112, USA; Anna Little, Department of Mathematics, Utah Center for Data Science, University of Utah, Salt Lake City, UT 84112, USA; Jeff M. Phillips, Kahlert School of Computing, Utah Center for Data Science, University of Utah, Salt Lake City, UT 84112, USA, and visiting ScaDS.AI, University of Leipzig, and MPI for Math in the Sciences
Pseudocode | No | The paper describes algorithms and methods in prose and through mathematical formulations but does not contain a distinct, structured pseudocode or algorithm block.
Open Source Code | Yes | All experimental results are reproducible; see the GitHub repository by Alishahi et al. (2023) containing data and source codes.
Open Datasets | Yes | We consider a data set containing a training set of around 100,000 points and a test set of around 26,000 points from the Airline Passenger Satisfaction (Air, 2020) data set. ... We here use the Breast Cancer Wisconsin Diagnostic Data Set, which is publicly available through the University of California Irvine (UCI) Machine Learning Repository (Wolberg et al., 1995).
Dataset Splits | Yes | We split the data into 15,000 training and 5,000 test points. ... Now, we have 15,000 pairs of original points, and we divide them into a train set of size 10,000 and a test set of size 5,000. ... The train and test sets contain about 45% of satisfied passengers. ... We here use the Breast Cancer Wisconsin Diagnostic Data Set. ... We did this experiment 20 times and recorded the (weighted) average accuracies as in Table 7.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory specifications used for the experiments. It mentions training times but not the underlying hardware.
Software Dependencies | No | The paper mentions various methods like 'gradient descent' and refers to existing models like 'DML-eig' but does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | We solve Optimization Problem (10) using gradient descent and setting learning rate = 0.5, d = k = 10, number of iterations = 30,000, and learning decay = .95. ... Using each of the Logistic, Laplace, and HS models (learning rate = 0.045 and number of iterations = 20,000), we can recover the satisfaction labeling with 93% accuracy on the training and test data.
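To make the quoted experiment setup concrete, here is a minimal NumPy sketch of the kind of procedure it describes: gradient descent on a matrix L, where M = L^T L defines a learned Mahalanobis distance, with a logistic pairwise model and a decaying learning rate. This is not the paper's Optimization Problem (10); the loss, initialization, decay schedule (applied every 100 steps here), and the toy data below are all illustrative assumptions.

```python
import numpy as np

def learn_mahalanobis(diffs, y, k, lr=0.5, decay=0.95, iters=2000, seed=0):
    """Learn L (k x d) so that M = L^T L yields a Mahalanobis distance
    d_M(u, v)^2 = (u - v)^T M (u - v) that is small for similar pairs
    (y = +1) and large for dissimilar pairs (y = -1).

    Minimizes a generic logistic pairwise loss,
        mean_i log(1 + exp(-y_i * (b - d_M^2))),
    by gradient descent; the learning rate is multiplied by `decay`
    every 100 iterations (an assumed schedule, not the paper's).
    """
    n, d = diffs.shape
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((k, d)) / np.sqrt(d)
    b = 1.0  # learnable distance threshold
    for t in range(iters):
        z = diffs @ L.T                              # (n, k) projected pair differences
        sqd = np.einsum('ij,ij->i', z, z)            # squared Mahalanobis distances
        margin = np.clip(y * (b - sqd), -50.0, 50.0)
        p = 1.0 / (1.0 + np.exp(margin))             # sigmoid(-margin)
        coef = (p * y)[:, None]
        L -= lr * (2.0 / n) * z.T @ (coef * diffs)   # gradient of the loss w.r.t. L
        b -= lr * (-np.mean(p * y))                  # gradient of the loss w.r.t. b
        if (t + 1) % 100 == 0:
            lr *= decay
    return L, b

# Invented toy data: similar pairs differ little in the first two
# (metric-relevant) coordinates, dissimilar pairs differ a lot there;
# the remaining three coordinates are pure noise for both classes.
rng = np.random.default_rng(1)
n = 400
sim = np.hstack([0.2 * rng.standard_normal((n, 2)), rng.standard_normal((n, 3))])
dis = np.hstack([2.0 * rng.standard_normal((n, 2)), rng.standard_normal((n, 3))])
diffs = np.vstack([sim, dis])
y = np.concatenate([np.ones(n), -np.ones(n)])

L, b = learn_mahalanobis(diffs, y, k=5)
z = diffs @ L.T
sqd = np.einsum('ij,ij->i', z, z)
acc = np.mean(np.where(sqd < b, 1.0, -1.0) == y)  # call a pair similar when d_M^2 < b
```

The learned L downweights the three noise coordinates while the threshold b settles between the similar and dissimilar distance distributions, which is the qualitative behavior the paper's linear DML framework targets.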