Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Monotonic Risk Relationships under Distribution Shifts for Regularized Risk Minimization

Authors: Daniel LeJeune, Jiayu Liu, Reinhard Heckel

JMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "First, we show that for a real-world regression problem, in- and out-of-distribution performances are linearly correlated. Specifically, we show that for object detection, the performance of models trained on the COCO 2017 training set and evaluated on the COCO 2017 validation set is linearly correlated with the performance on the VOC 2012 dataset." and "Code for the experiments and figures in this paper can be found at https://github.com/MLI-lab/monotonic_risk_relationships."
Researcher Affiliation | Academia | Daniel LeJeune (EMAIL), Department of Statistics, Stanford University, Stanford, CA 94305-4020, USA; Jiayu Liu (EMAIL) and Reinhard Heckel (EMAIL), Department of Electrical and Computer Engineering, Technical University of Munich, 80333 Munich, DE
Pseudocode | No | The paper contains numerous theorems, lemmas, propositions, assumptions, and definitions, but it does not include any clearly labeled pseudocode or algorithm blocks. Procedures are described in natural language and mathematical notation.
Open Source Code | Yes | "Code for the experiments and figures in this paper can be found at https://github.com/MLI-lab/monotonic_risk_relationships."
Open Datasets | Yes | "We evaluate a collection of neural network models for object detection, which are trained on the COCO 2017 training set (Lin et al., 2014): Faster R-CNN (Ren et al., 2015)..." and "We consider a binary classification task of classifying even versus odd digits on the MNIST (LeCun et al., 2010) dataset and ARDIS (Kusetogullari et al., 2020) dataset IV."
Dataset Splits | Yes | "We evaluate a collection of neural network models for object detection, which are trained on the COCO 2017 training set (Lin et al., 2014) (...) evaluated on the COCO 2017 validation set and the VOC 2012 training/validation set (Everingham et al., 2010)." and "We train the model listed above on MNIST training set using the Adam optimizer (...) evaluating test performance during training as validation accuracy milestones are reached."
Hardware Specification | Yes | "All models are evaluated using an NVIDIA A40 GPU." and "All models are trained on an NVIDIA A40 GPU."
Software Dependencies | No | The models we evaluate are from torchvision.models and public GitHub repositories: RetinaNet (Lin et al., 2017): RetinaNet ResNet-50 FPN; Mask R-CNN (He et al., 2017): Mask R-CNN ResNet-50 FPN; SSD (Liu et al., 2016): SSD300 VGG16, SSDlite320 MobileNetV3-Large; Faster R-CNN (Ren et al., 2015): Faster R-CNN ResNet-50 FPN, Faster R-CNN MobileNetV3-Large FPN, Faster R-CNN MobileNetV3-Large 320 FPN; Keypoint R-CNN (He et al., 2017): Keypoint R-CNN ResNet-50 FPN; YOLOv5 (Redmon et al., 2016; Jocher et al., 2020): YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x. While specific model names are given, explicit version numbers for software dependencies such as Python, PyTorch, or torchvision are missing.
Experiment Setup | Yes | "We train the model listed above on the MNIST training set using the Adam optimizer with an initial learning rate of 10^-4, a batch size of 10, and a learning rate scheduler with a step size of 10 epochs and a learning rate decay factor of 0.1. The models at the top right corner of Figure 3 (right) are trained for 20 epochs. Intermediate models are obtained by early stopping when validation accuracy first reaches 0.5, 0.6, 0.7, 0.8, and 0.9. Each model is trained eight times with random initialization and with random shuffling of the training data, using different random seeds."
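As a minimal sketch of the learning-rate schedule implied by that setup (initial rate 10^-4, decayed by a factor of 0.1 every 10 epochs), the rate at any epoch can be computed directly; the function name and pure-Python form here are illustrative, not taken from the authors' code:

```python
def step_lr(epoch, initial_lr=1e-4, step_size=10, gamma=0.1):
    """Learning rate at `epoch` under a step decay schedule:
    the rate is multiplied by `gamma` once every `step_size` epochs."""
    return initial_lr * gamma ** (epoch // step_size)

# Over the 20-epoch runs described above, epochs 0-9 use 1e-4
# and epochs 10-19 use 1e-5.
schedule = [step_lr(e) for e in range(20)]
```

This mirrors what PyTorch's torch.optim.lr_scheduler.StepLR with step_size=10 and gamma=0.1 would produce for the reported configuration.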