Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Monotonic Risk Relationships under Distribution Shifts for Regularized Risk Minimization

Authors: Daniel LeJeune, Jiayu Liu, Reinhard Heckel

JMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "First, we show that for a real-world regression problem, in- and out-of-distribution performances are linearly correlated. Specifically, we show that for object detection, the performance of models trained on the COCO 2017 training set and evaluated on the COCO 2017 validation set is linearly correlated with the performance on the VOC 2012 dataset." and "Code for the experiments and figures in this paper can be found at https://github.com/MLI-lab/monotonic_risk_relationships."
Researcher Affiliation | Academia | Daniel LeJeune (EMAIL), Department of Statistics, Stanford University, Stanford, CA 94305-4020, USA; Jiayu Liu (EMAIL) and Reinhard Heckel (EMAIL), Department of Electrical and Computer Engineering, Technical University of Munich, 80333 Munich, DE
Pseudocode | No | The paper contains numerous theorems, lemmas, propositions, assumptions, and definitions, but it does not include any clearly labeled pseudocode or algorithm blocks. Procedures are described in natural language and mathematical notation.
Open Source Code | Yes | "Code for the experiments and figures in this paper can be found at https://github.com/MLI-lab/monotonic_risk_relationships."
Open Datasets | Yes | "We evaluate a collection of neural network models for object detection, which are trained on the COCO 2017 training set (Lin et al., 2014): Faster R-CNN (Ren et al., 2015)..." and "We consider a binary classification task of classifying even versus odd digits on the MNIST (LeCun et al., 2010) dataset and ARDIS (Kusetogullari et al., 2020) dataset IV."
Dataset Splits | Yes | "We evaluate a collection of neural network models for object detection, which are trained on the COCO 2017 training set (Lin et al., 2014) (...) evaluated on the COCO 2017 validation set and the VOC 2012 training/validation set (Everingham et al., 2010)." and "We train the model listed above on MNIST training set using the Adam optimizer (...) evaluating test performance during training as validation accuracy milestones are reached."
Hardware Specification | Yes | "All models are evaluated using an NVIDIA A40 GPU." and "All models are trained on an NVIDIA A40 GPU."
Software Dependencies | No | The models we evaluate are from torchvision.models and public GitHub repositories: RetinaNet (Lin et al., 2017): RetinaNet ResNet-50 FPN; Mask R-CNN (He et al., 2017): Mask R-CNN ResNet-50 FPN; SSD (Liu et al., 2016): SSD300 VGG16, SSDlite320 MobileNetV3-Large; Faster R-CNN (Ren et al., 2015): Faster R-CNN ResNet-50 FPN, Faster R-CNN MobileNetV3-Large FPN, Faster R-CNN MobileNetV3-Large 320 FPN; Keypoint R-CNN (He et al., 2017): Keypoint R-CNN ResNet-50 FPN; YOLOv5 (Redmon et al., 2016; Jocher et al., 2020): YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x. While specific model names are given, explicit version numbers for software dependencies such as Python, PyTorch, or torchvision are missing.
Experiment Setup | Yes | "We train the model listed above on the MNIST training set using the Adam optimizer with an initial learning rate of 10^-4, a batch size of 10, and a learning rate scheduler with a step size of 10 epochs and a learning rate decay factor of 0.1. The models at the top right corner of Figure 3 (right) are trained for 20 epochs. Intermediate models are obtained by early stopping when validation accuracy first reaches 0.5, 0.6, 0.7, 0.8, and 0.9. Each model is trained eight times with random initialization and with random shuffling of the training data, using different random seeds."
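As a minimal sketch of the learning-rate schedule implied by that setup (initial rate 10^-4, decayed by a factor of 0.1 every 10 epochs), the rate at any epoch can be computed directly; the function name and pure-Python form here are illustrative, not taken from the authors' code:

```python
def step_lr(epoch, initial_lr=1e-4, step_size=10, gamma=0.1):
    """Learning rate at `epoch` under a step decay schedule:
    the rate is multiplied by `gamma` once every `step_size` epochs."""
    return initial_lr * gamma ** (epoch // step_size)

# Over the 20-epoch runs described above, epochs 0-9 use 1e-4
# and epochs 10-19 use 1e-5.
schedule = [step_lr(e) for e in range(20)]
```

This mirrors what PyTorch's torch.optim.lr_scheduler.StepLR with step_size=10 and gamma=0.1 would produce for the reported configuration.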