ModelDiff: A Framework for Comparing Learning Algorithms

Authors: Harshay Shah, Sung Min Park, Andrew Ilyas, Aleksander Madry

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate MODELDIFF through three case studies, comparing models trained with/without data augmentation, with/without pre-training, and with different SGD hyperparameters. Our code is available at https://github.com/MadryLab/modeldiff.
Researcher Affiliation | Academia | Massachusetts Institute of Technology. Correspondence to: Harshay Shah <harshay@mit.edu>.
Pseudocode | No | The paper describes its framework visually in Figure 2 and textually in Section 3, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block with structured steps.
Open Source Code | Yes | Our code is available at https://github.com/MadryLab/modeldiff.
Open Datasets | Yes | Living17. The Living17 dataset (Santurkar et al., 2021) is an ImageNet-derived dataset... Waterbirds. The Waterbirds dataset (Sagawa et al., 2020) consists of bird images taken from the CUB dataset (Wah et al., 2011) and pasted on backgrounds from the Places dataset (Zhou et al., 2017). CIFAR-10. We consider the standard CIFAR-10 (Krizhevsky, 2009) image classification dataset...
Dataset Splits | Yes | Estimating these datamodels entails three design choices: ... Sample size for datamodel estimation: ... we make a 90%-10% train-validation split. For model selection, we choose the model checkpoint that has the maximum average accuracy on the validation dataset.
Hardware Specification | Yes | We train our models on a cluster of machines, each with 9 NVIDIA A100 or V100 GPUs and 96 CPU cores.
Software Dependencies | No | The paper mentions software such as FFCV and fast-l1, but does not provide specific version numbers for these or other key software components, which are needed for reproducibility. For example: 'We use FFCV (Leclerc et al., 2022)', 'we use the fast-l1 package, a SAGA-based GPU solver for ℓ1-regularized regression'.
Experiment Setup | Yes | We train models for 25 epochs using SGD with the following configuration: initial learning rate 0.6, batch size 1024, cyclic learning rate schedule (with peak at epoch 12), momentum 0.9, weight decay 0.0005, and label smoothing (with smoothing hyperparameter 0.1). (A minimal sketch of this configuration appears after the table.)
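
For reference, below is a minimal PyTorch sketch of the training configuration quoted under Experiment Setup, combined with the 90%-10% split and validation-based checkpoint selection quoted under Dataset Splits. This is not the authors' released code (see https://github.com/MadryLab/modeldiff for that): the ResNet-18 architecture, the synthetic stand-in data, the per-epoch scheduler stepping, and the reading of the quoted learning rate 0.6 as the peak of the triangular cycle are all assumptions made for illustration.

# Minimal sketch (assumed, not the authors' code) of the quoted setup: SGD for
# 25 epochs, peak learning rate 0.6, batch size 1024, cyclic schedule peaking at
# epoch 12, momentum 0.9, weight decay 0.0005, label smoothing 0.1, and
# checkpoint selection by maximum validation accuracy.
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

EPOCHS, BATCH_SIZE, PEAK_EPOCH = 25, 1024, 12

# Placeholder data: synthetic tensors standing in for a real dataset
# (the paper uses CIFAR-10, Waterbirds, and Living17), split 90%/10%.
images = torch.randn(2048, 3, 32, 32)
labels = torch.randint(0, 10, (2048,))
split = int(0.9 * len(images))
train_loader = DataLoader(TensorDataset(images[:split], labels[:split]),
                          batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(TensorDataset(images[split:], labels[split:]),
                        batch_size=BATCH_SIZE)

model = resnet18(num_classes=10)  # placeholder architecture, not specified in the quote
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.6,
                            momentum=0.9, weight_decay=5e-4)

# Triangular cyclic schedule: ramp up to 0.6 by epoch 12, then back down by epoch 25.
# cycle_momentum=False keeps momentum fixed at 0.9 instead of cycling it.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.0, max_lr=0.6,
    step_size_up=PEAK_EPOCH, step_size_down=EPOCHS - PEAK_EPOCH,
    cycle_momentum=False,
)

best_acc, best_state = 0.0, None
for epoch in range(EPOCHS):
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()  # one scheduler step per epoch

    # Model selection: keep the checkpoint with the highest validation accuracy.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, targets in val_loader:
            correct += (model(inputs).argmax(dim=1) == targets).sum().item()
            total += len(targets)
    val_acc = correct / total
    if val_acc > best_acc:
        best_acc, best_state = val_acc, copy.deepcopy(model.state_dict())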