ModelDiff: A Framework for Comparing Learning Algorithms

Authors: Harshay Shah, Sung Min Park, Andrew Ilyas, Aleksander Madry

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate MODELDIFF through three case studies, comparing models trained with/without data augmentation, with/without pre-training, and with different SGD hyperparameters. Our code is available at https://github.com/MadryLab/modeldiff.
Researcher Affiliation | Academia | Massachusetts Institute of Technology. Correspondence to: Harshay Shah <harshay@mit.edu>.
Pseudocode | No | The paper describes its framework visually in Figure 2 and textually in Section 3, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block with structured steps.
Open Source Code | Yes | Our code is available at https://github.com/MadryLab/modeldiff.
Open Datasets | Yes | Living17. The Living17 dataset (Santurkar et al., 2021) is an ImageNet-derived dataset... Waterbirds. The Waterbirds dataset (Sagawa et al., 2020) consists of bird images taken from the CUB dataset (Wah et al., 2011) and pasted on backgrounds from the Places dataset (Zhou et al., 2017). CIFAR-10. We consider the standard CIFAR-10 (Krizhevsky, 2009) image classification dataset...
Dataset Splits | Yes | Estimating these datamodels entails three design choices: ... Sample size for datamodel estimation: ... we make a 90%-10% train-validation split. For model selection, we choose the model checkpoint that has the maximum average accuracy on the validation dataset.
Hardware Specification | Yes | We train our models on a cluster of machines, each with 9 NVIDIA A100 or V100 GPUs and 96 CPU cores.
Software Dependencies | No | The paper mentions software such as FFCV and fast-l1, but does not provide specific version numbers for these or other key software components, which are needed for reproducibility. For example: 'We use FFCV (Leclerc et al., 2022)', 'we use the fast-l1 package, a SAGA-based GPU solver for ℓ1-regularized regression'.
Experiment Setup | Yes | We train models for 25 epochs using SGD with the following configuration: initial learning rate 0.6, batch size 1024, cyclic learning rate schedule (with peak at epoch 12), momentum 0.9, weight decay 0.0005, and label smoothing (with smoothing hyperparameter 0.1). (A minimal sketch of this configuration appears after the table.)
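
For reference, below is a minimal PyTorch sketch of the training configuration quoted under Experiment Setup, combined with the 90%-10% split and validation-based checkpoint selection quoted under Dataset Splits. This is not the authors' released code (see https://github.com/MadryLab/modeldiff for that): the ResNet-18 architecture, the synthetic stand-in data, the per-epoch scheduler stepping, and the reading of the quoted learning rate 0.6 as the peak of the triangular cycle are all assumptions made for illustration.

# Minimal sketch (assumed, not the authors' code) of the quoted setup: SGD for
# 25 epochs, peak learning rate 0.6, batch size 1024, cyclic schedule peaking at
# epoch 12, momentum 0.9, weight decay 0.0005, label smoothing 0.1, and
# checkpoint selection by maximum validation accuracy.
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

EPOCHS, BATCH_SIZE, PEAK_EPOCH = 25, 1024, 12

# Placeholder data: synthetic tensors standing in for a real dataset
# (the paper uses CIFAR-10, Waterbirds, and Living17), split 90%/10%.
images = torch.randn(2048, 3, 32, 32)
labels = torch.randint(0, 10, (2048,))
split = int(0.9 * len(images))
train_loader = DataLoader(TensorDataset(images[:split], labels[:split]),
                          batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(TensorDataset(images[split:], labels[split:]),
                        batch_size=BATCH_SIZE)

model = resnet18(num_classes=10)  # placeholder architecture, not specified in the quote
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.6,
                            momentum=0.9, weight_decay=5e-4)

# Triangular cyclic schedule: ramp up to 0.6 by epoch 12, then back down by epoch 25.
# cycle_momentum=False keeps momentum fixed at 0.9 instead of cycling it.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.0, max_lr=0.6,
    step_size_up=PEAK_EPOCH, step_size_down=EPOCHS - PEAK_EPOCH,
    cycle_momentum=False,
)

best_acc, best_state = 0.0, None
for epoch in range(EPOCHS):
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()  # one scheduler step per epoch

    # Model selection: keep the checkpoint with the highest validation accuracy.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, targets in val_loader:
            correct += (model(inputs).argmax(dim=1) == targets).sum().item()
            total += len(targets)
    val_acc = correct / total
    if val_acc > best_acc:
        best_acc, best_state = val_acc, copy.deepcopy(model.state_dict())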