ModelDiff: A Framework for Comparing Learning Algorithms
Authors: Harshay Shah, Sung Min Park, Andrew Ilyas, Aleksander Madry
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate MODELDIFF through three case studies, comparing models trained with/without data augmentation, with/without pre-training, and with different SGD hyperparameters. Our code is available at https://github.com/MadryLab/modeldiff. |
| Researcher Affiliation | Academia | 1Massachusetts Institute of Technology. Correspondence to: Harshay Shah <harshay@mit.edu>. |
| Pseudocode | No | The paper describes its framework visually in Figure 2 and textually in Section 3, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block with structured steps. |
| Open Source Code | Yes | Our code is available at https://github.com/MadryLab/modeldiff. |
| Open Datasets | Yes | Living17. The Living17 dataset (Santurkar et al., 2021) is an ImageNet-derived dataset... Waterbirds. The Waterbirds dataset (Sagawa et al., 2020) consists of bird images taken from the CUB dataset (Wah et al., 2011) and pasted on backgrounds from the Places dataset (Zhou et al., 2017). CIFAR-10. We consider the standard CIFAR-10 (Krizhevsky, 2009) image classification dataset... |
| Dataset Splits | Yes | Estimating these datamodels entails three design choices: ... Sample size for datamodel estimation: ... we make a 90%-10% train-validation split. For model selection, we choose the model checkpoint that has the maximum average accuracy on the validation dataset. (See the split-and-selection sketch after this table.) |
| Hardware Specification | Yes | We train our models on a cluster of machines, each with 9 NVIDIA A100 or V100 GPUs and 96 CPU cores. |
| Software Dependencies | No | The paper mentions software like FFCV and fast-l1, but does not provide specific version numbers for these or other key software components, which are necessary for reproducibility. For example: 'We use FFCV (Leclerc et al., 2022)', 'we use the fast-l1 package, a SAGA-based GPU solver for ℓ1-regularized regression'. |
| Experiment Setup | Yes | We train models for 25 epochs using SGD with the following configuration: initial learning rate 0.6, batch size 1024, cyclic learning rate schedule (with peak at epoch 12), momentum 0.9, weight decay 0.0005, and label smoothing (with smoothing hyperparameter 0.1). (See the training-configuration sketch after this table.) |
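
The 90%-10% train-validation split and max-validation-accuracy checkpoint selection quoted in the "Dataset Splits" row can be illustrated with a short sketch. This is not the authors' code: the function names, the use of `torch.utils.data.random_split`, and the fixed seed are assumptions made for illustration.

```python
# Hypothetical sketch of the 90%-10% train-validation split and checkpoint
# selection quoted above; names and structure are assumptions, not the
# authors' implementation.
import torch
from torch.utils.data import random_split


def split_dataset(dataset, val_fraction=0.1, seed=0):
    """Split a dataset into 90% train / 10% validation (fractions assumed)."""
    n_val = int(len(dataset) * val_fraction)
    n_train = len(dataset) - n_val
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val], generator=generator)


@torch.no_grad()
def validation_accuracy(model, val_loader, device="cuda"):
    """Average top-1 accuracy on the validation split."""
    model.eval()
    correct, total = 0, 0
    for images, labels in val_loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total


def select_best_checkpoint(model, checkpoint_paths, val_loader, device="cuda"):
    """Pick the saved checkpoint with maximum validation accuracy."""
    best_path, best_acc = None, -1.0
    for path in checkpoint_paths:
        model.load_state_dict(torch.load(path, map_location=device))
        acc = validation_accuracy(model, val_loader, device)
        if acc > best_acc:
            best_path, best_acc = path, acc
    return best_path, best_acc
```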
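
Similarly, the training configuration quoted in the "Experiment Setup" row maps onto a standard PyTorch training loop. This is a hedged sketch rather than the paper's implementation (the authors use FFCV data loaders); in particular, the triangular shape assumed for the "cyclic" schedule and the per-epoch scheduler step are assumptions, with the quoted learning rate of 0.6 treated as the peak value.

```python
# Minimal sketch of the quoted configuration: SGD, 25 epochs, peak LR 0.6,
# batch size 1024, cyclic LR schedule peaking at epoch 12, momentum 0.9,
# weight decay 5e-4, label smoothing 0.1. Helper names and the exact
# schedule shape are assumptions.
import torch
import torch.nn as nn

EPOCHS, PEAK_EPOCH, PEAK_LR = 25, 12, 0.6
BATCH_SIZE = 1024  # assumed to be used when building the train DataLoader


def make_optimizer_and_scheduler(model):
    optimizer = torch.optim.SGD(
        model.parameters(), lr=PEAK_LR, momentum=0.9, weight_decay=5e-4
    )

    def lr_lambda(epoch):
        # Triangular ("cyclic") schedule, assumed shape: ramp linearly up to
        # the peak LR at epoch 12, then decay linearly toward zero by epoch 25.
        if epoch < PEAK_EPOCH:
            return (epoch + 1) / PEAK_EPOCH
        return max(0.0, (EPOCHS - epoch) / (EPOCHS - PEAK_EPOCH))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler


def train(model, train_loader, device="cuda"):
    # label_smoothing requires PyTorch >= 1.10
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    optimizer, scheduler = make_optimizer_and_scheduler(model)
    model.to(device).train()
    for epoch in range(EPOCHS):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad(set_to_none=True)
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()  # one scheduler step per epoch
```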