Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics

Authors: Shoaib Ahmed Siddiqui, Nitarshan Rajkumar, Tegan Maharaj, David Krueger, Sara Hooker

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the following sections, we perform experiments across 6 datasets: CIFAR-10/100, ImageNet, Waterbirds, CelebA, and Clothing1M. For details regarding the experimental setup, see Appendix A. We first evaluate convergence dynamics of different probe suites (Section 3.1), validating the approach of MAP-D. We then qualitatively demonstrate the ability to audit datasets using MAP-D (Section 3.2), and evaluate performance on a variety of downstream tasks: noise correction (Section 3.3), prioritizing points for training (Section 3.4), and identifying minority-group samples (Section 3.5). (See the MAP-D sketch below.)
Researcher Affiliation | Collaboration | Shoaib Ahmed Siddiqui (University of Cambridge, msas3@cam.ac.uk); Nitarshan Rajkumar (University of Cambridge, nr500@cam.ac.uk); Tegan Maharaj (University of Toronto, tegan.maharaj@utoronto.ca); David Krueger (University of Cambridge, dsk30@cam.ac.uk); Sara Hooker (Cohere for AI, sarahooker@cohere.com)
Pseudocode | No | The paper does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper does not include an explicit statement about releasing source code for the methodology or a link to a code repository.
Open Datasets | Yes | We present consistent results across six image classification datasets, CIFAR-10/CIFAR-100 (Krizhevsky et al., 2009), ImageNet (Deng et al., 2009), Waterbirds (Sagawa et al., 2020), CelebA (Liu et al., 2015), Clothing1M (Xiao et al., 2015) and two models from the ResNet family (He et al., 2016).
Dataset Splits | Yes | We curate 250 training examples for each probe category. For categories other than Typical/Atypical, we sample examples at random and then apply the corresponding transformations. We also curate 250 test examples for each probe category to evaluate the accuracy of our nearest neighbor assignment of metadata to unseen data points, where we know the true underlying metadata. (See the probe-curation sketch below.)
Hardware Specification | No | The paper thanks the SDS department at DFKI Kaiserslautern for "support with computing resources" but does not specify any particular GPU or CPU models, memory, or other hardware details.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019) but does not specify a version number for PyTorch or other key software dependencies.
Experiment Setup | Yes | In all experiments, we use variants of the ResNet architecture and leverage standard image classification datasets CIFAR-10/100 and ImageNet. We train with SGD using standard hyperparameter settings: learning rate 0.1, momentum 0.9, weight decay 0.0005, and a cosine learning rate decay. We achieve top-1 accuracies of 93.68% on CIFAR-10, 72.80% on CIFAR-100, and 73.94% on ImageNet. (See the training-configuration sketch below.)
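
The Research Type row summarizes MAP-D (Metadata Archaeology via Probe Dynamics), which labels unseen examples by comparing their training dynamics to those of curated probes. The following is a minimal sketch of that nearest-neighbor step, not the authors' implementation (no code is released, per the Open Source Code row); the use of per-epoch loss trajectories as features, scikit-learn's KNeighborsClassifier, and k=20 are illustrative assumptions.

```python
# Illustrative MAP-D-style metadata assignment (not the authors' code).
# Each example is represented by its per-epoch loss trajectory; unseen
# examples inherit the metadata category of their nearest curated probes.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def assign_metadata(probe_trajectories, probe_labels, query_trajectories, k=20):
    """probe_trajectories: (n_probes, n_epochs) array of logged losses.
    probe_labels: metadata category per probe (e.g. 'typical', 'noisy').
    query_trajectories: (n_queries, n_epochs) losses for remaining examples.
    k=20 is an assumed neighbor count, not taken from the quoted text."""
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(np.asarray(probe_trajectories), probe_labels)
    return knn.predict(np.asarray(query_trajectories))
```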
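
The Dataset Splits row describes curating 250 training and 250 held-out test examples per probe category, sampled at random and then transformed. Below is a hedged sketch of that curation step, where corrupt_label is a hypothetical stand-in for the paper's category-specific transformations.

```python
# Sketch of probe-suite curation per the Dataset Splits quote: 250 training
# and 250 held-out test examples per category, sampled at random and then
# transformed. `corrupt_label` is a hypothetical stand-in transformation.
import random

def corrupt_label(example, num_classes=10):
    x, _ = example
    return x, random.randrange(num_classes)  # replace the label at random

def curate_probes(dataset, transform_fn, n=250, seed=0):
    rng = random.Random(seed)
    indices = rng.sample(range(len(dataset)), 2 * n)
    probes = [transform_fn(dataset[i]) for i in indices]
    return probes[:n], probes[n:]  # 250 training probes, 250 test probes
```

For the Typical/Atypical categories, the quote indicates examples are selected rather than transformed, so transform_fn would be the identity there.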
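
The Experiment Setup row pins down the optimizer: SGD with learning rate 0.1, momentum 0.9, weight decay 0.0005, and cosine learning-rate decay. A sketch of that configuration in PyTorch on CIFAR-10 follows; the ResNet-18 variant, augmentations, batch size, and epoch count are assumptions not stated in the quoted excerpt.

```python
# Minimal sketch of the quoted training configuration: a ResNet trained with
# SGD (lr 0.1, momentum 0.9, weight decay 0.0005) and cosine LR decay.
# torchvision's resnet18, the augmentations, batch size 128, and 200 epochs
# are illustrative assumptions beyond the hyperparameters quoted above.
import torch
import torchvision
import torchvision.transforms as T

transform = T.Compose([T.RandomCrop(32, padding=4),
                       T.RandomHorizontalFlip(),
                       T.ToTensor()])
train_set = torchvision.datasets.CIFAR10(root="data", train=True,
                                         download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torchvision.models.resnet18(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
epochs = 200  # assumed; not stated in the quoted excerpt
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(epochs):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # cosine decay stepped once per epoch
```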