Combining Diverse Feature Priors

Authors: Saachi Jain, Dimitris Tsipras, Aleksander Madry

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we explore the design space of leveraging such feature priors by viewing them as distinct perspectives on the data. Specifically, we find that models trained with diverse sets of feature priors have less overlapping failure modes, and can thus be combined more effectively. Moreover, we demonstrate that jointly training such models on additional (unlabeled) data allows them to correct each other's mistakes, which, in turn, leads to better generalization and resilience to spurious correlations.
Researcher Affiliation | Academia | Saachi Jain*1, Dimitris Tsipras*1, Aleksander Madry1. 1MIT. Correspondence to: Saachi Jain <saachij@mit.edu>, Dimitris Tsipras <tsipras@mit.edu>.
Pseudocode | Yes | Algorithm 1 Self-Training... Algorithm 2 Standard Co-Training (a generic sketch of both loops follows the table).
Open Source Code | Yes | Code available at https://github.com/MadryLab/copriors.
Open Datasets | Yes | We train models on a small subset (100 examples per class) of the CIFAR-10 (Krizhevsky, 2009) and STL-10 (Coates et al., 2011) datasets... We also create two datasets that each contain a different spurious correlation: Tinted STL-10... Biased CelebA (Liu et al., 2015).
Dataset Splits | Yes | Specifically, we treat a small fraction of the training set as labeled examples (100 examples per class), another fraction as our validation set for tuning hyperparameters (10% of the total training examples), and the rest as unlabeled data. (A sketch of such a split follows the table.)
Hardware Specification | Yes | All our experiments are performed using our internal cluster which mainly consists of NVIDIA 1080 Ti GTX GPUs.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., a Python version or a specific deep learning framework such as PyTorch or TensorFlow with versions).
Experiment Setup | Yes | We train all our models using stochastic gradient descent (SGD) with momentum (a coefficient of 0.9) and a decaying learning rate. We add weight decay regularization with a coefficient of 10^-4. In terms of data augmentation, we apply random cropping with a padding of 4 pixels, random horizontal flips, and a random rotation of 2 degrees. ... We train all models with a batch size of 64 for 96×96-sized images and 128 for 32×32-sized images for a total of 300 epochs. ... The parameters chosen are shown in Table 11. (A training-setup sketch follows the table.)
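
The Pseudocode row cites Algorithm 1 (Self-Training) and Algorithm 2 (Standard Co-Training). The sketch below is a minimal, generic rendering of those two loops rather than the paper's exact procedure: the scikit-learn-style fit/predict_proba interface, the 0.9 confidence threshold, the fixed number of rounds, and the toy data are all illustrative assumptions. In the paper's setting, the two co-trained models would instead be networks trained with different feature priors, trading pseudo-labels on the unlabeled pool.

```python
# Minimal sketch of self-training and standard co-training via pseudo-labels.
# The estimator interface (fit / predict_proba), the confidence threshold,
# and the number of rounds are assumptions, not the paper's settings.
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split


def self_train(model, X_lab, y_lab, X_unlab, rounds=5, threshold=0.9):
    """Repeatedly pseudo-label the most confident unlabeled points and retrain."""
    X_lab, y_lab, X_unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(rounds):
        model = clone(model).fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        probs = model.predict_proba(X_unlab)
        conf, pseudo = probs.max(axis=1), probs.argmax(axis=1)
        keep = conf >= threshold
        if not keep.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, pseudo[keep]])
        X_unlab = X_unlab[~keep]
    return model


def co_train(model_a, model_b, X_lab, y_lab, X_unlab, rounds=5, threshold=0.9):
    """Two models (e.g., trained with diverse feature priors) trade confident
    pseudo-labels on a shared unlabeled pool each round."""
    data = {"a": (X_lab.copy(), y_lab.copy()), "b": (X_lab.copy(), y_lab.copy())}
    models = {"a": model_a, "b": model_b}
    X_pool = X_unlab.copy()
    for _ in range(rounds):
        for k in models:
            models[k] = clone(models[k]).fit(*data[k])
        if len(X_pool) == 0:
            break
        taken = np.zeros(len(X_pool), dtype=bool)
        for src, dst in (("a", "b"), ("b", "a")):
            probs = models[src].predict_proba(X_pool)
            conf, pseudo = probs.max(axis=1), probs.argmax(axis=1)
            keep = conf >= threshold
            X_dst, y_dst = data[dst]
            data[dst] = (np.vstack([X_dst, X_pool[keep]]),
                         np.concatenate([y_dst, pseudo[keep]]))
            taken |= keep
        X_pool = X_pool[~taken]   # points pseudo-labeled this round leave the pool
    return models["a"], models["b"]


# Toy usage: 100 labeled examples plus a larger unlabeled pool.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_lab, X_unlab, y_lab, _ = train_test_split(X, y, train_size=100, random_state=0)
model = self_train(LogisticRegression(max_iter=1000), X_lab, y_lab, X_unlab)
m_a, m_b = co_train(LogisticRegression(max_iter=1000),
                    LogisticRegression(max_iter=1000),
                    X_lab, y_lab, X_unlab)
```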
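
The Dataset Splits row describes carving the original training set into 100 labeled examples per class, a validation set of 10% of the training examples, and an unlabeled remainder. Below is a hedged sketch of such a split for CIFAR-10 with torchvision; the seed, the order in which the splits are taken, and the Subset wrappers are assumptions for illustration.

```python
# Sketch: split CIFAR-10's training set into labeled (100 per class),
# validation (10% of the training examples), and unlabeled portions.
# Seed and split order are illustrative, not the paper's exact protocol.
import numpy as np
from torchvision import datasets
from torch.utils.data import Subset

train_set = datasets.CIFAR10(root="data", train=True, download=True)
targets = np.array(train_set.targets)

rng = np.random.default_rng(0)
perm = rng.permutation(len(targets))

n_val = int(0.10 * len(targets))                 # 10% held out for validation
val_idx = perm[:n_val]
remaining = perm[n_val:]

labeled_idx = []                                 # 100 labeled examples per class
for c in range(10):
    cls_idx = remaining[targets[remaining] == c]
    labeled_idx.extend(cls_idx[:100].tolist())
labeled_idx = np.array(labeled_idx)

unlabeled_idx = np.setdiff1d(remaining, labeled_idx)  # everything else is unlabeled

labeled = Subset(train_set, labeled_idx.tolist())
val = Subset(train_set, val_idx.tolist())
unlabeled = Subset(train_set, unlabeled_idx.tolist())
```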
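
The Experiment Setup row gives the optimizer (SGD with momentum 0.9 and a decaying learning rate), weight decay of 10^-4, the augmentations, the batch sizes (64 for 96×96 inputs, 128 for 32×32), and 300 epochs. The PyTorch sketch below wires these together for the 32×32 case; the ResNet-18 backbone, the base learning rate, and the step decay schedule are placeholders, since the paper defers its chosen values to Table 11.

```python
# Sketch of the reported training configuration (32x32 inputs, e.g. CIFAR-10).
# The backbone, base learning rate, and decay schedule are placeholders; the
# paper's chosen hyperparameters are listed in its Table 11.
import torch
from torch import nn, optim
from torchvision import datasets, transforms, models

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),        # random crop, 4-pixel padding
    transforms.RandomHorizontalFlip(),           # random horizontal flip
    transforms.RandomRotation(2),                # random rotation of 2 degrees
    transforms.ToTensor(),
])

train_set = datasets.CIFAR10("data", train=True, download=True,
                             transform=train_transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = models.resnet18(num_classes=10)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1,          # lr is a placeholder
                      momentum=0.9, weight_decay=1e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)

for epoch in range(300):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```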