RATT: Leveraging Unlabeled Data to Guarantee Generalization
Authors: Saurabh Garg, Sivaraman Balakrishnan, Zico Kolter, Zachary Lipton
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, on canonical computer vision and NLP tasks, our bound provides non-vacuous generalization guarantees that track actual performance closely. This work enables practitioners to certify generalization even when (labeled) holdout data is unavailable and provides insights into the relationship between random label noise and generalization. From Section 5 (Empirical Study and Implications): Having established our framework theoretically, we now demonstrate its utility experimentally. |
| Researcher Affiliation | Academia | (1) Machine Learning Department, Carnegie Mellon University; (2) Department of Statistics and Data Science, Carnegie Mellon University; (3) Computer Science Department, Carnegie Mellon University. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access (e.g., a repository link) to source code for the described methodology. |
| Open Datasets | Yes | For binary tasks, we use binarized CIFAR-10 (first 5 classes vs rest) (Krizhevsky & Hinton, 2009), binary MNIST (0-4 vs 5-9) (LeCun et al., 1998), and the IMDb sentiment analysis dataset (Maas et al., 2011). For the multiclass setup, we use MNIST and CIFAR-10. An illustrative sketch of this binarization is given after the table. |
| Dataset Splits | No | The paper mentions holding out data to simulate unlabeled data, but does not provide specific details on a validation split (e.g., percentages or counts) for model tuning or early stopping criteria. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions tools like TensorFlow and PyTorch in its references, but does not specify software dependencies with version numbers for its own experimental setup. |
| Experiment Setup | Yes | We fix the amount of unlabeled data at 20% of the clean dataset size and train all models with standard hyperparameters. See App. C for exact hyperparameter values. An illustrative sketch of such a data split is given after the table. |
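The binarization quoted in the Open Datasets row (CIFAR-10: first 5 classes vs. the rest; MNIST: digits 0-4 vs. 5-9) amounts to a simple relabeling. The sketch below is illustrative only, assuming integer NumPy label arrays; the function name and the choice of which side counts as the positive class are our assumptions, not details from the paper.

```python
import numpy as np

def binarize_labels(labels: np.ndarray) -> np.ndarray:
    """Map 10-class CIFAR-10 / MNIST labels to binary labels.

    Both quoted schemes (CIFAR-10: first 5 classes vs. rest; MNIST: 0-4 vs. 5-9)
    reduce to thresholding the original class index at 5. Treating the lower
    five classes as the positive class is our own (arbitrary) convention.
    """
    return (labels < 5).astype(np.int64)

# Toy example: original class indices -> binary labels.
y = np.array([0, 3, 5, 9, 4])
print(binarize_labels(y))  # -> [1 1 0 0 1]
```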
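The Experiment Setup row fixes the unlabeled data at 20% of the clean dataset size, and the Dataset Splits row notes that the paper simulates unlabeled data by holding out clean examples. A minimal sketch of such a split is given below, under two assumptions that are ours rather than the paper's: the held-out examples receive uniformly random labels (consistent with RATT's use of randomly labeled data added to the training set), and "20% of the clean dataset size" is measured against the clean portion that remains after the holdout.

```python
import numpy as np

def make_random_label_split(x, y, unlabeled_frac=0.2, num_classes=2, seed=0):
    """Hold out part of a clean dataset to simulate unlabeled data.

    The held-out examples get uniformly random labels (their true labels are
    discarded); the holdout is sized so that it equals `unlabeled_frac` of the
    remaining clean portion. Returns ((x_clean, y_clean), (x_rand, y_rand)).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # m / (n - m) = unlabeled_frac  =>  m = unlabeled_frac * n / (1 + unlabeled_frac)
    m = int(round(unlabeled_frac * n / (1.0 + unlabeled_frac)))
    perm = rng.permutation(n)
    rand_idx, clean_idx = perm[:m], perm[m:]
    y_rand = rng.integers(0, num_classes, size=m)  # uniformly random labels
    return (x[clean_idx], y[clean_idx]), (x[rand_idx], y_rand)

# Toy example: 12 points; the 2 held-out points are 20% of the 10 remaining clean points.
x = np.arange(12).reshape(-1, 1)
y = (x.ravel() < 6).astype(np.int64)
clean, rand = make_random_label_split(x, y)
print(len(clean[0]), len(rand[0]))  # -> 10 2
```

In the RATT procedure, as we understand it, training then proceeds on the union of the clean and randomly labeled portions; exact hyperparameter values are deferred to App. C of the paper.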