RATT: Leveraging Unlabeled Data to Guarantee Generalization
Authors: Saurabh Garg, Sivaraman Balakrishnan, Zico Kolter, Zachary Lipton
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, on canonical computer vision and NLP tasks, our bound provides non-vacuous generalization guarantees that track actual performance closely. This work enables practitioners to certify generalization even when (labeled) holdout data is unavailable and provides insights into the relationship between random label noise and generalization. From Section 5 (Empirical Study and Implications): Having established our framework theoretically, we now demonstrate its utility experimentally. |
| Researcher Affiliation | Academia | (1) Machine Learning Department, Carnegie Mellon University; (2) Department of Statistics and Data Science, Carnegie Mellon University; (3) Computer Science Department, Carnegie Mellon University. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access (e.g., a repository link) to source code for the described methodology. |
| Open Datasets | Yes | For binary tasks, we use binarized CIFAR-10 (first 5 classes vs rest) (Krizhevsky & Hinton, 2009), binary MNIST (0-4 vs 5-9) (LeCun et al., 1998), and the IMDb sentiment analysis dataset (Maas et al., 2011). For the multiclass setup, we use MNIST and CIFAR-10. An illustrative sketch of this binarization is given after the table. |
| Dataset Splits | No | The paper mentions holding out data to simulate unlabeled data, but does not provide specific details on a validation split (e.g., percentages or counts) for model tuning or early stopping criteria. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions tools like TensorFlow and PyTorch in its references, but does not specify software dependencies with version numbers for its own experimental setup. |
| Experiment Setup | Yes | We fix the amount of unlabeled data at 20% of the clean dataset size and train all models with standard hyperparameters. See App. C for exact hyperparameter values. An illustrative sketch of such a data split is given after the table. |
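The binarization quoted in the Open Datasets row (CIFAR-10: first 5 classes vs. the rest; MNIST: digits 0-4 vs. 5-9) amounts to a simple relabeling. The sketch below is illustrative only, assuming integer NumPy label arrays; the function name and the choice of which side counts as the positive class are our assumptions, not details from the paper.

```python
import numpy as np

def binarize_labels(labels: np.ndarray) -> np.ndarray:
    """Map 10-class CIFAR-10 / MNIST labels to binary labels.

    Both quoted schemes (CIFAR-10: first 5 classes vs. rest; MNIST: 0-4 vs. 5-9)
    reduce to thresholding the original class index at 5. Treating the lower
    five classes as the positive class is our own (arbitrary) convention.
    """
    return (labels < 5).astype(np.int64)

# Toy example: original class indices -> binary labels.
y = np.array([0, 3, 5, 9, 4])
print(binarize_labels(y))  # -> [1 1 0 0 1]
```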
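The Experiment Setup row fixes the unlabeled data at 20% of the clean dataset size, and the Dataset Splits row notes that the paper simulates unlabeled data by holding out clean examples. A minimal sketch of such a split is given below, under two assumptions that are ours rather than the paper's: the held-out examples receive uniformly random labels (consistent with RATT's use of randomly labeled data added to the training set), and "20% of the clean dataset size" is measured against the clean portion that remains after the holdout.

```python
import numpy as np

def make_random_label_split(x, y, unlabeled_frac=0.2, num_classes=2, seed=0):
    """Hold out part of a clean dataset to simulate unlabeled data.

    The held-out examples get uniformly random labels (their true labels are
    discarded); the holdout is sized so that it equals `unlabeled_frac` of the
    remaining clean portion. Returns ((x_clean, y_clean), (x_rand, y_rand)).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # m / (n - m) = unlabeled_frac  =>  m = unlabeled_frac * n / (1 + unlabeled_frac)
    m = int(round(unlabeled_frac * n / (1.0 + unlabeled_frac)))
    perm = rng.permutation(n)
    rand_idx, clean_idx = perm[:m], perm[m:]
    y_rand = rng.integers(0, num_classes, size=m)  # uniformly random labels
    return (x[clean_idx], y[clean_idx]), (x[rand_idx], y_rand)

# Toy example: 12 points; the 2 held-out points are 20% of the 10 remaining clean points.
x = np.arange(12).reshape(-1, 1)
y = (x.ravel() < 6).astype(np.int64)
clean, rand = make_random_label_split(x, y)
print(len(clean[0]), len(rand[0]))  # -> 10 2
```

In the RATT procedure, as we understand it, training then proceeds on the union of the clean and randomly labeled portions; exact hyperparameter values are deferred to App. C of the paper.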