On Leave-One-Out Conditional Mutual Information For Generalization
Authors: Mohamad Rida Rammal, Alessandro Achille, Aditya Golatkar, Suhas Diggavi, Stefano Soatto
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning. In particular, our bounds are non-vacuous on image-classification tasks. ... Empirically, we show that our bound can be computed and is non-vacuous for state-of-the-art deep networks fine-tuned on standard image classification tasks (Table 1). We also study the dependency of the bound on the size of the dataset (Figure 2), and the hyper-parameters (Figure 3). ... We now study the behavior of our loo-CMI and floo-CMI bounds on real-world image classification tasks. |
| Researcher Affiliation | Collaboration | Mohamad Rida Rammal University of California, Los Angeles ridarammal@g.ucla.edu Alessandro Achille Caltech, AWS AI Labs aachille@caltech.edu Aditya Golatkar University of California, Los Angeles aditya29@cs.ucla.edu Suhas Diggavi University of California, Los Angeles suhas@ee.ucla.edu Stefano Soatto University of California, Los Angeles soatto@ucla.edu |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | In particular, we fine-tune an off-the-shelf ResNet-18 model pretrained on ImageNet on a set of standard image-classification tasks (see also Table 1): MIT-67 [26] and Oxford Pets [24] with a few thousand examples each. |
| Dataset Splits | No | The paper mentions 'Train Error' and 'Test Error' in Table 1 but does not specify a validation dataset split (e.g., 80/10/10 split or specific counts for training, validation, and test sets). While it states 'We compute w_i/h_i by removing sample i from the training set and re-training from scratch,' this refers to a leave-one-out approach within the training data, not a distinct validation split. |
| Hardware Specification | Yes | We used 2 NVIDIA 1080Ti GPUs and the experiments take approximately 1-2 days. |
| Software Dependencies | No | The paper mentions using "stochastic gradient descent" but does not list any specific software dependencies (e.g., Python, PyTorch, TensorFlow versions or other libraries) with version numbers. |
| Experiment Setup | Yes | On both datasets we fine-tune for 10 epochs using stochastic gradient descent with learning rate = 0.05, momentum m = 0.99, batch size B = 256, and weight decay λ = 0.0005. |
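The leave-one-out protocol quoted above (remove sample i, re-train from scratch, repeat for every i) together with the reported hyper-parameters can be sketched as follows. This is a minimal illustration, not the authors' code: `train` is a toy stand-in for the actual ResNet-18 fine-tuning run (here it just averages scalar samples), and the `HPARAMS` dict simply records the values quoted from the paper.

```python
# Hyper-parameters as quoted in the reproducibility table above.
HPARAMS = {
    "epochs": 10,
    "lr": 0.05,
    "momentum": 0.99,
    "batch_size": 256,
    "weight_decay": 0.0005,
}

def train(dataset, hparams):
    """Toy stand-in for fine-tuning a pretrained ResNet-18 with SGD.
    Here a 'model' is just the mean of the (scalar) training samples."""
    return sum(dataset) / len(dataset)

def leave_one_out_models(dataset, hparams=HPARAMS):
    """Compute the full-data model w and the leave-one-out models w_i.

    Each w_i is obtained by removing sample i from the training set and
    re-training from scratch, matching the procedure quoted above."""
    w = train(dataset, hparams)
    w_loo = [train(dataset[:i] + dataset[i + 1:], hparams)
             for i in range(len(dataset))]
    return w, w_loo

data = [1.0, 2.0, 3.0, 4.0]
w, w_loo = leave_one_out_models(data)
# w is the mean of all four samples (2.5); w_loo[0] drops the first
# sample, so it is the mean of [2.0, 3.0, 4.0] (3.0).
```

Note the cost implied by this protocol: evaluating the loo-CMI bound requires one full re-training run per removed sample, which is consistent with the reported 1-2 days on 2 GPUs for datasets of a few thousand examples.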