Generalization Bounds via Conditional $f$-Information
Authors: Ziqiao Wang, Yongyi Mao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct an empirical comparison between our novel conditional f-information generalization bounds and several existing information-theoretic generalization bounds. Our experimental setup closely aligns with the settings in [24]. In particular, we undertake two distinct prediction tasks as follows: 1) Linear Classifier on Synthetic Gaussian Dataset; 2) CNN and ResNet-50 on Real-World Datasets. |
| Researcher Affiliation | Academia | Ziqiao Wang, School of Computer Science and Technology, Tongji University, Shanghai, China (ziqiaowang@tongji.edu.cn); Yongyi Mao, School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada (ymao@uottawa.ca) |
| Pseudocode | No | No pseudocode or algorithm block was found in the paper. |
| Open Source Code | Yes | Our experimental setup largely follows [24], and the code of our experiments can be found at https://github.com/ZiqiaoWangGeothe/Conditional-f-Information-Bound. |
| Open Datasets | Yes | 1) Linear Classifier on Synthetic Gaussian Dataset... 2) CNN and ResNet-50 on Real-World Datasets... training a 4-layer CNN on a binary MNIST dataset (4 vs 9) [50] and fine-tuning a ResNet-50 model [51], pretrained on ImageNet [52], on CIFAR10 [53]. |
| Dataset Splits | No | For the linear classifier on the two-class data (Figure 2a)... We train the linear classifier using full-batch gradient descent with a fixed learning rate of 0.01 for a total of 300 epochs, employing early stopping when the training error falls below a threshold (e.g., < 0.5%). The text does not explicitly mention validation sets or splits. A minimal sketch of this training loop appears after the table. |
| Hardware Specification | Yes | All these experiments are conducted using NVIDIA A100 GPUs with 40 GB of memory. |
| Software Dependencies | No | In this experiment, similar to [24], we use the popular Python package scikit-learn [63] to generate synthetic Gaussian data... The 4-layer CNN model is trained using the Adam optimizer... No specific version numbers for these software components are provided. |
| Experiment Setup | Yes | We train the linear classifier using full-batch gradient descent with a fixed learning rate of 0.01 for a total of 300 epochs... The 4-layer CNN model is trained using the Adam optimizer with a learning rate of 0.001 and a momentum coefficient of β1 = 0.9. The training process spans 200 epochs with a batch size of 128. For ResNet-50 on CIFAR10... The ResNet model is trained using stochastic gradient descent (SGD) with a learning rate of 0.01 and a momentum coefficient of 0.9 for a total of 40 epochs. The batch size for this experiment is set to 64. Minimal sketches of the CNN and ResNet-50 configurations appear after the table. |
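
To make the reported linear-classifier setup concrete, here is a minimal sketch of full-batch gradient descent with the quoted hyperparameters: learning rate 0.01, a 300-epoch budget, and early stopping below 0.5% training error. The data generator (`make_blobs`), the sample count, and the logistic loss are assumptions on our part; the report only says scikit-learn was used to produce the synthetic Gaussian data.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Assumption: make_blobs stands in for the paper's unspecified
# scikit-learn Gaussian generator; sample count and dimension are
# illustrative, not taken from the report.
X, y = make_blobs(n_samples=200, centers=2, n_features=5, random_state=0)
y_signed = 2.0 * y - 1.0  # map labels {0, 1} -> {-1, +1}

w = np.zeros(X.shape[1])
b = 0.0
lr = 0.01                  # fixed learning rate quoted in the report
for epoch in range(300):   # 300 epochs quoted in the report
    # Full-batch gradient of the logistic loss (loss choice is an
    # assumption): d/dw log(1 + exp(-y * (w·x + b))).
    margins = y_signed * (X @ w + b)
    grad_factor = -y_signed / (1.0 + np.exp(margins))
    w -= lr * (grad_factor[:, None] * X).mean(axis=0)
    b -= lr * grad_factor.mean()

    # Early stopping once training error falls below 0.5%, as quoted.
    train_err = np.mean(np.sign(X @ w + b) != y_signed)
    if train_err < 0.005:
        break
```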
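
The CNN setup can be sketched the same way, using the quoted hyperparameters (Adam, learning rate 0.001, β1 = 0.9, 200 epochs, batch size 128) on binary MNIST (4 vs 9). The layer stack below is purely illustrative: the report states only that a 4-layer CNN was used, so the exact architecture is an assumption.

```python
import torch
import torch.nn as nn

# Illustrative 4-layer CNN (three conv layers plus one linear head);
# the paper's exact architecture is not given in this report.
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(64 * 7 * 7, 2),  # two classes: digit 4 vs 9
)

# Hyperparameters quoted in the report: Adam, lr 0.001, beta_1 = 0.9.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
criterion = nn.CrossEntropyLoss()

# One illustrative step on a dummy batch of 128 MNIST-sized images; in
# the actual experiment this loop runs for 200 epochs over the binary
# (4 vs 9) MNIST training set.
x = torch.randn(128, 1, 28, 28)
y = torch.randint(0, 2, (128,))
optimizer.zero_grad()
criterion(model(x), y).backward()
optimizer.step()
```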
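
Finally, the ResNet-50 fine-tuning configuration (ImageNet-pretrained weights, SGD with learning rate 0.01 and momentum 0.9, 40 epochs, batch size 64) maps onto standard torchvision usage; the weight-loading call and head replacement below are our sketch, not taken from the authors' code.

```python
import torch
import torchvision

# Load an ImageNet-pretrained ResNet-50 (downloads weights on first use)
# and replace the classification head for the 10 CIFAR10 classes.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)

# Hyperparameters quoted in the report: SGD, lr 0.01, momentum 0.9,
# 40 epochs, batch size 64. The fine-tuning loop itself would iterate
# over a DataLoader built on torchvision.datasets.CIFAR10 with
# batch_size=64.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```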