Generalization Bounds via Conditional $f$-Information

Authors: Ziqiao Wang, Yongyi Mao

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct an empirical comparison between our novel conditional f-information generalization bounds and several existing information-theoretic generalization bounds. Our experimental setup closely aligns with the settings in [24]. In particular, we undertake two distinct prediction tasks: 1) Linear Classifier on Synthetic Gaussian Dataset; 2) CNN and ResNet-50 on Real-World Datasets.
Researcher Affiliation | Academia | Ziqiao Wang, School of Computer Science and Technology, Tongji University, Shanghai, China (ziqiaowang@tongji.edu.cn); Yongyi Mao, School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada (ymao@uottawa.ca)
Pseudocode | No | No pseudocode or algorithm block was found in the paper.
Open Source Code | Yes | Our experimental setup largely follows [24], and the code of our experiments can be found at https://github.com/ZiqiaoWangGeothe/Conditional-f-Information-Bound.
Open Datasets | Yes | 1) Linear Classifier on Synthetic Gaussian Dataset... 2) CNN and ResNet-50 on Real-World Datasets... training a 4-layer CNN on a binary MNIST dataset (4 vs 9) [50] and fine-tuning a ResNet-50 model [51], pretrained on ImageNet [52], on CIFAR10 [53]. (A sketch of the 4-vs-9 subset construction appears after this table.)
Dataset Splits | No | For the linear classifier on the two-class data (Figure 2a)... We train the linear classifier using full-batch gradient descent with a fixed learning rate of 0.01 for a total of 300 epochs, employing early stopping when the training error falls below a threshold (e.g., < 0.5%). The text does not explicitly mention validation sets or splits. (A training-loop sketch matching this setup follows the table.)
Hardware Specification Yes All these experiments are conducted using NVIDIA A100 GPUs with 40 GB of memory.
Software Dependencies | No | In this experiment, similar to [24], we use the popular Python package scikit-learn [63] to generate synthetic Gaussian data... The 4-layer CNN model is trained using the Adam optimizer... No specific version numbers for these software components are provided. (A data-generation sketch follows the table.)
Experiment Setup | Yes | We train the linear classifier using full-batch gradient descent with a fixed learning rate of 0.01 for a total of 300 epochs... The 4-layer CNN model is trained using the Adam optimizer with a learning rate of 0.001 and a momentum coefficient of β1 = 0.9. The training process spans 200 epochs with a batch size of 128. For ResNet-50 on CIFAR10... The ResNet model is trained using stochastic gradient descent (SGD) with a learning rate of 0.01 and a momentum coefficient of 0.9 for a total of 40 epochs. The batch size for this experiment is set to 64. (Sketches of both optimizer configurations follow the table.)
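
The Software Dependencies row states only that scikit-learn is used to generate the synthetic Gaussian data. Here is a minimal sketch of such generation; the choice of make_blobs (isotropic Gaussian clusters) and every parameter value below (sample count, dimension, cluster spread, seed) is an assumption, not the paper's configuration.

```python
# A minimal sketch of two-class synthetic Gaussian data via scikit-learn.
# make_blobs and all parameter values here are assumptions; the paper only
# says scikit-learn is used to generate the data.
import numpy as np
from sklearn.datasets import make_blobs

def generate_gaussian_data(n_samples=150, dim=5, seed=0):
    """Draw n_samples points from two isotropic Gaussian clusters in R^dim."""
    X, y = make_blobs(n_samples=n_samples, n_features=dim, centers=2,
                      cluster_std=1.0, random_state=seed)
    return X.astype(np.float32), y
```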
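
The quoted linear-classifier setup (full-batch gradient descent, learning rate 0.01, at most 300 epochs, early stopping once the training error falls below 0.5%) maps onto a short PyTorch loop. This is a sketch under assumptions: the single linear layer and cross-entropy loss are not specified in the excerpt; only the optimizer settings and the stopping rule come from the paper.

```python
# A minimal PyTorch sketch of the described training procedure:
# full-batch GD, lr = 0.01, up to 300 epochs, stop when train error < 0.5%.
import torch
import torch.nn as nn

def train_linear_classifier(X, y, num_classes=2, epochs=300, lr=0.01):
    X = torch.as_tensor(X, dtype=torch.float32)
    y = torch.as_tensor(y, dtype=torch.long)
    model = nn.Linear(X.shape[1], num_classes)  # assumed model form
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()             # assumed loss
    for _ in range(epochs):
        optimizer.zero_grad()
        logits = model(X)                # full batch: one step per epoch
        loss = loss_fn(logits, y)
        loss.backward()
        optimizer.step()
        train_error = (logits.argmax(dim=1) != y).float().mean().item()
        if train_error < 0.005:          # early stopping threshold (0.5%)
            break
    return model
```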
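
The binary MNIST (4 vs 9) task from the Open Datasets row can be built from the standard MNIST download. The torchvision-based filtering and 0/1 relabeling below are assumptions about how such a subset could be constructed, not the paper's code.

```python
# A hedged sketch of a binary MNIST (4 vs 9) subset using torchvision.
from torch.utils.data import Subset
from torchvision import datasets, transforms

def binary_mnist(root="./data", train=True, digits=(4, 9)):
    ds = datasets.MNIST(root=root, train=train, download=True,
                        transform=transforms.ToTensor())
    # Indices of the two chosen digits, computed before relabeling.
    idx = [i for i, t in enumerate(ds.targets.tolist()) if t in digits]
    # Relabel: digits[1] -> 1, everything else -> 0.
    ds.targets = (ds.targets == digits[1]).long()
    return Subset(ds, idx)
```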
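
Finally, the Experiment Setup row pins down both optimizer configurations. Below is a PyTorch sketch of those settings; the 4-layer CNN is only named in the paper, so its layers are hypothetical placeholders, and loading ResNet-50 through torchvision with ImageNet weights is likewise an assumption.

```python
# Optimizer settings quoted from the paper; model definitions are assumptions.
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical 4-layer CNN for binary MNIST (architecture not given in paper).
cnn = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 2),
)
# Adam with lr = 0.001 and beta1 = 0.9 (beta2 left at its default, assumed);
# trained for 200 epochs with batch size 128 per the paper.
cnn_opt = torch.optim.Adam(cnn.parameters(), lr=0.001, betas=(0.9, 0.999))

# ImageNet-pretrained ResNet-50 fine-tuned on CIFAR10 (10 classes).
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
resnet.fc = nn.Linear(resnet.fc.in_features, 10)
# SGD with lr = 0.01, momentum = 0.9; 40 epochs, batch size 64 per the paper.
resnet_opt = torch.optim.SGD(resnet.parameters(), lr=0.01, momentum=0.9)
```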