Generalization Bounds via Conditional $f$-Information

Authors: Ziqiao Wang, Yongyi Mao

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct an empirical comparison between our novel conditional f-information generalization bounds and several existing information-theoretic generalization bounds. Our experimental setup closely aligns with the settings in [24]. In particular, we undertake two distinct prediction tasks: 1) Linear Classifier on Synthetic Gaussian Dataset; 2) CNN and ResNet-50 on Real-World Datasets.
Researcher Affiliation | Academia | Ziqiao Wang, School of Computer Science and Technology, Tongji University, Shanghai, China (ziqiaowang@tongji.edu.cn); Yongyi Mao, School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada (ymao@uottawa.ca)
Pseudocode | No | No pseudocode or algorithm block was found in the paper.
Open Source Code | Yes | Our experimental setup largely follows [24], and the code of our experiments can be found at https://github.com/ZiqiaoWangGeothe/Conditional-f-Information-Bound.
Open Datasets | Yes | 1) Linear Classifier on Synthetic Gaussian Dataset... 2) CNN and ResNet-50 on Real-World Datasets... training a 4-layer CNN on a binary MNIST dataset (4 vs 9) [50] and fine-tuning a ResNet-50 model [51], pretrained on ImageNet [52], on CIFAR10 [53]. (A sketch of the 4-vs-9 subset construction appears after this table.)
Dataset Splits | No | For the linear classifier on the two-class data (Figure 2a)... We train the linear classifier using full-batch gradient descent with a fixed learning rate of 0.01 for a total of 300 epochs, employing early stopping when the training error falls below a threshold (e.g., < 0.5%). The text does not explicitly mention validation sets or splits. (A training-loop sketch matching this setup follows the table.)
Hardware Specification Yes All these experiments are conducted using NVIDIA A100 GPUs with 40 GB of memory.
Software Dependencies | No | In this experiment, similar to [24], we use the popular Python package scikit-learn [63] to generate synthetic Gaussian data... The 4-layer CNN model is trained using the Adam optimizer... No specific version numbers for these software components are provided. (A data-generation sketch follows the table.)
Experiment Setup | Yes | We train the linear classifier using full-batch gradient descent with a fixed learning rate of 0.01 for a total of 300 epochs... The 4-layer CNN model is trained using the Adam optimizer with a learning rate of 0.001 and a momentum coefficient of β1 = 0.9. The training process spans 200 epochs with a batch size of 128. For ResNet-50 on CIFAR10... The ResNet model is trained using stochastic gradient descent (SGD) with a learning rate of 0.01 and a momentum coefficient of 0.9 for a total of 40 epochs. The batch size for this experiment is set to 64. (Sketches of both optimizer configurations follow the table.)
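
The Software Dependencies row states only that scikit-learn is used to generate the synthetic Gaussian data. Here is a minimal sketch of such generation; the choice of make_blobs (isotropic Gaussian clusters) and every parameter value below (sample count, dimension, cluster spread, seed) is an assumption, not the paper's configuration.

```python
# A minimal sketch of two-class synthetic Gaussian data via scikit-learn.
# make_blobs and all parameter values here are assumptions; the paper only
# says scikit-learn is used to generate the data.
import numpy as np
from sklearn.datasets import make_blobs

def generate_gaussian_data(n_samples=150, dim=5, seed=0):
    """Draw n_samples points from two isotropic Gaussian clusters in R^dim."""
    X, y = make_blobs(n_samples=n_samples, n_features=dim, centers=2,
                      cluster_std=1.0, random_state=seed)
    return X.astype(np.float32), y
```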
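
The quoted linear-classifier setup (full-batch gradient descent, learning rate 0.01, at most 300 epochs, early stopping once the training error falls below 0.5%) maps onto a short PyTorch loop. This is a sketch under assumptions: the single linear layer and cross-entropy loss are not specified in the excerpt; only the optimizer settings and the stopping rule come from the paper.

```python
# A minimal PyTorch sketch of the described training procedure:
# full-batch GD, lr = 0.01, up to 300 epochs, stop when train error < 0.5%.
import torch
import torch.nn as nn

def train_linear_classifier(X, y, num_classes=2, epochs=300, lr=0.01):
    X = torch.as_tensor(X, dtype=torch.float32)
    y = torch.as_tensor(y, dtype=torch.long)
    model = nn.Linear(X.shape[1], num_classes)  # assumed model form
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()             # assumed loss
    for _ in range(epochs):
        optimizer.zero_grad()
        logits = model(X)                # full batch: one step per epoch
        loss = loss_fn(logits, y)
        loss.backward()
        optimizer.step()
        train_error = (logits.argmax(dim=1) != y).float().mean().item()
        if train_error < 0.005:          # early stopping threshold (0.5%)
            break
    return model
```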
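
The binary MNIST (4 vs 9) task from the Open Datasets row can be built from the standard MNIST download. The torchvision-based filtering and 0/1 relabeling below are assumptions about how such a subset could be constructed, not the paper's code.

```python
# A hedged sketch of a binary MNIST (4 vs 9) subset using torchvision.
from torch.utils.data import Subset
from torchvision import datasets, transforms

def binary_mnist(root="./data", train=True, digits=(4, 9)):
    ds = datasets.MNIST(root=root, train=train, download=True,
                        transform=transforms.ToTensor())
    # Indices of the two chosen digits, computed before relabeling.
    idx = [i for i, t in enumerate(ds.targets.tolist()) if t in digits]
    # Relabel: digits[1] -> 1, everything else -> 0.
    ds.targets = (ds.targets == digits[1]).long()
    return Subset(ds, idx)
```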
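
Finally, the Experiment Setup row pins down both optimizer configurations. Below is a PyTorch sketch of those settings; the 4-layer CNN is only named in the paper, so its layers are hypothetical placeholders, and loading ResNet-50 through torchvision with ImageNet weights is likewise an assumption.

```python
# Optimizer settings quoted from the paper; model definitions are assumptions.
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical 4-layer CNN for binary MNIST (architecture not given in paper).
cnn = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 2),
)
# Adam with lr = 0.001 and beta1 = 0.9 (beta2 left at its default, assumed);
# trained for 200 epochs with batch size 128 per the paper.
cnn_opt = torch.optim.Adam(cnn.parameters(), lr=0.001, betas=(0.9, 0.999))

# ImageNet-pretrained ResNet-50 fine-tuned on CIFAR10 (10 classes).
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
resnet.fc = nn.Linear(resnet.fc.in_features, 10)
# SGD with lr = 0.01, momentum = 0.9; 40 epochs, batch size 64 per the paper.
resnet_opt = torch.optim.SGD(resnet.parameters(), lr=0.01, momentum=0.9)
```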