Towards Generalization beyond Pointwise Learning: A Unified Information-theoretic Perspective
Authors: Yuxin Dong, Tieliang Gong, Hong Chen, Zhongjiang He, Mengxiang Li, Shuangyong Song, Chen Li
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive numerical studies then demonstrate the effectiveness of our bounds in capturing the generalization dynamics across diverse learning scenarios. |
| Researcher Affiliation | Collaboration | 1. School of Computer Science and Technology, Xi'an Jiaotong University; 2. College of Science, Huazhong Agricultural University; 3. China Telecom Corporation Limited. |
| Pseudocode | No | The paper does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The source code is available at https://github.com/Yuxin-Dong/Pairwise. |
| Open Datasets | Yes | Our initial experiment encompasses a 5-class classification task, employing a simple MLP network trained on synthetic Gaussian datasets... we first train a 4-layer CNN on a binarized version of the MNIST dataset... Subsequently, we fine-tune a pretrained ResNet-50 network on the CIFAR-10 dataset... Additionally, we examine fine-tuning a CLIP (ViT-B/32) model (Radford et al., 2021) on the Flickr30k dataset. |
| Dataset Splits | No | The paper mentions using specific datasets (synthetic Gaussian, MNIST, CIFAR-10, Flickr30k) and following experimental settings from other papers, but it does not explicitly state the training, validation, or test dataset splits (e.g., percentages or counts) within the paper itself. |
| Hardware Specification | Yes | The deep learning models are trained with an Intel Xeon CPU (2.10GHz, 48 cores), 256GB memory, and 4 Nvidia Tesla V100 GPUs (32GB). |
| Software Dependencies | No | The paper mentions using the 'scikit-learn Python package' but does not specify its version number or any other software dependencies with version numbers. |
| Experiment Setup | Yes | Our initial experiment encompasses a 5-class classification task, employing a simple 4-layer MLP network with ReLU as the activation function. The selection of the loss function is contingent on the value of m: for m = 1, we utilized the binary 0-1 loss to quantify the generalization gap; for m > 1, we implemented a binarized version of the corresponding contrastive losses. Specifically, with a predictive function f : X^m → R, the losses are computed based on a given threshold θ, exemplified in the pairwise contrastive loss as L_ij = 1[f(X_i, X_j) ≥ θ] ⊕ 1[Y_i = Y_j]. Here, the threshold θ was adaptively selected to balance precision and recall scores. (A hedged code sketch of this loss and of the threshold selection appears below the table.) |
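
The "Experiment Setup" row quotes a binarized pairwise loss with an adaptively chosen threshold. The sketch below illustrates one plausible reading of that description: a pair (i, j) counts as an error when the thresholded score 1[f(X_i, X_j) ≥ θ] disagrees with the label-match indicator 1[Y_i = Y_j], and θ is chosen where precision and recall are closest to equal. Both readings, and the helper names `binarized_pairwise_loss` and `select_threshold`, are assumptions for illustration rather than code taken from the authors' repository.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def binarized_pairwise_loss(scores, labels, threshold):
    """Binarized pairwise 0-1 loss (assumed reading of the quoted setup).

    scores: (n, n) array with scores[i, j] = f(X_i, X_j)
    labels: (n,) array of class labels Y_i
    A pair is an error when the thresholded score disagrees with
    whether the two labels match (an XOR of the two indicators).
    """
    same_class = labels[:, None] == labels[None, :]   # 1[Y_i = Y_j]
    predicted_same = scores >= threshold               # 1[f(X_i, X_j) >= theta]
    off_diag = ~np.eye(len(labels), dtype=bool)        # exclude i == j pairs
    return np.mean((predicted_same ^ same_class)[off_diag])


def select_threshold(scores, labels):
    """Pick theta where precision and recall are approximately balanced,
    mirroring the quoted 'adaptively selected to balance precision and recall'."""
    same_class = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    precision, recall, thresholds = precision_recall_curve(
        same_class[off_diag].ravel(), scores[off_diag].ravel()
    )
    # precision/recall have one more entry than thresholds; drop the last point.
    idx = np.argmin(np.abs(precision[:-1] - recall[:-1]))
    return thresholds[idx]


# Toy usage with random pairwise scores for 3 classes (illustrative only).
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=50)
scores = rng.normal(size=(50, 50))
theta = select_threshold(scores, labels)
print("theta:", theta, "loss:", binarized_pairwise_loss(scores, labels, theta))
```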