On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training

Authors: Jieyu Zhang, Bohan Wang, Zhengyu Hu, Pang Wei W. Koh, Alexander J. Ratner

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, with ImageNet [24] as the pre-training dataset and the pre-training dataset size fixed, we show that the optimal performance on the downstream tasks occurs when a balance on the intra-/inter-class diversity is achieved. We demonstrate the effectiveness of this application by an improvement of approximately 2 points on average on downstream tasks when pre-training on ImageNet. We visualize the results in Figure 1. In the contour plot, the z-value is the error rate on the test set, thus lower is better.
Researcher Affiliation | Collaboration | Jieyu Zhang [1], Bohan Wang [2], Zhengyu Hu [3], Pang Wei Koh [1], Alexander Ratner [1,4]. 1: University of Washington, 2: USTC, 3: HKUST(GZ), 4: Snorkel AI, Inc.
Pseudocode | No | The paper describes data generation processes and theoretical proofs but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper states 'We build our code on Python and Pytorch' but does not provide a link or an explicit statement about making the source code for the described methodology publicly available.
Open Datasets | Yes | Following common practice [13], we use the ImageNet [24] as the dataset for supervised pre-training. ImageNet [4]: an image dataset organized according to the WordNet hierarchy.
Dataset Splits | Yes | For evaluating the performance of the pre-trained model on downstream tasks, we perform linear probing (tuning the head but freezing the lower layers). We repeat each individual experiment five times and report the averaged top-1 accuracy. The validation set has 50 images per class and the test set has 900 images per class (for the Places365 dataset).
Hardware Specification | Yes | All experiments ran on a machine with an Intel(R) Xeon(R) CPU E5-2678 v3 with 512G memory and two 48G NVIDIA RTX A6000 GPUs.
Software Dependencies | No | The paper states 'We build our code on Python and Pytorch' but does not specify version numbers for these software dependencies, which would be necessary for exact reproduction.
Experiment Setup | Yes | For pre-training, we set the number of epochs to be 100 and the batch size to be 64. We use the Adam optimizer for training with a learning rate of 0.1, a momentum of 0.9, and a weight decay of 1e-4. We repeat each experiment 3 times with different seeds and report the mean and variance of the results.
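Since no code is released, the sketches below only illustrate the protocol described in the rows above; they are not the authors' implementation.

The Research Type and Open Datasets rows describe the central experimental knob: with the total pre-training set size fixed, the number of ImageNet classes (inter-class diversity) is traded off against the number of images per class (intra-class diversity). The sketch below shows one way such fixed-budget subsets could be sampled with torchvision; the 100,000-image budget and the class counts in the comment are hypothetical values, not settings from the paper.

```python
import random
from collections import defaultdict

from torch.utils.data import Subset
from torchvision import datasets


def fixed_budget_subset(imagenet_root, budget, num_classes, seed=0):
    """Sample `budget` images spread evenly over `num_classes` ImageNet classes.

    More classes with fewer images each means higher inter-class but lower
    intra-class diversity, and vice versa, at a constant pre-training size.
    """
    assert budget % num_classes == 0, "budget must divide evenly across classes"
    per_class = budget // num_classes

    full = datasets.ImageNet(imagenet_root, split="train")

    # Group sample indices by class label (ImageNet exposes ImageFolder-style .samples).
    by_class = defaultdict(list)
    for idx, (_, label) in enumerate(full.samples):
        by_class[label].append(idx)

    rng = random.Random(seed)
    chosen_classes = rng.sample(sorted(by_class), num_classes)

    indices = []
    for c in chosen_classes:
        indices.extend(rng.sample(by_class[c], per_class))
    return Subset(full, indices)


# Hypothetical grid: same 100k-image budget, different diversity trade-offs.
# subset = fixed_budget_subset("/path/to/imagenet", budget=100_000, num_classes=500)
```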
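The Experiment Setup row quotes 100 epochs, batch size 64, learning rate 0.1, momentum 0.9, and weight decay 1e-4, attributed to the Adam optimizer. Note that torch.optim.Adam has no momentum argument (it takes betas), while the quoted settings match torch.optim.SGD exactly, so the sketch below assumes an SGD-style configuration; the model and dataset arguments are placeholders.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

EPOCHS = 100     # from the quoted setup
BATCH_SIZE = 64  # from the quoted setup


def pretrain(model, train_dataset, device="cuda"):
    """Supervised pre-training loop using the hyperparameters quoted above.

    SGD is an assumption: the quoted momentum/weight-decay values match its
    interface, whereas torch.optim.Adam takes betas instead of momentum.
    """
    loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4
    )
    criterion = nn.CrossEntropyLoss()

    model.to(device).train()
    for _ in range(EPOCHS):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```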
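The Dataset Splits row notes that downstream evaluation uses linear probing: the pre-trained backbone is frozen and only a new classification head is tuned. A minimal PyTorch sketch of that protocol follows; the ResNet-50 backbone and the probe learning rate are illustrative assumptions, not values reported in the table above.

```python
import torch
import torch.nn as nn
from torchvision import models


def linear_probe_model(num_classes, pretrained_state_dict=None):
    """Freeze a pre-trained backbone and attach a trainable linear head."""
    backbone = models.resnet50()  # backbone choice is an assumption
    if pretrained_state_dict is not None:
        backbone.load_state_dict(pretrained_state_dict)

    # Freeze every pre-trained parameter.
    for p in backbone.parameters():
        p.requires_grad = False

    # Replace the classification head with a fresh, trainable linear layer.
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
    return backbone


model = linear_probe_model(num_classes=365)  # e.g. Places365 downstream task
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),  # only the new head
    lr=0.01,  # illustrative value, not from the paper
)
```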