On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training
Authors: Jieyu Zhang, Bohan Wang, Zhengyu Hu, Pang Wei W. Koh, Alexander J. Ratner
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, with ImageNet [24] as the pre-training dataset and the pre-training dataset size fixed, we show that the optimal performance on the downstream tasks occurs when a balance on the intra-/inter-class diversity is achieved. We demonstrate the effectiveness of this application by an improvement of approximately 2 points on average on downstream tasks when pre-training on ImageNet. We visualize the results in Figure 1. In the contour plot, the z-value is the error rate on the test set, thus lower is better. |
| Researcher Affiliation | Collaboration | Jieyu Zhang (University of Washington), Bohan Wang (USTC), Zhengyu Hu (HKUST(GZ)), Pang Wei Koh (University of Washington), Alexander Ratner (University of Washington; Snorkel AI, Inc.) |
| Pseudocode | No | The paper describes data generation processes and theoretical proofs but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'We build our code on Python and Pytorch' but does not provide any link or explicit statement about making their source code publicly available for the described methodology. |
| Open Datasets | Yes | Following common practice [13], we use the ImageNet [24] as the dataset for supervised pre-training. ImageNet [4]. It is an image dataset organized according to the WordNet hierarchy. |
| Dataset Splits | Yes | For evaluating the performance of the pre-trained model on downstream tasks, we perform linear probing (tuning the head but freezing the lower layers). We repeat each individual experiment five times and report the averaged top-1 accuracy. The validation set has 50 images per class and the test set has 900 images per class (for the Places365 dataset). A linear-probing sketch is given after the table. |
| Hardware Specification | Yes | All experiments ran on a machine with an Intel(R) Xeon(R) CPU E5-2678 v3 with 512G memory and two 48G NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper states 'We build our code on Python and Pytorch.' but does not specify version numbers for these software dependencies, which would be necessary for exact reproduction. |
| Experiment Setup | Yes | For pre-training, we set the number of epochs to be 100 and the batch size to be 64. We use the Adam optimizer for training with a learning rate of 0.1, a momentum of 0.9, and a weight decay of 1e-4. We repeat each experiment 3 times with different seeds and report the mean and variance of the results. A pre-training configuration sketch is given after the table. |
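
The pre-training configuration quoted in the Experiment Setup row (100 epochs, batch size 64, learning rate 0.1, momentum 0.9, weight decay 1e-4) can be approximated by the sketch below. The paper's wording names the Adam optimizer, but a momentum hyperparameter belongs to SGD-style optimizers, so the sketch uses `torch.optim.SGD`; the ResNet-50 backbone, the data path, and the augmentation pipeline are assumptions, not details taken from the table above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Hedged sketch of the quoted pre-training setup (not the authors' released code).
# Assumptions: ResNet-50 backbone, ImageNet-style ImageFolder layout, SGD optimizer
# (the quoted "momentum of 0.9" is an SGD hyperparameter, not an Adam one).
DATA_DIR = "/path/to/imagenet/train"  # placeholder path

transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder(DATA_DIR, transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=8)

model = models.resnet50(weights=None)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

for epoch in range(100):  # "number of epochs to be 100"
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```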
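
Similarly, the linear-probing protocol quoted in the Dataset Splits row (tuning the head while freezing the lower layers) can be illustrated as follows. The backbone, the number of downstream classes, and the probe's optimizer settings are assumptions made for illustration, not values reported in the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# Minimal linear-probing sketch (illustrative, not the authors' code).
# Assumptions: ResNet-50 backbone and a 365-class downstream task (e.g. Places365).
model = models.resnet50(weights=None)  # in practice, load the supervised pre-trained weights here

# Freeze all lower layers so only the head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head; its new parameters are trainable by default.
num_downstream_classes = 365  # assumption; depends on the downstream dataset
model.fc = nn.Linear(model.fc.in_features, num_downstream_classes)

# Only the head's parameters are handed to the optimizer (settings assumed, not from the paper).
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def probe_step(images, labels):
    """One linear-probing update: backbone features stay fixed, only the head learns."""
    logits = model(images)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```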