Slimmed Asymmetrical Contrastive Learning and Cross Distillation for Lightweight Model Training
Authors: Jian Meng, Li Yang, Kyungmin Lee, Jinwoo Shin, Deliang Fan, Jae-sun Seo
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared to the SoTA lightweight CL training (distillation) algorithms, SACL-XD achieves 1.79% ImageNet-1K accuracy improvement on MobileNet-V3 with 64× training FLOPs reduction. Code is available at https://github.com/mengjian0502/SACL-XD. Table 1: ImageNet-1K test accuracy with linear evaluation protocol based on MobileNet-V3 [22] trained by different contrastive learning/distillation methods. |
| Researcher Affiliation | Academia | Jian Meng, Li Yang, Kyungmin Lee, Jinwoo Shin, Deliang Fan, and Jae-sun Seo; Cornell Tech, USA; University of North Carolina at Charlotte, USA; KAIST, South Korea; Johns Hopkins University, USA. {kyungmnlee, jinwoos}@kaist.ac.kr, lyang50@uncc.edu, dfan10@jhu.edu, {jm2787, js3528}@cornell.edu |
| Pseudocode | Yes | Algorithm 1: PyTorch-style pseudocode for the proposed algorithm |
| Open Source Code | Yes | Code is available at https://github.com/mengjian0502/SACL-XD. |
| Open Datasets | Yes | We evaluate the performance of the proposed algorithm based on CNN encoders (MobileNet [23, 22], EfficientNet [29], ResNet [20]) and ViT [13] models on the ImageNet-1K and ImageNet-100 datasets. We also demonstrate the capability of the proposed method with tiny-sized ResNet on the small CIFAR dataset. (Tables 8, 9, and 10 detail the augmentation for ImageNet-1K, ImageNet-100, and CIFAR-10, respectively.) |
| Dataset Splits | Yes | We follow the linear evaluation protocol on ImageNet to evaluate the performance of the backbone trained by the proposed SACL and cross-distillation (XD) algorithm (a minimal sketch of this protocol is given below the table). We follow the data augmentation setup in [12] for the CIFAR-10 dataset. |
| Hardware Specification | Yes | Table 11: Training time comparison between the proposed method and the distillation-based CL... GPU type: A100 (80 GB) |
| Software Dependencies | No | The paper mentions 'PyTorch-style pseudocode' and uses the 'LARS optimizer' but does not specify version numbers for PyTorch, Python, or other key software libraries. |
| Experiment Setup | Yes | Appendix A.3, Detailed Experimental Setup of Pre-training: The encoders (MobileNet, EfficientNet, ResNet-50) are trained on ImageNet-1K for 100/200/300 epochs from scratch with the proposed method. We set the batch size to 256 with a learning rate of 0.8. We employ the LARS optimizer with weight decay set to 1.5e-6. We set the correlation weight λ to 0.005. The hidden layer dimension of the projector is 4096. The detailed data augmentation is summarized in Table 8. (A configuration sketch of these hyperparameters follows the table.) |
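
For readers checking the linear evaluation claim, the sketch below shows what the standard linear evaluation protocol looks like in PyTorch: the pre-trained backbone is frozen and only a linear classifier is trained on its features. This is a minimal sketch of the generic protocol, not the authors' released evaluation code; `encoder`, `feat_dim`, `num_classes`, and `train_loader` are assumed placeholders.

```python
import torch
import torch.nn as nn

def linear_evaluation(encoder, feat_dim, num_classes, train_loader,
                      epochs=90, device="cuda"):
    """Minimal sketch of the linear evaluation protocol (not the paper's code)."""
    encoder.eval()                              # freeze batch-norm statistics
    for p in encoder.parameters():
        p.requires_grad = False                 # no gradients flow into the backbone

    classifier = nn.Linear(feat_dim, num_classes).to(device)
    optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():               # backbone is a fixed feature extractor
                feats = encoder(images)
            loss = criterion(classifier(feats), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return classifier
```

Test accuracy is then reported by running the frozen encoder plus the trained linear head over the ImageNet validation set.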
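
The pre-training hyperparameters quoted from Appendix A.3 can also be collected into a small configuration sketch. Only the numeric values come from the paper; the dictionary layout, the `torchlars` import, and the SGD fallback are illustrative assumptions (LARS is not part of core PyTorch).

```python
import torch

# Pre-training hyperparameters quoted from Appendix A.3 of the paper.
# Only the numbers are from the paper; the structure below is illustrative.
PRETRAIN_CFG = dict(
    epochs=300,               # the paper reports 100/200/300-epoch runs
    batch_size=256,
    base_lr=0.8,
    weight_decay=1.5e-6,
    lambda_corr=0.005,        # correlation-loss weight λ
    projector_hidden_dim=4096,
)

def build_optimizer(model):
    """Build the pre-training optimizer.

    The paper uses LARS, which core PyTorch does not ship; the `torchlars`
    wrapper here is one possible external implementation. If it is not
    installed, fall back to plain SGD purely for illustration.
    """
    base = torch.optim.SGD(model.parameters(),
                           lr=PRETRAIN_CFG["base_lr"],
                           weight_decay=PRETRAIN_CFG["weight_decay"],
                           momentum=0.9)
    try:
        from torchlars import LARS   # external package; an assumption, not the paper's code
        return LARS(base)
    except ImportError:
        return base
```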