Slimmed Asymmetrical Contrastive Learning and Cross Distillation for Lightweight Model Training
Authors: Jian Meng, Li Yang, Kyungmin Lee, Jinwoo Shin, Deliang Fan, Jae-sun Seo
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared to the SoTA lightweight CL training (distillation) algorithms, SACL-XD achieves 1.79% ImageNet-1K accuracy improvement on MobileNet-V3 with 64× training FLOPs reduction. Code is available at https://github.com/mengjian0502/SACL-XD. Table 1: ImageNet-1K test accuracy with linear evaluation protocol based on MobileNet-V3 [22] trained by different contrastive learning/distillation methods. |
| Researcher Affiliation | Academia | Jian Meng, Li Yang, Kyungmin Lee, Jinwoo Shin, Deliang Fan, and Jae-sun Seo; Cornell Tech, USA; University of North Carolina at Charlotte, USA; KAIST, South Korea; Johns Hopkins University, USA. {kyungmnlee, jinwoos}@kaist.ac.kr, lyang50@uncc.edu, dfan10@jhu.edu, {jm2787, js3528}@cornell.edu |
| Pseudocode | Yes | Algorithm 1: PyTorch-style pseudocode for the proposed algorithm |
| Open Source Code | Yes | Code is available at https://github.com/mengjian0502/SACL-XD. |
| Open Datasets | Yes | We evaluate the performance of the proposed algorithm based on CNN encoders (MobileNet [23, 22], EfficientNet [29], ResNet [20]) and ViT [13] models on the ImageNet-1K and ImageNet-100 datasets. We also demonstrate the capability of the proposed method with tiny-sized ResNet on the small CIFAR dataset. (Tables 8, 9, and 10 detail the augmentation for ImageNet-1K, ImageNet-100, and CIFAR-10, respectively.) |
| Dataset Splits | Yes | We follow the linear evaluation protocol on ImageNet to evaluate the performance of the backbone trained by the proposed SACL and cross-distillation (XD) algorithm (a minimal sketch of this protocol is given below the table). We follow the data augmentation setup in [12] for the CIFAR-10 dataset. |
| Hardware Specification | Yes | Table 11: Training time comparison between the proposed method and the distillation-based CL... GPU type: A100 (80 GB) |
| Software Dependencies | No | The paper mentions 'PyTorch-style pseudocode' and uses the 'LARS optimizer' but does not specify version numbers for PyTorch, Python, or other key software libraries. |
| Experiment Setup | Yes | Appendix A.3, Detailed Experimental Setup of Pre-training: The encoders (MobileNet, EfficientNet, ResNet-50) are trained on ImageNet-1K for 100/200/300 epochs from scratch with the proposed method. We set the batch size to 256 with a learning rate of 0.8. We employ the LARS optimizer with weight decay set to 1.5e-6. We set the correlation weight λ to 0.005. The hidden layer dimension of the projector is 4096. The detailed data augmentation is summarized in Table 8. (A configuration sketch of these hyperparameters follows the table.) |
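
For readers checking the linear evaluation claim, the sketch below shows what the standard linear evaluation protocol looks like in PyTorch: the pre-trained backbone is frozen and only a linear classifier is trained on its features. This is a minimal sketch of the generic protocol, not the authors' released evaluation code; `encoder`, `feat_dim`, `num_classes`, and `train_loader` are assumed placeholders.

```python
import torch
import torch.nn as nn

def linear_evaluation(encoder, feat_dim, num_classes, train_loader,
                      epochs=90, device="cuda"):
    """Minimal sketch of the linear evaluation protocol (not the paper's code)."""
    encoder.eval()                              # freeze batch-norm statistics
    for p in encoder.parameters():
        p.requires_grad = False                 # no gradients flow into the backbone

    classifier = nn.Linear(feat_dim, num_classes).to(device)
    optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():               # backbone is a fixed feature extractor
                feats = encoder(images)
            loss = criterion(classifier(feats), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return classifier
```

Test accuracy is then reported by running the frozen encoder plus the trained linear head over the ImageNet validation set.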
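
The pre-training hyperparameters quoted from Appendix A.3 can also be collected into a small configuration sketch. Only the numeric values come from the paper; the dictionary layout, the `torchlars` import, and the SGD fallback are illustrative assumptions (LARS is not part of core PyTorch).

```python
import torch

# Pre-training hyperparameters quoted from Appendix A.3 of the paper.
# Only the numbers are from the paper; the structure below is illustrative.
PRETRAIN_CFG = dict(
    epochs=300,               # the paper reports 100/200/300-epoch runs
    batch_size=256,
    base_lr=0.8,
    weight_decay=1.5e-6,
    lambda_corr=0.005,        # correlation-loss weight λ
    projector_hidden_dim=4096,
)

def build_optimizer(model):
    """Build the pre-training optimizer.

    The paper uses LARS, which core PyTorch does not ship; the `torchlars`
    wrapper here is one possible external implementation. If it is not
    installed, fall back to plain SGD purely for illustration.
    """
    base = torch.optim.SGD(model.parameters(),
                           lr=PRETRAIN_CFG["base_lr"],
                           weight_decay=PRETRAIN_CFG["weight_decay"],
                           momentum=0.9)
    try:
        from torchlars import LARS   # external package; an assumption, not the paper's code
        return LARS(base)
    except ImportError:
        return base
```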