Boosting Contrastive Learning with Relation Knowledge Distillation

Authors: Kai Zheng, Yuanjiang Wang, Ye Yuan

AAAI 2022, pp. 3508-3516

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we demonstrate the performance of ReKD by a standard linear evaluation protocol compared with mainstream SSL and SSKD methods. Representation Training Setting: In experiments, we validate our algorithm on multiple backbones: AlexNet, MobileNet-V3, ShuffleNet-V2, EfficientNet-b0 and ResNet-18. To enable a fair comparison, we replace the last classifier layer with an MLP layer (two linear layers and one ReLU layer). The dimension of the last linear layer sets to 128.
Researcher Affiliation | Industry | Kai Zheng, Yuanjiang Wang*, Ye Yuan, Megvii Technology, {zhengkai, wangyuanjiang, yuanye}@megvii.com
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | "Code will be made available."
Open Datasets | Yes | We select MoCo v2 as our unsupervised feature extractor and extract all the features with different backbones (AlexNet and ResNet-50) from images in ImageNet.
Dataset Splits | No | The paper does not explicitly state the train/validation/test splits with percentages or counts. It mentions using a "standard linear evaluation protocol" and aligning hyperparameters with (Chen et al. 2020b), but details of the data splitting are not provided in the paper.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) used for running experiments were provided in the paper.
Software Dependencies | No | The paper mentions the "GPU k-means implementation in faiss (Johnson, Douze, and Jégou 2019)" but does not provide specific version numbers for faiss or any other software dependency.
Experiment Setup | Yes | Representation Training Setting: In experiments, we validate our algorithm on multiple backbones: AlexNet, MobileNet-V3, ShuffleNet-V2, EfficientNet-b0 and ResNet-18. To enable a fair comparison, we replace the last classifier layer with an MLP layer (two linear layers and one ReLU layer). The dimension of the last linear layer sets to 128. For efficient clustering, we adopt the GPU k-means implementation in faiss (Johnson, Douze, and Jégou 2019). M sets to 1000 as default to model the dataset's semantic distribution (ablation of M refers to appendix).
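The setup quoted above combines two concrete pieces: an MLP projection head (two linear layers with one ReLU, final dimension 128) appended in place of the classifier, and k-means clustering over extracted features with M cluster centers. A minimal NumPy sketch of both is below; it is an illustration only, not the authors' code, and the plain-Python k-means merely stands in for the faiss GPU implementation the paper uses. All weight shapes and the feature width of 512 are hypothetical toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

def projection_head(feats, w1, b1, w2, b2):
    """Two linear layers with a ReLU in between; output dim 128."""
    h = np.maximum(feats @ w1 + b1, 0.0)  # first linear layer + ReLU
    return h @ w2 + b2                    # second linear layer -> 128-d embedding

def kmeans(x, k, iters=10):
    """Plain NumPy k-means; a stand-in for faiss's GPU implementation."""
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each feature vector to its nearest center
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        # recompute centers (keep the old center if a cluster empties)
        for j in range(k):
            if (assign == j).any():
                centers[j] = x[assign == j].mean(axis=0)
    return assign, centers

# Toy dimensions: backbone features of width 512, hidden width 512, output 128.
feats = rng.normal(size=(256, 512))
w1 = rng.normal(scale=0.05, size=(512, 512)); b1 = np.zeros(512)
w2 = rng.normal(scale=0.05, size=(512, 128)); b2 = np.zeros(128)

emb = projection_head(feats, w1, b1, w2, b2)
assign, centers = kmeans(emb, k=8)  # the paper uses M = 1000 on the full dataset
print(emb.shape, centers.shape)     # (256, 128) (8, 128)
```

On the full ImageNet feature set one would run this with k = M = 1000, matching the default the paper reports for modeling the dataset's semantic distribution.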