On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals
Authors: Haizhou Shi, Youcai Zhang, Siliang Tang, Wenjie Zhu, Yaqian Li, Yandong Guo, Yueting Zhuang (pp. 2225-2234)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first evaluate the representation spaces of the small models and make two non-negligible observations: (i) the small models can complete the pretext task without overfitting despite their limited capacity, and (ii) they universally suffer the problem of over-clustering. Then we verify multiple assumptions that are considered to alleviate the over-clustering phenomenon. Finally, we combine the validated techniques and improve the baseline performances of five small architectures with considerable margins, which indicates that training small self-supervised contrastive models is feasible even without distillation signals. |
| Researcher Affiliation | Collaboration | 1 OPPO Research Institute, 2 Zhejiang University, 3 New York University |
| Pseudocode | No | The paper describes methods and uses mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/WOWNICE/ssl-small. |
| Open Datasets | Yes | All the metrics are evaluated on the penultimate output of the networks (refer to Tab. 2) and the ImageNet-1k dataset (Deng et al. 2009). ... We benchmark the transferability of the backbone networks on CIFAR10, CIFAR100 (Krizhevsky 2009), and Caltech101 (Fei-Fei, Fergus, and Perona 2004) image classification datasets. |
| Dataset Splits | Yes | We sample 50 images per class for both the training and validation sets, making it a 50,000-way classification task for the pre-trained models. ... For the small models, there is no overfitting problem when trained on the pretext task. This conclusion is supported by the fact that each model's metrics have no significant difference on both the training and validation sets. (See the subsampling sketch below the table.) |
| Hardware Specification | Yes | The training times are evaluated on a single 8-card V100 GPU server for 200 epochs of training. |
| Software Dependencies | No | The paper mentions basing research on the "MoCo v2 algorithm" but does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | To better utilize the computational resource, we set the batch size as 1024, and the learning rate as 0.06. ... We set temperature τ = 0.1, batch size B = 512, learning rate η = 0.06, and negative sample size K = 65536. ... We train all the models for 800 epochs with cosine decay, and evaluate them at epoch 200 and epoch 800. (See the InfoNCE sketch below the table.) |
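
The Experiment Setup row quotes the MoCo v2-style hyperparameters reported by the paper (τ = 0.1, B = 512, K = 65536, lr = 0.06 with cosine decay over 800 epochs). Below is a minimal sketch, not the authors' released code, of the InfoNCE objective those values plug into; the tensor names, the SGD choice, and the toy inputs are assumptions made for illustration.

```python
# Hedged sketch of a MoCo v2-style InfoNCE loss using the quoted hyperparameters.
import torch
import torch.nn.functional as F

TAU, BATCH, QUEUE_K, DIM = 0.1, 512, 65536, 128  # values from the Experiment Setup row; DIM is assumed

def info_nce(q, k, queue, tau=TAU):
    """q, k: (B, D) L2-normalized query/key embeddings; queue: (D, K) negative keys."""
    l_pos = torch.einsum("nd,nd->n", q, k).unsqueeze(-1)   # (B, 1) positive logits
    l_neg = torch.einsum("nd,dk->nk", q, queue)            # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / tau        # (B, 1 + K), temperature-scaled
    labels = torch.zeros(q.size(0), dtype=torch.long)      # the positive sits at index 0
    return F.cross_entropy(logits, labels)

# Optimizer/schedule as quoted (SGD with momentum is an assumption):
# optimizer = torch.optim.SGD(model.parameters(), lr=0.06, momentum=0.9, weight_decay=1e-4)
# scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=800)

if __name__ == "__main__":
    q = F.normalize(torch.randn(BATCH, DIM), dim=1)
    k = F.normalize(torch.randn(BATCH, DIM), dim=1)
    queue = F.normalize(torch.randn(DIM, QUEUE_K), dim=0)
    print(info_nce(q, k, queue).item())  # loss on random inputs, roughly log(1 + K)
```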
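
The Dataset Splits row describes evaluating the pretext task on a balanced subset of ImageNet-1k: 50 images per class drawn from both the training and validation sets, i.e. 1,000 classes x 50 images = 50,000 instances to discriminate. The following is a hedged sketch of such per-class subsampling; the function name and the (path, class_id) input format are assumptions, not taken from the paper's code.

```python
# Hedged sketch: draw at most `per_class` images from every class of an index list.
import random
from collections import defaultdict

def sample_per_class(samples, per_class=50, seed=0):
    """samples: iterable of (image_path, class_id); returns <= per_class items per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, cls in samples:
        by_class[cls].append(path)
    subset = []
    for cls, paths in by_class.items():
        rng.shuffle(paths)
        subset.extend((p, cls) for p in paths[:per_class])
    return subset  # ~1,000 classes * 50 images = 50,000 instances for ImageNet-1k

# Example (imagenet_train_index is a hypothetical list of (path, class_id) pairs):
# eval_subset = sample_per_class(imagenet_train_index, per_class=50)
```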