Distilling Localization for Self-Supervised Representation Learning
Authors: Nanxuan Zhao, Zhirong Wu, Rynson W.H. Lau, Stephen Lin
AAAI 2021, pp. 10990-10998
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a series of experiments on model designs for self-supervised representation learning and their transfer learning abilities. Ablation Study: In this section, we first validate our data-driven approach of distilling localization through a series of ablation experiments for image classification on ImageNet. |
| Researcher Affiliation | Collaboration | Nanxuan Zhao (1), Zhirong Wu (2), Rynson W.H. Lau (1), Stephen Lin (2); (1) City University of Hong Kong, (2) Microsoft Research Asia |
| Pseudocode | No | The paper describes the methods conceptually and mathematically (Eqn 1, 2) but does not include a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | ImageNet classification, object detection on PASCAL VOC and MSCOCO, the DUTS dataset (Wang et al. 2017, trained from scratch with 10,053 training images), and the DUT-OMRON dataset (Yang et al. 2013) |
| Dataset Splits | Yes | The performance is measured on the ImageNet validation set of 1000 classes, and evaluated by linear classifiers. ... All models are trained using the ResNet50 architecture and reported on the ImageNet validation set. ... We transfer our pretrained model to object detection by finetuning it on PASCAL VOC 2007+2012 trainval and evaluating on the VOC 2007 test set. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or cloud computing instances. |
| Software Dependencies | No | The paper mentions 'Detectron2 codebase' but does not specify any version numbers for Detectron2 or any other software dependencies. |
| Experiment Setup | Yes | Specifically, we use a temperature τ = 0.07 in Eqn. 1, and an embedding dimension of D = 128 for each image. A memory queue (He et al. 2019) of size k = 65536 negatives is used to accelerate discrimination. Training takes 200 epochs with an initial learning rate of 0.03 that is decayed 1/10 at epochs 120 and 160. All models are trained using the ResNet50 architecture and reported on the ImageNet validation set. Performance is evaluated by the linear readoff on the penultimate layer features. The optimization takes 100 epochs and starts with a learning rate of 30 that is decayed every 30 epochs. (See the code sketch following the table.) |
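
The quoted "Experiment Setup" row describes a MoCo-style instance-discrimination objective. Below is a minimal, hypothetical PyTorch sketch of that configuration; since the authors released no code, every function name, and any hyperparameter not quoted above (e.g., SGD momentum and weight decay), is an assumption rather than the authors' implementation.

```python
# Hypothetical sketch of the quoted pretraining and linear-evaluation setup:
# InfoNCE loss with temperature tau = 0.07 (Eqn. 1), embedding dimension
# D = 128, and a MoCo-style memory queue of K = 65536 negatives
# (He et al. 2019). This is not the authors' released code.
import torch
import torch.nn.functional as F

TAU = 0.07          # temperature in Eqn. 1
DIM = 128           # embedding dimension D
QUEUE_SIZE = 65536  # number of negatives k in the memory queue


def info_nce_loss(q, k, queue, tau=TAU):
    """InfoNCE over one positive pair per sample plus queued negatives.

    q, k: (N, DIM) query/key embeddings; queue: (DIM, QUEUE_SIZE) negatives.
    """
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    l_pos = torch.einsum("nd,nd->n", q, k).unsqueeze(-1)   # (N, 1)
    l_neg = torch.einsum("nd,dk->nk", q, queue)            # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)  # positive is always index 0


def pretraining_optimizer(model):
    # Quoted schedule: 200 epochs, lr 0.03, decayed by 1/10 at epochs 120/160.
    # Momentum 0.9 and weight decay 1e-4 are assumed defaults, not quoted.
    opt = torch.optim.SGD(model.parameters(), lr=0.03,
                          momentum=0.9, weight_decay=1e-4)
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[120, 160],
                                                 gamma=0.1)
    return opt, sched


def linear_eval_optimizer(linear_head):
    # Quoted schedule: 100 epochs, lr 30, decayed every 30 epochs
    # (decay factor 0.1 is an assumption).
    opt = torch.optim.SGD(linear_head.parameters(), lr=30.0, momentum=0.9)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=30, gamma=0.1)
    return opt, sched
```

The momentum-encoder update and the dequeue/enqueue maintenance of the memory queue are omitted for brevity; only the loss and the quoted schedules are illustrated.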