Distilling Localization for Self-Supervised Representation Learning
Authors: Nanxuan Zhao, Zhirong Wu, Rynson W.H. Lau, Stephen Lin
AAAI 2021, pp. 10990-10998
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a series of experiments on model designs for self-supervised representation learning and their transfer learning abilities. Ablation Study: In this section, we first validate our data-driven approach of distilling localization through a series of ablation experiments for image classification on ImageNet. |
| Researcher Affiliation | Collaboration | Nanxuan Zhao (1), Zhirong Wu (2), Rynson W.H. Lau (1), Stephen Lin (2); (1) City University of Hong Kong, (2) Microsoft Research Asia |
| Pseudocode | No | The paper describes the methods conceptually and mathematically (Eqn 1, 2) but does not include a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | ImageNet classification, object detection on PASCAL VOC and MSCOCO, the DUTS dataset (Wang et al. 2017, trained from scratch with 10,053 training images), and the DUT-OMRON dataset (Yang et al. 2013) |
| Dataset Splits | Yes | The performance is measured on the ImageNet validation set of 1000 classes, and evaluated by linear classifiers. ... All models are trained using the ResNet50 architecture and reported on the ImageNet validation set. ... We transfer our pretrained model to object detection by finetuning it on PASCAL VOC 2007+2012 trainval and evaluating on the VOC 2007 test set. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or cloud computing instances. |
| Software Dependencies | No | The paper mentions 'Detectron2 codebase' but does not specify any version numbers for Detectron2 or any other software dependencies. |
| Experiment Setup | Yes | Specifically, we use a temperature τ = 0.07 in Eqn. 1, and an embedding dimension of D = 128 for each image. A memory queue (He et al. 2019) of size k = 65536 negatives is used to accelerate discrimination. Training takes 200 epochs with an initial learning rate of 0.03 that is decayed 1/10 at epochs 120 and 160. All models are trained using the ResNet50 architecture and reported on the ImageNet validation set. Performance is evaluated by the linear readoff on the penultimate layer features. The optimization takes 100 epochs and starts with a learning rate of 30 that is decayed every 30 epochs. (See the code sketch following the table.) |
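
The quoted "Experiment Setup" row describes a MoCo-style instance-discrimination objective. Below is a minimal, hypothetical PyTorch sketch of that configuration; since the authors released no code, every function name, and any hyperparameter not quoted above (e.g., SGD momentum and weight decay), is an assumption rather than the authors' implementation.

```python
# Hypothetical sketch of the quoted pretraining and linear-evaluation setup:
# InfoNCE loss with temperature tau = 0.07 (Eqn. 1), embedding dimension
# D = 128, and a MoCo-style memory queue of K = 65536 negatives
# (He et al. 2019). This is not the authors' released code.
import torch
import torch.nn.functional as F

TAU = 0.07          # temperature in Eqn. 1
DIM = 128           # embedding dimension D
QUEUE_SIZE = 65536  # number of negatives k in the memory queue


def info_nce_loss(q, k, queue, tau=TAU):
    """InfoNCE over one positive pair per sample plus queued negatives.

    q, k: (N, DIM) query/key embeddings; queue: (DIM, QUEUE_SIZE) negatives.
    """
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    l_pos = torch.einsum("nd,nd->n", q, k).unsqueeze(-1)   # (N, 1)
    l_neg = torch.einsum("nd,dk->nk", q, queue)            # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)  # positive is always index 0


def pretraining_optimizer(model):
    # Quoted schedule: 200 epochs, lr 0.03, decayed by 1/10 at epochs 120/160.
    # Momentum 0.9 and weight decay 1e-4 are assumed defaults, not quoted.
    opt = torch.optim.SGD(model.parameters(), lr=0.03,
                          momentum=0.9, weight_decay=1e-4)
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[120, 160],
                                                 gamma=0.1)
    return opt, sched


def linear_eval_optimizer(linear_head):
    # Quoted schedule: 100 epochs, lr 30, decayed every 30 epochs
    # (decay factor 0.1 is an assumption).
    opt = torch.optim.SGD(linear_head.parameters(), lr=30.0, momentum=0.9)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=30, gamma=0.1)
    return opt, sched
```

The momentum-encoder update and the dequeue/enqueue maintenance of the memory queue are omitted for brevity; only the loss and the quoted schedules are illustrated.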