Train a One-Million-Way Instance Classifier for Unsupervised Visual Representation Learning
Authors: Yu Liu, Lianghua Huang, Pan Pan, Bin Wang, Yinghui Xu, Rong Jin (pp. 8706–8714)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method under the ImageNet linear evaluation protocol and on several downstream tasks related to detection or fine-grained classification. |
| Researcher Affiliation | Industry | Yu Liu, Lianghua Huang, Pan Pan, Bin Wang, Yinghui Xu, Rong Jin Machine Intelligence Technology Lab, Alibaba Group {ly103369, xuangen.hlh, panpan.pp, ganfu.wb, renji.xyh, jinrong.jr}@alibaba-inc.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | Yes | Unless specified, we use ImageNet-1K to train our unsupervised model for most experiments. ImageNet-1K consists of around 1.28 million images belonging to 1000 classes. |
| Dataset Splits | Yes | Semi-supervised learning performance on ImageNet-1K, where methods are required to classify images in the val set when only a small fraction (i.e., 1% or 10%) of manual labels are provided in the train set. |
| Hardware Specification | Yes | All experiments are conducted on 64 V100 GPUs with 32GB memory. |
| Software Dependencies | No | The paper mentions using ResNet-50 and the SGD optimizer but does not specify software dependencies with version numbers (e.g., PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | We use ResNet-50 (He et al. 2016) as the backbone in all our experiments. We train our model using the SGD optimizer, where the weight decay and momentum are set to 0.0001 and 0.9, respectively. The initial learning rate (lr) is set to 0.48 and decays using the cosine annealing scheduler. In addition, we use 10 epochs of linear lr warmup to stabilize training. The minibatch size is 4096 and the feature dimension D = 128. We set the temperature in Eq. (1) as τ = 0.15, and the smoothing factor in Eq. (3) as α = 0.2. |
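The reported setup can be sketched as plain Python, independent of any framework. This is a minimal illustration, not the authors' code: `TOTAL_EPOCHS` is an assumption (the excerpt above does not state the training length), and the exact label-smoothing formula of the paper's Eq. (3) is not reproduced here, so the uniform-smoothing form below is only a common stand-in.

```python
import math

# Hyperparameters quoted in the table above (ImageNet-1K pretraining).
BASE_LR = 0.48        # initial learning rate
WARMUP_EPOCHS = 10    # linear lr warmup
TOTAL_EPOCHS = 200    # ASSUMED: the excerpt does not state the epoch count
TAU = 0.15            # temperature in Eq. (1)
ALPHA = 0.2           # smoothing factor in Eq. (3)


def learning_rate(epoch: float) -> float:
    """Linear warmup for the first 10 epochs, then cosine annealing to 0."""
    if epoch < WARMUP_EPOCHS:
        return BASE_LR * epoch / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))


def smoothed_temperature_ce(logits, target, tau=TAU, alpha=ALPHA):
    """Cross-entropy over temperature-scaled logits with uniform label
    smoothing (an assumed form; the paper's Eq. (3) may differ)."""
    scaled = [z / tau for z in logits]
    m = max(scaled)                         # subtract max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    log_probs = [math.log(e / total) for e in exps]
    k = len(logits)
    loss = 0.0
    for i, lp in enumerate(log_probs):
        # 1 - alpha mass on the true class, alpha spread over the rest
        p = (1.0 - alpha) if i == target else alpha / (k - 1)
        loss -= p * lp
    return loss
```

For example, `learning_rate(0.0)` returns 0.0, `learning_rate(10)` returns the full 0.48, and the rate decays to 0 by epoch 200 under the assumed schedule length.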