Parametric Instance Classification for Unsupervised Visual Feature Learning

Authors: Yue Cao, Zhenda Xie, Bin Liu, Yutong Lin, Zheng Zhang, Han Hu

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform unsupervised feature pre-training on the most widely-used dataset, ImageNet-1K [8], which has 1.28 million training images. For ImageNet-1K, we vary the training length from 200 to 1600 epochs to facilitate comparison with previously reported results. In all experiments, a ResNet-50 [16, 17] model is adopted as the backbone network. Eight Titan V100 GPUs and a total batch size of 512 are adopted. We follow augmentations and training settings similar to [5, 6], with details shown in Appendix C. For the cosine soft-max loss (1), we find that τ = 0.2 generally performs well, so we adopt it for all experiments.
Researcher Affiliation | Collaboration | Yue Cao 1, Zhenda Xie 1,2, Bin Liu 1,2, Yutong Lin 1,3, Zheng Zhang 1, Han Hu 1. 1 Microsoft Research Asia, 2 Tsinghua University, 3 Xi'an Jiaotong University. {yuecao,t-zhxie,v-liubin,v-yutlin,zhez,hanhu}@microsoft.com
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks; it describes the methodology in text and with a diagram (Figure 1).
Open Source Code | Yes | The code and network configurations are available at https://github.com/bl0/PIC.
Open Datasets | Yes | We perform unsupervised feature pre-training on the most widely-used dataset, ImageNet-1K [8], which has 1.28 million training images.
Dataset Splits | Yes | The linear evaluation protocol [18, 15, 5] on the ImageNet-1K dataset is used in ablations.
Hardware Specification | Yes | Eight Titan V100 GPUs and a total batch size of 512 are adopted.
Software Dependencies | No | The paper mentions ResNet-50 models and ImageNet-1K, but it does not specify software dependencies such as Python, PyTorch, or TensorFlow versions, or other libraries with version numbers.
Experiment Setup | Yes | We perform unsupervised feature pre-training on the most widely-used dataset, ImageNet-1K [8], which has 1.28 million training images. For ImageNet-1K, we vary the training length from 200 to 1600 epochs to facilitate comparison with previously reported results. In all experiments, a ResNet-50 [16, 17] model is adopted as the backbone network. Eight Titan V100 GPUs and a total batch size of 512 are adopted. We follow augmentations and training settings similar to [5, 6], with details shown in Appendix C. For the cosine soft-max loss (1), we find that τ = 0.2 generally performs well, so we adopt it for all experiments. (A minimal sketch of this loss follows the table.)
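The cosine soft-max loss (1) with τ = 0.2 is quoted twice above, so a minimal PyTorch sketch may help readers assessing reproducibility. This is an illustrative reconstruction under stated assumptions, not the authors' implementation (the real code is in the linked repository): the class name, the 128-d feature dimension, and the weight initialization are assumptions, while the temperature τ = 0.2, batch size 512, and 1.28M ImageNet-1K instances come from the quoted setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineSoftmaxInstanceLoss(nn.Module):
    """Cosine soft-max loss over per-instance classes (a sketch of loss (1)).

    Each of the N training images is treated as its own class; features and
    class weights are L2-normalized, so the logits are cosine similarities
    scaled by 1/tau before a standard cross-entropy.
    """

    def __init__(self, num_instances: int, feat_dim: int, tau: float = 0.2):
        super().__init__()
        # One learnable weight vector per training instance (class).
        # Small random init is an assumption, not the paper's choice.
        self.weight = nn.Parameter(torch.randn(num_instances, feat_dim) * 0.01)
        self.tau = tau

    def forward(self, features: torch.Tensor, instance_ids: torch.Tensor) -> torch.Tensor:
        z = F.normalize(features, dim=1)      # (B, D) unit-norm embeddings
        w = F.normalize(self.weight, dim=1)   # (N, D) unit-norm class weights
        logits = z @ w.t() / self.tau         # (B, N) cosine similarity / tau
        return F.cross_entropy(logits, instance_ids)

# Hypothetical usage with the reported settings: 1.28M ImageNet-1K
# instances, batch size 512, tau = 0.2; feat_dim = 128 is assumed.
criterion = CosineSoftmaxInstanceLoss(num_instances=1_281_167, feat_dim=128, tau=0.2)
feats = torch.randn(512, 128)                 # stand-in for ResNet-50 head output
ids = torch.randint(0, 1_281_167, (512,))     # each image is its own class label
loss = criterion(feats, ids)
```

Normalizing both the features and the class weights bounds each logit in [-1, 1], and dividing by τ = 0.2 sharpens the soft-max distribution; the quoted text reports that this single temperature value works well across all of the paper's experiments.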