Visual Recognition with Deep Nearest Centroids

Authors: Wenguan Wang, Cheng Han, Tianfei Zhou, Dongfang Liu

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Compared with parametric counterparts, DNC performs better on image classification (CIFAR-10, CIFAR-100, ImageNet) and greatly boosts pixel recognition (ADE20K, Cityscapes) with improved transparency, using various backbone network architectures (ResNet, Swin) and segmentation models (FCN, DeepLabV3, Swin). Our code is available at DNC. (1 INTRODUCTION) Deep learning models, from convolutional networks (e.g., VGG [1], ResNet [2]) to Transformer-based architectures (e.g., Swin [3]), push forward the state-of-the-art on visual recognition.
Researcher Affiliation | Academia | Wenguan Wang1, Cheng Han2, Tianfei Zhou3 & Dongfang Liu2. CCAI, Zhejiang University1; Rochester Institute of Technology2; ETH Zurich3.
Pseudocode | Yes | A PSEUDO CODE OF DNC AND CODE RELEASE: The pseudo-code of DNC is given in Algorithm 1. Algorithm 1: Pseudo-code of DNC in a PyTorch-like style.
Open Source Code | Yes | To guarantee reproducibility, our code is available at https://github.com/ChengHan111/DNC.
Open Datasets | Yes | The evaluation for image classification is carried out on CIFAR-10 [33] and ImageNet [25]. The evaluation for semantic segmentation is carried out on ADE20K [37] and Cityscapes [26]. For thorough evaluation, we conduct extra experiments on COCO-Stuff [104], a famous semantic segmentation dataset.
Dataset Splits | Yes | For CIFAR-10, we train ResNet for 200 epochs, with batch size 128. For ImageNet, we train 100 and 300 epochs with batch size 16 for ResNet and Swin, respectively. We train FCN and DeepLabV3 with ResNet101 using the SGD optimizer with an initial learning rate of 0.1, and UperNet with Swin-B using AdamW with an initial learning rate of 6e-5. As common practices [102, 103], we train the models on ADE20K train with crop size 512×512 and batch size 16; on Cityscapes train with crop size 769×769 and batch size 8. All the models are trained for 160K iterations on both datasets.
Hardware Specification | Yes | Models are trained from scratch on eight V100 GPUs.
Software Dependencies | No | The paper mentions using "mmclassification" and "mmsegmentation" as codebases, and provides GitHub links to them. It also indicates a "PyTorch-like style" for the pseudocode. However, it does not specify exact version numbers for PyTorch, mmclassification, or mmsegmentation, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | For CIFAR-10, we train ResNet for 200 epochs, with batch size 128. For ImageNet, we train 100 and 300 epochs with batch size 16 for ResNet and Swin, respectively. The initial learning rates of ResNet and Swin are set as 0.1 and 0.0005, scheduled by a step policy and polynomial annealing policy, respectively. Other hyperparameters are empirically set as: K=4 and µ=0.999. We train FCN and DeepLabV3 with ResNet101 using the SGD optimizer with an initial learning rate of 0.1, and UperNet with Swin-B using AdamW with an initial learning rate of 6e-5. For all the models, the learning rate is scheduled following a polynomial annealing policy. As common practices [102, 103], we train the models on ADE20K train with crop size 512×512 and batch size 16; on Cityscapes train with crop size 769×769 and batch size 8. All the models are trained for 160K iterations on both datasets. The hyper-parameters of DNC are by default set as: K=10 and µ=0.999.
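The report notes that DNC's Algorithm 1 is written in a PyTorch-like style but does not reproduce it. As a rough NumPy sketch of the nearest-centroid idea in the title, classifying each feature by the class of its nearest sub-centroid (K sub-centroids per class) and refreshing centroids with a momentum update using µ, the function names and array shapes below are illustrative assumptions, not the authors' actual pseudo-code:

```python
import numpy as np

def dnc_classify(feats, centroids):
    """Assign each feature to the class of its nearest sub-centroid.

    feats:     (N, D) feature vectors
    centroids: (C, K, D) K sub-centroids for each of C classes (assumed layout)
    """
    C, K, D = centroids.shape
    flat = centroids.reshape(C * K, D)                            # (C*K, D)
    # Pairwise squared Euclidean distances to every sub-centroid.
    d2 = ((feats[:, None, :] - flat[None, :, :]) ** 2).sum(-1)    # (N, C*K)
    nearest = d2.argmin(axis=1)                                   # nearest sub-centroid index
    return nearest // K                                           # class owning that sub-centroid

def momentum_update(centroids, new_centroids, mu=0.999):
    """EMA refresh of sub-centroids, mirroring the quoted µ=0.999 setting."""
    return mu * centroids + (1.0 - mu) * new_centroids
```

This keeps the classifier non-parametric: prediction is a pure distance computation, and the sub-centroids are maintained by the exponential moving average rather than by gradient descent on classifier weights.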
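The quoted setup repeatedly mentions a polynomial annealing learning-rate policy. A minimal sketch of such a schedule is below; the decay power 0.9 is a common default in segmentation codebases such as mmsegmentation and is an assumption here, not stated in the quote:

```python
def poly_lr(base_lr: float, cur_iter: int, max_iter: int, power: float = 0.9) -> float:
    """Polynomial annealing: lr = base_lr * (1 - t/T)^power.

    Starts at base_lr (e.g., 0.1 for the quoted SGD setup) and decays
    to 0 at max_iter (e.g., the quoted 160K iterations).
    """
    return base_lr * (1.0 - cur_iter / max_iter) ** power
```

Under this policy the rate decays smoothly over the whole run, which is why the quoted schedules are specified by only an initial learning rate and a total iteration count.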