Visual Recognition with Deep Nearest Centroids
Authors: Wenguan Wang, Cheng Han, Tianfei Zhou, Dongfang Liu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared with parametric counterparts, DNC performs better on image classification (CIFAR-10, CIFAR-100, ImageNet) and greatly boosts pixel recognition (ADE20K, Cityscapes) with improved transparency, using various backbone network architectures (ResNet, Swin) and segmentation models (FCN, DeepLabV3, Swin). Our code is available at DNC. 1 INTRODUCTION Deep learning models, from convolutional networks (e.g., VGG [1], ResNet [2]) to Transformer-based architectures (e.g., Swin [3]), push forward the state-of-the-art on visual recognition. |
| Researcher Affiliation | Academia | Wenguan Wang¹, Cheng Han², Tianfei Zhou³ & Dongfang Liu²; ¹CCAI, Zhejiang University, ²Rochester Institute of Technology, ³ETH Zurich |
| Pseudocode | Yes | A PSEUDO CODE OF DNC AND CODE RELEASE: The pseudo-code of DNC is given in Algorithm 1 ("Pseudo-code of DNC in a PyTorch-like style"). An illustrative nearest-centroid sketch is given after the table. |
| Open Source Code | Yes | To guarantee reproducibility, our code is available at https://github.com/ChengHan111/DNC. |
| Open Datasets | Yes | The evaluation for image classification is carried out on CIFAR-10 [33] and ImageNet [25]. The evaluation for semantic segmentation is carried out on ADE20K [37] and Cityscapes [26]. For thorough evaluation, we conduct extra experiments on COCO-Stuff [104], a famous semantic segmentation dataset. |
| Dataset Splits | Yes | For CIFAR-10, we train ResNet for 200 epochs, with batch size 128. For ImageNet, we train 100 and 300 epochs with batch size 16 for ResNet and Swin, respectively. We train FCN and DeepLabV3 with ResNet101 using the SGD optimizer with an initial learning rate of 0.1, and UperNet with Swin-B using AdamW with an initial learning rate of 6e-5. As is common practice [102, 103], we train the models on ADE20K train with crop size 512×512 and batch size 16, and on Cityscapes train with crop size 769×769 and batch size 8. All the models are trained for 160K iterations on both datasets. |
| Hardware Specification | Yes | Models are trained from scratch on eight V100 GPUs. |
| Software Dependencies | No | The paper mentions using "mmclassification" and "mmsegmentation" as codebases, and provides GitHub links to them. It also indicates a "PyTorch-like style" for the pseudocode. However, it does not specify exact version numbers for PyTorch, mmclassification, or mmsegmentation, which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | For CIFAR-10, we train ResNet for 200 epochs, with batch size 128. For ImageNet, we train 100 and 300 epochs with batch size 16 for ResNet and Swin, respectively. The initial learning rates of ResNet and Swin are set as 0.1 and 0.0005, scheduled by a step policy and a polynomial annealing policy, respectively. Other hyperparameters are empirically set as: K=4 and µ=0.999. We train FCN and DeepLabV3 with ResNet101 using the SGD optimizer with an initial learning rate of 0.1, and UperNet with Swin-B using AdamW with an initial learning rate of 6e-5. For all the models, the learning rate is scheduled following a polynomial annealing policy. As is common practice [102, 103], we train the models on ADE20K train with crop size 512×512 and batch size 16, and on Cityscapes train with crop size 769×769 and batch size 8. All the models are trained for 160K iterations on both datasets. The hyper-parameters of DNC are by default set as: K=10 and µ=0.999. A hedged reconstruction of the segmentation optimizer settings appears after the table. |
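
The released repository contains the authors' actual Algorithm 1. The snippet below is only a minimal PyTorch sketch of the nearest-centroid idea that the pseudocode row refers to: features are compared against K sub-centroids per class, the class score is the similarity to the nearest sub-centroid, and centroids are refreshed with a momentum update (µ). The class name `DNCHead`, the cosine-similarity distance, and the simplified hard-assignment update are assumptions made for illustration; the paper's clustering-based centroid estimation is more involved.

```python
# Minimal, illustrative sketch of a deep nearest-centroid head in PyTorch.
# NOT the authors' Algorithm 1; DNCHead, cosine distance, and the simplified
# momentum update are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DNCHead(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int, K: int = 4, momentum: float = 0.999):
        super().__init__()
        self.K = K
        self.momentum = momentum
        # K sub-centroids per class, kept as a buffer (non-parametric, no gradients).
        self.register_buffer(
            "centroids", F.normalize(torch.randn(num_classes, K, feat_dim), dim=-1))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, D) backbone features; compare against every sub-centroid.
        feats = F.normalize(feats, dim=-1)
        sims = torch.einsum("bd,ckd->bck", feats, self.centroids)  # cosine similarity
        # Class score = similarity to that class's nearest sub-centroid.
        return sims.max(dim=-1).values                             # (B, num_classes)

    @torch.no_grad()
    def update_centroids(self, feats: torch.Tensor, labels: torch.Tensor) -> None:
        # Crude stand-in for the paper's clustering step: move each class's
        # nearest sub-centroid toward the mean of its assigned features.
        feats = F.normalize(feats, dim=-1)
        for c in labels.unique():
            f_c = feats[labels == c]                    # (Nc, D)
            assign = (f_c @ self.centroids[c].t()).argmax(dim=-1)  # nearest sub-centroid
            for k in assign.unique():
                mean_f = F.normalize(f_c[assign == k].mean(dim=0), dim=0)
                new = self.momentum * self.centroids[c, k] + (1 - self.momentum) * mean_f
                self.centroids[c, k] = F.normalize(new, dim=0)

# Usage: scores behave like logits and can be fed to cross-entropy.
head = DNCHead(num_classes=10, feat_dim=512)
logits = head(torch.randn(8, 512))                      # (8, 10)
```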
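The segmentation training settings quoted in the table (SGD with initial lr 0.1 for ResNet-101 models, AdamW with initial lr 6e-5 for Swin-B, polynomial annealing over 160K iterations) can be read as the hedged optimizer sketch below. The SGD momentum, weight decay values, and the polynomial power are assumptions not stated in the table; the model argument is a placeholder.

```python
# Hypothetical reconstruction of the described optimizer/scheduler settings;
# momentum, weight decay, and the polynomial power 0.9 are assumed values.
import torch

def build_optimizer(model: torch.nn.Module, backbone: str, total_iters: int = 160_000):
    if backbone == "resnet101":
        # FCN / DeepLabV3 with ResNet-101: SGD, initial learning rate 0.1.
        opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    else:
        # UperNet with Swin-B: AdamW, initial learning rate 6e-5.
        opt = torch.optim.AdamW(model.parameters(), lr=6e-5, weight_decay=0.01)
    # Polynomial annealing of the learning rate over 160K iterations.
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lr_lambda=lambda it: (1 - it / total_iters) ** 0.9)
    return opt, sched
```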