Omni-Dimensional Dynamic Convolution
Authors: Chao Li, Aojun Zhou, Anbang Yao
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the ImageNet and MS-COCO datasets show that ODConv brings solid accuracy boosts for various prevailing CNN backbones including both light-weight and large ones, e.g., 3.77%~5.71% \| 1.86%~3.72% absolute top-1 improvements to the MobileNetV2 \| ResNet family on the ImageNet dataset. Furthermore, ODConv is also superior to other attention modules for modulating the output features or the convolutional weights. |
| Researcher Affiliation | Collaboration | Chao Li¹, Aojun Zhou², Anbang Yao¹; ¹Intel Labs China, ²CUHK-SenseTime Joint Lab, The Chinese University of Hong Kong; chao.li3@intel.com, aojun.zhou@gmail.com, anbang.yao@intel.com |
| Pseudocode | No | The paper describes the formulation and implementation of ODConv, including mathematical equations and architectural diagrams (e.g., Fig. 1 and Fig. 2), but it does not include formal pseudocode or algorithm blocks. A hedged PyTorch sketch of the four-branch attention mechanism is provided after this table. |
| Open Source Code | Yes | Code and models will be available at https://github.com/OSVAI/ODConv. |
| Open Datasets | Yes | Extensive experiments on the ImageNet and MS-COCO datasets show that ODConv brings solid accuracy boosts for various prevailing CNN backbones... Our main experiments are performed on the ImageNet dataset (Russakovsky et al., 2015). It has over 1.2 million images for training and 50,000 images for validation, including 1,000 object classes. ... MS-COCO dataset (Lin et al., 2014). The 2017 version of the MS-COCO dataset contains 118,000 training images and 5,000 validation images with 80 object classes. |
| Dataset Splits | Yes | Our main experiments are performed on the ImageNet dataset (Russakovsky et al., 2015). It has over 1.2 million images for training and 50,000 images for validation, including 1,000 object classes. ... The 2017 version of the MS-COCO dataset contains 118,000 training images and 5,000 validation images with 80 object classes. ... For training, images are resized to 256 × 256 first, and then 224 × 224 crops are randomly sampled from the resized images or their horizontal flips normalized with the per-channel mean and standard deviation values. For evaluation, we report top-1 and top-5 recognition rates using the center image crops. (A torchvision rendering of this preprocessing pipeline appears after the table.) |
| Hardware Specification | Yes | All experiments are performed on servers having 8 GPUs. ... All models are trained on the ImageNet dataset using the server with 8 NVIDIA TITAN X GPUs. We report results in terms of three metrics (seconds per batch, minutes per epoch, and the total number of hours for the whole training). ... All pre-trained models are tested on an NVIDIA TITAN X GPU (with batch size 200) and a single core of an Intel E5-2683 v3 CPU (with batch size 1) separately, and the input image size is 224 × 224 pixels. |
| Software Dependencies | No | The paper mentions using the 'MMDetection toolbox (Chen et al., 2019)' but does not specify its version number or any other software dependencies with their versions (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For ResNet18, ResNet50 and ResNet101, all models are trained with SGD for 100 epochs. We set the batch size as 256, the weight decay as 0.0001 and the momentum as 0.9. The learning rate starts at 0.1, and is divided by 10 every 30 epochs. ... For MobileNetV2 (1.0×, 0.75×, 0.5×), all models are trained with SGD for 150 epochs ... We set the batch size as 256, the weight decay as 0.00004 and the momentum as 0.9. The learning rate starts at 0.05, and is scheduled to arrive at zero within a single cosine cycle. ... Regarding the temperature annealing strategy used for DyConv and ODConv, the temperature reduces from 30 to 1 linearly in the first 10 epochs for all models. A hedged reconstruction of this schedule is sketched after the table. |
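
Since the paper includes no pseudocode (Pseudocode row above), here is a minimal PyTorch sketch of how ODConv's four attention branches (spatial, input-channel, output-channel, and kernel-wise) could modulate n candidate kernels from a shared GAP → FC → ReLU stem. The class name `ODConv2dSketch`, the `reduction` ratio, and the weight initialization are illustrative assumptions reconstructed from the paper's description, not the code released at the GitHub link above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ODConv2dSketch(nn.Module):
    """Illustrative reconstruction of omni-dimensional dynamic convolution."""

    def __init__(self, in_ch, out_ch, k=3, n_kernels=4, reduction=16):
        super().__init__()
        self.k, self.n = k, n_kernels
        # n candidate kernels, each a full conv weight tensor (assumed init scale).
        self.weight = nn.Parameter(0.01 * torch.randn(n_kernels, out_ch, in_ch, k, k))
        hidden = max(in_ch // reduction, 4)
        self.fc = nn.Sequential(nn.Linear(in_ch, hidden), nn.ReLU(inplace=True))
        self.attn_s = nn.Linear(hidden, k * k)      # spatial attention
        self.attn_c = nn.Linear(hidden, in_ch)      # input-channel attention
        self.attn_f = nn.Linear(hidden, out_ch)     # output-channel (filter) attention
        self.attn_w = nn.Linear(hidden, n_kernels)  # kernel-wise attention

    def forward(self, x, temperature=1.0):
        b, c, h, w = x.shape
        ctx = self.fc(x.mean(dim=(2, 3)))  # GAP -> FC -> ReLU stem
        a_s = torch.sigmoid(self.attn_s(ctx)).view(b, 1, 1, 1, self.k, self.k)
        a_c = torch.sigmoid(self.attn_c(ctx)).view(b, 1, 1, c, 1, 1)
        a_f = torch.sigmoid(self.attn_f(ctx)).view(b, 1, -1, 1, 1, 1)
        a_w = F.softmax(self.attn_w(ctx) / temperature, dim=1).view(b, self.n, 1, 1, 1, 1)
        # Modulate and sum the candidate kernels, giving one kernel per sample.
        w_agg = (a_w * a_f * a_c * a_s * self.weight.unsqueeze(0)).sum(dim=1)
        # Per-sample convolution via the standard grouped-conv trick.
        out = F.conv2d(x.reshape(1, b * c, h, w),
                       w_agg.reshape(-1, c, self.k, self.k),
                       padding=self.k // 2, groups=b)
        return out.view(b, -1, h, w)
```

The `temperature` argument only affects the kernel-wise softmax; the annealing strategy quoted in the Experiment Setup row (and sketched further below) would drive it from 30 to 1 over the first 10 epochs.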
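
The Dataset Splits row quotes the ImageNet preprocessing (resize to 256 × 256, random 224 × 224 crops, horizontal flips, per-channel normalization, center crops for evaluation). A torchvision rendering of that pipeline might look like the sketch below; the normalization statistics are the standard ImageNet values, which the paper does not spell out, so treat them as an assumption.

```python
import torchvision.transforms as T

# Standard ImageNet per-channel statistics (assumed; the paper only says
# "per-channel mean and standard deviation values").
normalize = T.Normalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225])

# Training: resize to 256x256, random 224x224 crop, random horizontal flip.
train_tf = T.Compose([
    T.Resize((256, 256)),
    T.RandomCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    normalize,
])

# Evaluation: resize to 256x256, then take the center 224x224 crop.
eval_tf = T.Compose([
    T.Resize((256, 256)),
    T.CenterCrop(224),
    T.ToTensor(),
    normalize,
])
```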
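
Similarly, the Experiment Setup row fully specifies the ResNet optimization recipe and the temperature annealing used for DyConv and ODConv. A hedged PyTorch reconstruction follows; the backbone is a stand-in for an ODConv-equipped model, and the helper name `kernel_temperature` is ours, not the authors'.

```python
import torch
import torchvision

def kernel_temperature(epoch, t_max=30.0, t_min=1.0, anneal_epochs=10):
    """Linear annealing of the kernel-wise softmax temperature:
    30 -> 1 over the first 10 epochs, then held at 1 (as quoted above)."""
    if epoch >= anneal_epochs:
        return t_min
    return t_max - (t_max - t_min) * epoch / anneal_epochs

model = torchvision.models.resnet18()  # stand-in for an ODConv backbone
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# ResNet recipe: 100 epochs, batch size 256, lr divided by 10 every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
# The MobileNetV2 variants instead use lr=0.05, weight_decay=4e-5, and a
# single cosine cycle, e.g. CosineAnnealingLR(optimizer, T_max=150).
```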