KernelWarehouse: Rethinking the Design of Dynamic Convolution
Authors: Chao Li, Anbang Yao
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We testify the effectiveness of KernelWarehouse on ImageNet and MS-COCO datasets using various ConvNet architectures. We validate the effectiveness of KernelWarehouse through extensive experiments on ImageNet and MS-COCO datasets. |
| Researcher Affiliation | Industry | Intel Labs China. Correspondence to: Anbang Yao <anbang.yao@intel.com>. |
| Pseudocode | Yes | Algorithm 1 shows the implementation of KernelWarehouse, given a ConvNet backbone and the desired convolutional parameter budget b. (A hedged sketch of this kernel-assembly idea is given below the table.) |
| Open Source Code | Yes | The code and models are available at https://github.com/OSVAI/KernelWarehouse. |
| Open Datasets | Yes | In this section, we conduct comprehensive experiments on ImageNet dataset (Russakovsky et al., 2015) and MS-COCO dataset (Lin et al., 2014) to evaluate the effectiveness of our proposed KernelWarehouse (KW for short, in Tables). |
| Dataset Splits | Yes | ImageNet dataset (Russakovsky et al., 2015), which consists of over 1.2 million training images and 50,000 validation images with 1,000 object categories. MS-COCO dataset (Lin et al., 2014), which contains 118,000 training images and 5,000 validation images with 80 object categories. |
| Hardware Specification | Yes | Specifically, the models of ResNet18, MobileNetV2 (1.0×), MobileNetV2 (0.5×) are trained on the servers with 8 NVIDIA Titan X GPUs. The models of ResNet50, ConvNeXt-Tiny are trained on the servers with 8 NVIDIA Tesla V100-SXM3 or A100 GPUs. |
| Software Dependencies | No | The paper mentions optimizers (SGD, AdamW) and augmentation techniques (RandAugment, mixup, cutmix, random erasing, label smoothing) but does not provide specific version numbers for any software libraries or frameworks such as PyTorch or TensorFlow. |
| Experiment Setup | Yes | All the models are trained by the stochastic gradient descent (SGD) optimizer for 100 epochs, with a batch size of 256, a momentum of 0.9 and a weight decay of 0.0001. The initial learning rate is set to 0.1 and decayed by a factor of 10 every 30 epochs. Horizontal flipping and random resized cropping are used for data augmentation. (A minimal PyTorch reconstruction of this recipe is sketched after the table.) |
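
For orientation on the Pseudocode row, the snippet below is a minimal, hypothetical PyTorch sketch of the core idea reported in the paper: convolutional kernels are partitioned into kernel cells, and each cell is assembled as an attention-weighted mixture of cells held in a warehouse shared across layers. The class name `WarehouseConv2d`, the attention head, the batch-averaged attention, and the plain softmax are simplifying assumptions for illustration; the authors' actual Algorithm 1 and implementation live in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WarehouseConv2d(nn.Module):
    """Illustrative sketch only: a conv layer whose kernel is assembled from a
    shared warehouse of kernel cells via input-dependent attention."""

    def __init__(self, in_ch, out_ch, kernel_size, warehouse, cell_shape):
        super().__init__()
        self.warehouse = warehouse              # (n_cells, oc_cell, ic_cell, k, k), shared
        self.n_cells = warehouse.shape[0]
        oc_cell, ic_cell = cell_shape[0], cell_shape[1]
        self.out_tiles = out_ch // oc_cell      # cells tiling the output channels
        self.in_tiles = in_ch // ic_cell        # cells tiling the input channels
        self.num_local_cells = self.out_tiles * self.in_tiles
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, kernel_size
        hidden = max(in_ch // 4, 8)
        # attention head: pooled features -> one weight per (local cell, warehouse cell)
        self.attn = nn.Sequential(
            nn.Linear(in_ch, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, self.num_local_cells * self.n_cells),
        )

    def forward(self, x):
        # batch-averaged context keeps the sketch to a single ordinary conv call;
        # the paper computes per-sample attention instead
        ctx = x.mean(dim=(0, 2, 3))                                    # (in_ch,)
        logits = self.attn(ctx).view(self.num_local_cells, self.n_cells)
        weights = logits.softmax(dim=-1)                               # mixture over warehouse cells
        # assemble every local cell as a weighted sum of warehouse cells
        cells = torch.einsum("ln,noihw->loihw", weights, self.warehouse)
        # tile the assembled cells into this layer's full kernel
        kernel = (cells.view(self.out_tiles, self.in_tiles, *cells.shape[1:])
                       .permute(0, 2, 1, 3, 4, 5)
                       .reshape(self.out_ch, self.in_ch, self.k, self.k))
        return F.conv2d(x, kernel, padding=self.k // 2)

# Usage: a warehouse of 8 cells, each a 16x16x3x3 slice, reusable by any layer built from it
cell_shape = (16, 16, 3, 3)
warehouse = nn.Parameter(torch.randn(8, *cell_shape) * 0.02)
conv = WarehouseConv2d(in_ch=32, out_ch=64, kernel_size=3,
                       warehouse=warehouse, cell_shape=cell_shape)
out = conv(torch.randn(2, 32, 56, 56))   # -> torch.Size([2, 64, 56, 56])
```

Because the warehouse is a single shared parameter tensor, the convolutional parameter budget can be tuned by changing the number of cells rather than each layer's kernel size, which is the lever the quoted Algorithm 1 exposes as b.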
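The ResNet18-style training recipe quoted in the Experiment Setup row maps onto standard PyTorch components. The snippet below is an assumed reconstruction of that recipe, not the authors' released script: the model definition, data loading, and training loop are omitted, and the ImageNet normalization statistics are the usual defaults rather than values stated in the paper.

```python
import torch
from torchvision import transforms

# Augmentation quoted for the ImageNet experiments: random resized cropping
# and horizontal flipping (normalization values are the standard ImageNet
# statistics, assumed here).
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def build_optimizer(model):
    # SGD for 100 epochs, batch size 256, momentum 0.9, weight decay 1e-4;
    # initial LR 0.1 decayed by a factor of 10 every 30 epochs, as quoted above.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
    return optimizer, scheduler
```

`StepLR` with `step_size=30` and `gamma=0.1` reproduces the quoted "decayed by a factor of 10 every 30 epochs" schedule over the 100-epoch run.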