KernelWarehouse: Rethinking the Design of Dynamic Convolution

Authors: Chao Li, Anbang Yao

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We testify the effectiveness of KernelWarehouse on ImageNet and MS-COCO datasets using various ConvNet architectures. We validate the effectiveness of KernelWarehouse through extensive experiments on ImageNet and MS-COCO datasets.
Researcher Affiliation | Industry | Intel Labs China. Correspondence to: Anbang Yao <anbang.yao@intel.com>.
Pseudocode | Yes | Algorithm 1 shows the implementation of KernelWarehouse, given a ConvNet backbone and the desired convolutional parameter budget b. (A hedged sketch of the dynamic-kernel mixing this algorithm builds on appears after the table.)
Open Source Code | Yes | The code and models are available at https://github.com/OSVAI/KernelWarehouse.
Open Datasets | Yes | In this section, we conduct comprehensive experiments on ImageNet dataset (Russakovsky et al., 2015) and MS-COCO dataset (Lin et al., 2014) to evaluate the effectiveness of our proposed KernelWarehouse (KW for short, in Tables).
Dataset Splits | Yes | ImageNet dataset (Russakovsky et al., 2015), which consists of over 1.2 million training images and 50,000 validation images with 1,000 object categories. MS-COCO dataset (Lin et al., 2014), which contains 118,000 training images and 5,000 validation images with 80 object categories.
Hardware Specification | Yes | Specifically, the models of ResNet18, MobileNetV2 (1.0×), MobileNetV2 (0.5×) are trained on the servers with 8 NVIDIA Titan X GPUs. The models of ResNet50, ConvNeXt-Tiny are trained on the servers with 8 NVIDIA Tesla V100-SXM3 or A100 GPUs.
Software Dependencies | No | The paper mentions optimizers (SGD, AdamW) and augmentation techniques (RandAugment, mixup, CutMix, random erasing, label smoothing) but does not provide specific version numbers for any software libraries or frameworks such as PyTorch or TensorFlow.
Experiment Setup | Yes | All the models are trained by the stochastic gradient descent (SGD) optimizer for 100 epochs, with a batch size of 256, a momentum of 0.9 and a weight decay of 0.0001. The initial learning rate is set to 0.1 and decayed by a factor of 10 every 30 epochs. Horizontal flipping and random resized cropping are used for data augmentation. (A hedged sketch of this training recipe follows the table.)
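
As a companion to the Pseudocode row: the paper's Algorithm 1 is not reproduced here, but the minimal PyTorch-style sketch below illustrates the generic dynamic-convolution idea that KernelWarehouse rethinks, namely mixing a small bank of candidate kernels with input-dependent attention before a single convolution is applied. The kernel-cell partitioning, the warehouse shared across layers, and the parameter budget b from the paper are deliberately omitted, and all module and parameter names are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Input-dependent mixture of n candidate kernels (generic dynamic convolution)."""

    def __init__(self, in_ch, out_ch, kernel_size=3, n_kernels=4, reduction=4):
        super().__init__()
        self.out_ch = out_ch
        self.kernel_size = kernel_size
        # Bank of candidate kernels owned by this layer (a per-layer "warehouse").
        self.weight = nn.Parameter(
            torch.randn(n_kernels, out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        # Lightweight attention branch: global average pooling + two FC layers.
        hidden = max(in_ch // reduction, 4)
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_ch, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, n_kernels),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Per-sample mixture weights over the candidate kernels.
        alpha = torch.softmax(self.attn(x), dim=1)                    # (b, n)
        # Assemble one kernel per sample as a weighted sum of candidates.
        weight = torch.einsum('bn,noikl->boikl', alpha, self.weight)  # (b, out, in, k, k)
        # Apply the per-sample kernels in one call via the grouped-conv trick.
        weight = weight.reshape(b * self.out_ch, c, self.kernel_size, self.kernel_size)
        out = F.conv2d(x.reshape(1, b * c, h, w), weight,
                       padding=self.kernel_size // 2, groups=b)
        return out.reshape(b, self.out_ch, out.shape[-2], out.shape[-1])


# Example: a drop-in replacement for a static 3x3 convolution.
layer = DynamicConv2d(64, 128)
y = layer(torch.randn(2, 64, 56, 56))   # -> torch.Size([2, 128, 56, 56])
```

The grouped-convolution trick folds the batch dimension into the channel dimension so that each sample is convolved with its own mixed kernel in a single F.conv2d call; the paper's actual method differs in how the kernel bank is partitioned, shared, and attended over.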
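The Experiment Setup row quotes a standard ImageNet recipe. Since the paper names no framework or library versions (see the Software Dependencies row), the following sketch assumes PyTorch and torchvision purely for illustration; the data path, the ResNet-18 backbone choice, the normalization step, and the worker count are placeholder assumptions, while the optimizer, schedule, batch size, and augmentations follow the quoted values.

```python
import torch
from torch import nn, optim
from torchvision import datasets, transforms, models

# Augmentation as quoted: random resized cropping + horizontal flipping.
# Normalization is a standard addition assumed here, not stated in the quote.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("/path/to/imagenet/train", transform=train_tf)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=256, shuffle=True, num_workers=8, pin_memory=True)

model = models.resnet18(num_classes=1000).cuda()
criterion = nn.CrossEntropyLoss()
# SGD with momentum 0.9, weight decay 1e-4, initial learning rate 0.1.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
# Decay the learning rate by a factor of 10 every 30 epochs.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(100):
    model.train()
    for images, targets in train_loader:
        images = images.cuda(non_blocking=True)
        targets = targets.cuda(non_blocking=True)
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```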