CAKES: Channel-wise Automatic KErnel Shrinking for Efficient 3D Networks

Authors: Qihang Yu, Yingwei Li, Jieru Mei, Yuyin Zhou, Alan Yuille (pp. 3225-3233)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | CAKES outperforms other methods of similar model size, and it matches state-of-the-art performance with far fewer parameters and lower computational cost on tasks including 3D medical image segmentation and video action recognition.
Researcher Affiliation | Academia | Qihang Yu, Yingwei Li, Jieru Mei, Yuyin Zhou, Alan Yuille; Johns Hopkins University. {yucornetto, meijieru, zhouyuyiner, alan.l.yuille}@gmail.com, yingwei.li@jhu.edu
Pseudocode | No | The paper describes algorithmic concepts and steps (e.g., "formulating kernel shrinkage as a path-level selection problem"), but it presents no formal pseudocode or algorithm blocks with numbered steps or code-like formatting.
Open Source Code | Yes | Codes and models are available at https://github.com/yucornetto/CAKES.
Open Datasets | Yes | The method is evaluated on two public datasets: 1) the Pancreas Tumours dataset from the Medical Segmentation Decathlon Challenge (MSD) (Simpson et al. 2019), and 2) the NIH Pancreas Segmentation dataset (Roth et al. 2015).
Dataset Splits | Yes | For the MSD dataset, 226 cases are used for training and segmentation performance is evaluated on the remaining 56 cases. [...] The model is tested in a 4-fold cross-validation manner following previous methods (Zhou et al. 2017, 2019b).
Hardware Specification | No | The paper mentions "4 GPUs" and "8 GPUs" but gives no specific model numbers, manufacturers, or other detailed specifications (e.g., NVIDIA A100, CPU type, memory size) for the hardware used in experiments.
Software Dependencies | No | The paper mentions an SGD optimizer and references C2FNAS (Yu et al. 2020b) as a backbone, but it provides no version numbers for any software frameworks (e.g., PyTorch, TensorFlow), libraries, or programming languages (e.g., Python 3.x).
Experiment Setup | Yes | For the MSD dataset, data augmentation comprises random crops with a patch size of 96×96×96, random rotations (0°, 90°, 180°, and 270°), and flips along all three axes. The batch size is 8 on 4 GPUs. Training uses an SGD optimizer with a learning rate starting at 0.01 under polynomial decay of power 0.9, momentum of 0.9, and weight decay of 0.00004, and lasts for 40k iterations.
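The polynomial learning-rate decay described in the experiment setup can be sketched as below. The function name and the use of plain Python (rather than a specific framework's scheduler) are assumptions; the formula itself follows the stated settings (base LR 0.01, power 0.9, 40k iterations).

```python
def poly_lr(step: int, base_lr: float = 0.01, max_steps: int = 40_000,
            power: float = 0.9) -> float:
    """Polynomial decay: lr = base_lr * (1 - step/max_steps) ** power.

    Hyperparameters default to the values reported in the paper's setup;
    the helper itself is illustrative, not taken from the CAKES codebase.
    """
    return base_lr * (1.0 - step / max_steps) ** power

# The schedule starts at the base rate and decays smoothly to zero:
print(poly_lr(0))        # 0.01 at the first iteration
print(poly_lr(40_000))   # 0.0 at the final iteration
```

With power 0.9 the curve is slightly convex, so the learning rate stays above a purely linear ramp-down for most of training before reaching zero at the last iteration.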