Strengthening Layer Interaction via Dynamic Layer Attention

Authors: Kaishen Wang, Xun Xia, Jian Liu, Zhang Yi, Tao He

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate the effectiveness of the proposed DLA architecture, outperforming other state-of-the-art methods in image recognition and object detection tasks.
Researcher Affiliation | Academia | Kaishen Wang1, Xun Xia2, Jian Liu2, Zhang Yi1, Tao He1. 1College of Computer Science, Sichuan University; 2Clinical Medical College and The First Affiliated Hospital of Chengdu Medical College. wangks@stu.scu.edu.cn, xiaxun@cmc.edu.cn, liujiansh@126.com, {zhangyi, tao he}@scu.edu.cn
Pseudocode | No | The paper describes the workflow of the DSU block and provides architectural diagrams (Figure 3), but it does not include formal pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/tunantu/Dynamic-Layer-Attention.
Open Datasets | Yes | We conducted experiments on the CIFAR-10, CIFAR-100, and ImageNet-1K datasets using ResNets [He et al., 2016] as the backbone network for image classification. In the context of object detection, our approach was evaluated on the COCO2017 dataset using the Faster R-CNN [Ren et al., 2015] and Mask R-CNN [He et al., 2017] frameworks as detectors.
Dataset Splits | Yes | For the CIFAR-10 and CIFAR-100 datasets, we employed standard data augmentation strategies [Huang et al., 2016]. The training process involved random horizontal flipping of images, padding each side by 4 pixels, and then randomly cropping to 32 x 32. For the ImageNet-1K dataset, we adopted the same data augmentation strategy and hyperparameter settings outlined in [He et al., 2016] and [He et al., 2019]. During training, images were randomly cropped to 224 x 224 with horizontal flipping. In the testing phase, images were resized to 256 x 256, then centrally cropped to a final size of 224 x 224. Object detection results are reported on COCO val2017 with Faster R-CNN and Mask R-CNN.
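The preprocessing geometry quoted above (CIFAR: pad each side by 4, then crop back to 32 x 32; ImageNet test time: resize to 256, center-crop to 224) can be sketched in plain Python. The helper names below are illustrative, not code from the paper:

```python
def center_crop_box(src_size, crop_size):
    """(left, top, right, bottom) of a crop_size x crop_size box centered in src_size."""
    off = (src_size - crop_size) // 2
    return (off, off, off + crop_size, off + crop_size)

def padded_canvas(size, pad):
    """Side length after padding each side of a square image by `pad` pixels."""
    return size + 2 * pad

# ImageNet-1K testing: resize to 256 x 256, then center-crop to 224 x 224.
print(center_crop_box(256, 224))  # (16, 16, 240, 240)

# CIFAR training: pad each side by 4, giving a 40 x 40 canvas
# from which a random 32 x 32 crop is taken.
print(padded_canvas(32, 4))  # 40
```

In practice this pipeline is typically expressed with torchvision transforms (RandomCrop with padding, RandomHorizontalFlip, Resize, CenterCrop); the sketch only makes the crop arithmetic explicit.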
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or other machine specifications used for its experiments.
Software Dependencies | No | The paper mentions software components such as the SGD optimizer, Faster R-CNN, Mask R-CNN, and the MMDetection toolkit, but it does not provide specific version numbers for any of them.
Experiment Setup | Yes | For the CIFAR-10 and CIFAR-100 datasets, training hyperparameters such as batch size, initial learning rate, and weight decay followed the recommendations of the original ResNets [He et al., 2016]. For the ImageNet-1K dataset, the optimization process used an SGD optimizer with a momentum of 0.9 and a weight decay of 1e-4. The initial learning rate was set to 0.1 and decreased according to a MultiStepLR schedule over 100 epochs with a batch size of 256. For object detection, the optimization process employed SGD with a weight decay of 1e-4, a momentum of 0.9, and a batch size of 8. The models were trained for a total of 12 epochs, starting with an initial learning rate of 0.01; the learning rate was reduced by a factor of 10 at the 8th and 11th epochs.
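The detection learning-rate schedule described above (initial rate 0.01, divided by 10 at epochs 8 and 11 over 12 epochs) is a standard MultiStep decay. A minimal sketch in plain Python, assuming the usual convention that the decay applies from the milestone epoch onward (`multistep_lr` is an illustrative helper, not code from the paper):

```python
def multistep_lr(epoch, base_lr, milestones, gamma=0.1):
    """Learning rate under MultiStep decay: multiply base_lr by `gamma`
    once for every milestone the current epoch has reached."""
    return base_lr * gamma ** sum(epoch >= m for m in milestones)

# Detection schedule from the paper: lr 0.01, reduced 10x at epochs 8 and 11.
for epoch in range(12):
    lr = multistep_lr(epoch, 0.01, milestones=[8, 11])
    # epochs 0-7 -> 0.01, epochs 8-10 -> 0.001, epoch 11 -> 0.0001
```

PyTorch's `torch.optim.lr_scheduler.MultiStepLR` implements the same rule given `milestones` and `gamma=0.1`.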