Towards Stable and Robust AdderNets

Authors: Minjing Dong, Yunhe Wang, Xinghao Chen, Chang Xu

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments conducted on several benchmarks demonstrate the superiority of the proposed approach for generating AdderNets with higher performance. |
| Researcher Affiliation | Collaboration | Minjing Dong (1,2), Yunhe Wang (2), Xinghao Chen (2), Chang Xu (1); 1: School of Computer Science, University of Sydney; 2: Huawei Noah's Ark Lab |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | "The CIFAR-10 dataset contains 50K training images and 10K validation images of size 32×32 over 10 classes. We use the SGD optimizer with an initial learning rate of 0.1, momentum of 0.9, and a weight decay of 5×10⁻⁴. The model is trained on a single V100, which takes 400 epochs with a batch size of 256 and a cosine learning rate schedule. [...] PASCAL VOC (VOC) dataset. VOC contains 20 object classes; the training set includes 10K images, which are the union of VOC 2007 and VOC 2012, and the VOC 2007 test set with 4.9K images is used for evaluation. The mAP scores using IoU at 0.5 are reported. All the models are trained with the same setting." |
| Dataset Splits | Yes | "The CIFAR-10 dataset contains 50K training images and 10K validation images of size 32×32 over 10 classes." |
| Hardware Specification | Yes | "The model is trained on a single V100" and "All the models are trained on 4 V100 GPUs." |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments. |
| Experiment Setup | Yes | "We use the SGD optimizer with an initial learning rate of 0.1, momentum of 0.9, and a weight decay of 5×10⁻⁴. The model is trained on a single V100, which takes 400 epochs with a batch size of 256 and a cosine learning rate schedule. The learning rate of the trainable parameters ν and υ in AWN is rescaled by a hyper-parameter, which we set to 1×10⁻⁵." |
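The Dataset Splits entry above describes the standard CIFAR-10 partition: 50K training images and 10K validation images of size 32×32 over 10 classes. A minimal sketch of loading that split with torchvision is given below; the augmentation transforms are assumptions for illustration, since the paper does not describe its preprocessing pipeline.

```python
# Minimal sketch of the CIFAR-10 split quoted above (50K train / 10K validation,
# 32x32 images, 10 classes). The transforms are assumed, not taken from the paper.
import torchvision
import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),      # assumed standard CIFAR-10 augmentation
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
test_transform = T.ToTensor()

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform)   # 50,000 images
val_set = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=test_transform)   # 10,000 images
```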
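The Experiment Setup entry above can be expressed as a short PyTorch configuration. The sketch below is an illustration of the reported hyper-parameters (SGD, initial learning rate 0.1, momentum 0.9, weight decay 5×10⁻⁴, cosine schedule over 400 epochs, batch size 256), not the authors' released code: the `model` object and the naming convention used to single out the AWN parameters ν and υ are hypothetical, and the interpretation of "rescaled by a hyper-parameter set to 1×10⁻⁵" as a multiplicative factor on the base learning rate is an assumption.

```python
# Hedged sketch of the reported training configuration. The AdderNet/AWN model itself
# is not public; `model` and the parameter names "nu" / "upsilon" are placeholders.
import torch

def build_optimizer_and_scheduler(model, awn_param_names=("nu", "upsilon")):
    awn_params, other_params = [], []
    for name, p in model.named_parameters():
        if any(key in name for key in awn_param_names):   # hypothetical naming convention
            awn_params.append(p)
        else:
            other_params.append(p)

    base_lr = 0.1
    optimizer = torch.optim.SGD(
        [
            {"params": other_params, "lr": base_lr},
            # One reading of the paper: lr of nu / upsilon rescaled by the factor 1e-5.
            {"params": awn_params, "lr": base_lr * 1e-5},
        ],
        lr=base_lr,
        momentum=0.9,
        weight_decay=5e-4,
    )
    # Cosine annealing over the full 400-epoch budget.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=400)
    return optimizer, scheduler
```

In a training loop matching the quoted setup, the data would be served with `batch_size=256` and `scheduler.step()` would be called once per epoch for 400 epochs.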