Handling Long-tailed Feature Distribution in AdderNets

Authors: Minjing Dong, Yunhe Wang, Xinghao Chen, Chang Xu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments conducted on several benchmarks and comparisons with other distributions demonstrate the effectiveness of the proposed approach for boosting the performance of ANNs.
Researcher Affiliation | Collaboration | Minjing Dong (1,2), Yunhe Wang (2), Xinghao Chen (2), Chang Xu (1); 1: School of Computer Science, University of Sydney; 2: Huawei Noah's Ark Lab. Emails: mdon0736@uni.sydney.edu.au, yunhe.wang@huawei.com, xinghao.chen@huawei.com, c.xu@sydney.edu.au
Pseudocode | Yes | Algorithm 1: Skew Laplace Mixture Loss with Angle-based Constraint for AdderNet
Open Source Code | No | The paper does not contain any explicit statement about providing open-source code or a link to a code repository.
Open Datasets | Yes | We conduct empirical evaluation of the proposed SLAC ANN on several image classification benchmarks, including CIFAR-10, CIFAR-100 and ImageNet.
Dataset Splits | Yes | The CIFAR-10 and CIFAR-100 datasets each contain 50K training images and 10K validation images of size 32×32, with 10 and 100 categories respectively. The ImageNet dataset [12] contains 1.2M training images and 50K testing images of size 224×224 from 1000 categories. (A hedged data-loading sketch follows the table.)
Hardware Specification | Yes | The models are trained on 4 NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using an SGD optimizer but does not specify any software libraries or their version numbers (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | We make use of the SGD optimizer with an initial learning rate of 0.1, weight decay of 5×10^-4, momentum of 0.9, and a cosine learning rate schedule. The entire training takes 800 epochs with a batch size of 256. The learning rate of the trainable parameter Σ is downscaled by 1/10^2; λ is set to 0.01 and β to 0.1. (A hedged training-setup sketch follows the table.)
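
The Dataset Splits row above describes the standard CIFAR train/validation partitions. As a minimal sketch, assuming a PyTorch/torchvision setup (the paper does not name its data-loading library), the CIFAR-10 split can be loaded as follows; the augmentation pipeline shown is a common default, not taken from the paper:

```python
import torchvision
import torchvision.transforms as T

# Assumed (common) CIFAR augmentation pipeline; the paper does not specify one.
train_tf = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

# train=True yields the 50K training images, train=False the 10K validation images,
# each 32x32 with 10 categories (use datasets.CIFAR100 for the 100-category variant).
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=train_tf)
val_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                       download=True, transform=T.ToTensor())
print(len(train_set), len(val_set))  # 50000 10000
```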
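
The Experiment Setup row can likewise be expressed as a hedged configuration sketch, again assuming PyTorch. `build_optimizer`, `model`, and `sigma_params` are hypothetical names; in particular, it is assumed that the trainable Σ parameters are kept out of `model.parameters()` so that only they receive the downscaled learning rate. The SLAC loss weights (λ = 0.01, β = 0.1) and the batch size of 256 belong to the loss function and the data loader, not the optimizer.

```python
import torch

def build_optimizer(model, sigma_params, epochs=800):
    # Sketch of the reported setup; all names here are placeholders, not the
    # authors' code. Assumes `sigma_params` is excluded from model.parameters().
    base_lr = 0.1
    optimizer = torch.optim.SGD(
        [
            # network weights: initial learning rate 0.1
            {"params": model.parameters(), "lr": base_lr},
            # trainable Sigma: learning rate downscaled by 1/10^2
            {"params": sigma_params, "lr": base_lr * 1e-2},
        ],
        momentum=0.9,
        weight_decay=5e-4,
    )
    # Cosine learning-rate schedule over the full 800-epoch run.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler

# Batch size 256 and the SLAC loss weights (lambda=0.01, beta=0.1) would be set
# in the DataLoader and the loss function, respectively, not here.
```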