Handling Long-tailed Feature Distribution in AdderNets
Authors: Minjing Dong, Yunhe Wang, Xinghao Chen, Chang Xu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted on several benchmarks and comparisons with other distributions demonstrate the effectiveness of the proposed approach for boosting the performance of ANNs. |
| Researcher Affiliation | Collaboration | Minjing Dong¹,², Yunhe Wang², Xinghao Chen², Chang Xu¹. ¹School of Computer Science, University of Sydney; ²Huawei Noah's Ark Lab. mdon0736@uni.sydney.edu.au, yunhe.wang@huawei.com, xinghao.chen@huawei.com, c.xu@sydney.edu.au |
| Pseudocode | Yes | Algorithm 1: Skew Laplace Mixture Loss with Angle-based Constraint for AdderNet |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code or a link to a code repository. |
| Open Datasets | Yes | We conduct empirical evaluation of the proposed SLAC ANN on several image classification benchmarks, including CIFAR-10, CIFAR-100 and ImageNet. |
| Dataset Splits | Yes | The CIFAR-10 and CIFAR-100 datasets each contain 50K training images and 10K validation images of size 32×32, from 10 and 100 categories respectively. The ImageNet dataset [12] contains 1.2M training images and 50K testing images of size 224×224 from 1000 categories. |
| Hardware Specification | Yes | The models are trained on 4 NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions using 'SGD optimizer' but does not specify any software libraries or their version numbers (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We make use of the SGD optimizer with an initial learning rate of 0.1, weight decay of 5×10⁻⁴, momentum of 0.9 and a cosine learning rate schedule. The entire training takes 800 epochs with a batch size of 256. The learning rate of the trainable parameter Σ is downscaled by 1×10², λ is set to 0.01 and β to 0.1. |
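
The training recipe quoted in the "Experiment Setup" row maps onto a standard PyTorch training loop. The sketch below is a minimal, hedged reconstruction of that configuration only: it uses a torchvision ResNet-18 as a stand-in for the AdderNet (ANN) backbone and plain cross-entropy in place of the paper's Skew Laplace Mixture (SLAC) loss, since neither is specified in enough detail here to reproduce. The hyper-parameters (SGD, lr 0.1, weight decay 5e-4, momentum 0.9, cosine schedule, 800 epochs, batch size 256) follow the quoted setup; the model and loss choices are assumptions, not the authors' implementation.

```python
# Minimal reproduction sketch of the reported training configuration.
# Assumptions: ResNet-18 stands in for the AdderNet backbone, and
# CrossEntropyLoss stands in for the paper's SLAC loss.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

EPOCHS = 800        # "The entire training takes 800 epochs"
BATCH_SIZE = 256    # "with a batch size of 256"

# CIFAR-10: 50K training / 10K validation images of size 32x32 (per the paper).
train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=train_tf)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True, num_workers=4)

model = models.resnet18(num_classes=10)  # stand-in for the ANN backbone
criterion = nn.CrossEntropyLoss()        # stand-in for the SLAC loss

# SGD with the reported hyper-parameters and a cosine schedule over the full run.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

# If the trainable parameter Sigma from the SLAC loss were available, its learning
# rate would be downscaled by 1e2 as reported, e.g. via a second parameter group:
# optimizer = optim.SGD([
#     {"params": model.parameters(), "lr": 0.1},
#     {"params": [sigma], "lr": 0.1 / 100},   # hypothetical Sigma parameter
# ], momentum=0.9, weight_decay=5e-4)

for epoch in range(EPOCHS):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The paper reports training on 4 NVIDIA Tesla V100 GPUs; the single-process loop above omits distributed data parallelism for brevity.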