Towards Stable and Robust AdderNets
Authors: Minjing Dong, Yunhe Wang, Xinghao Chen, Chang Xu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted on several benchmarks demonstrate the superiority of the proposed approach for generating AdderNets with higher performance. |
| Researcher Affiliation | Collaboration | Minjing Dong (1,2), Yunhe Wang (2), Xinghao Chen (2), Chang Xu (1); 1: School of Computer Science, University of Sydney; 2: Huawei Noah's Ark Lab |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | CIFAR-10 dataset contains 50K training images and 10K validation images with size of 32×32 over 10 classes. We use SGD optimizer with an initial learning rate of 0.1, momentum of 0.9 and a weight decay of 5×10⁻⁴. The model is trained on a single V100, which takes 400 epochs with a batch size of 256 and a cosine learning rate schedule. [...] PASCAL VOC (VOC) dataset. VOC contains 20 object classes, the training set includes 10K images which are the union of VOC 2007 and VOC 2012, and the VOC 2007 test set with 4.9K images is used for evaluation. The mAP scores using IoU at 0.5 are reported. All the models are trained with the same setting. |
| Dataset Splits | Yes | CIFAR-10 dataset contains 50K training images and 10K validation images with size of 32×32 over 10 classes. |
| Hardware Specification | Yes | The model is trained on a single V100. [...] All the models are trained on 4 V100 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments. |
| Experiment Setup | Yes | We use SGD optimizer with an initial learning rate of 0.1, momentum of 0.9 and a weight decay of 5×10⁻⁴. The model is trained on a single V100, which takes 400 epochs with a batch size of 256 and a cosine learning rate schedule. The learning rate of the trainable parameters ν and υ in AWN is rescaled by a hyper-parameter which we set to be 1×10⁻⁵. |
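
The CIFAR-10 split quoted in the "Open Datasets" and "Dataset Splits" rows (50K training images, 10K validation images, 32×32, 10 classes) matches the standard torchvision split, so the data pipeline can be reproduced with off-the-shelf loaders. The sketch below is a minimal reconstruction under that assumption; the augmentations and normalization constants are common CIFAR-10 defaults, not values reported in the paper, while the batch size of 256 follows the quoted setup.

```python
# Minimal CIFAR-10 data pipeline sketch matching the quoted 50K/10K split.
# Normalization constants and augmentations are conventional defaults,
# not taken from the paper.
import torch
from torchvision import datasets, transforms

normalize = transforms.Normalize((0.4914, 0.4822, 0.4465),
                                 (0.2470, 0.2435, 0.2616))

train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])
val_tf = transforms.Compose([transforms.ToTensor(), normalize])

# Standard split: 50K training images, 10K validation (test) images, 10 classes.
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=train_tf)
val_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=val_tf)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True, num_workers=4)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=256, shuffle=False, num_workers=4)
```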
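The "Experiment Setup" row can likewise be expressed as a PyTorch optimizer configuration. The sketch below assumes hypothetical attribute names `model.nu` and `model.upsilon` for the trainable AWN parameters ν and υ, and it interprets "rescaled by 1×10⁻⁵" as multiplying the base learning rate of 0.1 by that factor via a separate parameter group; the paper does not specify how the rescaling is implemented in code.

```python
# Sketch of the quoted training configuration: SGD, lr 0.1, momentum 0.9,
# weight decay 5e-4, 400 epochs, cosine learning-rate schedule, with a
# rescaled learning rate for the AWN parameters.
import torch

def build_optimizer_and_scheduler(model, epochs=400, base_lr=0.1, awn_lr_scale=1e-5):
    # `model.nu` and `model.upsilon` are hypothetical handles to the
    # trainable AWN parameters; the paper does not name them in code.
    awn_params = [model.nu, model.upsilon]
    awn_ids = {id(p) for p in awn_params}
    other_params = [p for p in model.parameters() if id(p) not in awn_ids]

    optimizer = torch.optim.SGD(
        [
            {"params": other_params, "lr": base_lr},
            # Assumption: "rescaled by 1e-5" multiplies the base learning rate.
            {"params": awn_params, "lr": base_lr * awn_lr_scale},
        ],
        lr=base_lr,
        momentum=0.9,
        weight_decay=5e-4,
    )
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler
```

In a typical training loop, `scheduler.step()` would be called once per epoch so that the cosine schedule decays the learning rate over the quoted 400 epochs.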