Auto Learning Attention
Authors: Benteng Ma, Jing Zhang, Yong Xia, Dacheng Tao
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that the obtained HOGA generalizes well on various backbones and outperforms previous hand-crafted attentions for many vision tasks, including image classification on the CIFAR100 and ImageNet datasets, object detection, and human keypoint detection on the COCO dataset. Code is available at https://github.com/btma48/AutoLA. |
| Researcher Affiliation | Academia | (1) Northwestern Polytechnical University, China; (2) The University of Sydney, Australia; (3) Research & Development Institute of Northwestern Polytechnical University, Shenzhen |
| Pseudocode | No | The paper describes algorithmic approaches and uses mathematical equations, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/btma48/AutoLA. |
| Open Datasets | Yes | Four benchmark datasets, including CIFAR10 [38], CIFAR100 [38], ImageNet ILSVRC2012 [39], and COCO [40], are used for this study. |
| Dataset Splits | Yes | Given a specific dataset d, which is split into a training partition d_train and a validation partition d_val, the searching algorithm estimates the model h_{α,θ} ∈ H_α... We randomly split the training set of CIFAR10 into two parts evenly, one for tuning the network parameters (denoted train A) and the other for tuning the attention architecture (denoted train B). (A code sketch of this split appears after the table.) |
| Hardware Specification | No | The paper mentions that the search is conducted "within 1 GPU day on a modern GPU" but does not specify the model or manufacturer of the GPU, or any other hardware components. |
| Software Dependencies | No | The paper mentions software components like "SGD optimizer" and frameworks related to "CNN architectures" and "DARTS", but it does not provide specific version numbers for any of them (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | In the training stage, we set the order of HOGA to K = 4 to achieve a trade-off between accuracy and complexity. We randomly split the training set of CIFAR10 into two parts evenly... The architecture search procedure is conducted for a total of 100 epochs with a batch size of 128. When training the network weights ω, we adopt the SGD optimizer with a momentum of 0.9 and a weight decay of 0.0003, and the cosine learning rate policy that decays from 0.025 to 0.001 [43]. The initial value of α before softmax is sampled from a standard Gaussian and multiplied by 0.001. In the evaluation stage on CIFAR10, the standard test set is used, the entire training set is used for training, and the network is trained from scratch for 500 epochs with a batch size of 256. ... When testing on CIFAR100 and ImageNet, the base channel number of the network is set to 64. (The search-stage settings are sketched in code after the table.) |
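
To make the split described in the "Dataset Splits" row concrete, the sketch below divides the official CIFAR10 training set evenly into a "train A" half for the network weights and a "train B" half for the attention architecture. This is a minimal illustration under assumed PyTorch/torchvision tooling, not the authors' released code; the transforms, data root, worker count, and variable names are placeholders.

```python
# Minimal sketch (assumed PyTorch/torchvision setup, not the authors' code):
# split the 50,000 CIFAR10 training images evenly into "train A" (network
# weights) and "train B" (attention architecture parameters).
import torch
from torch.utils.data import DataLoader, Subset
import torchvision
import torchvision.transforms as T

transform = T.Compose([T.RandomCrop(32, padding=4),   # assumed augmentation
                       T.RandomHorizontalFlip(),
                       T.ToTensor()])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)

indices = torch.randperm(len(train_set)).tolist()     # random, even split
split = len(train_set) // 2
train_a = Subset(train_set, indices[:split])          # tunes network weights
train_b = Subset(train_set, indices[split:])          # tunes attention architecture

# Batch size 128 matches the architecture-search setting quoted above.
loader_a = DataLoader(train_a, batch_size=128, shuffle=True, num_workers=2)
loader_b = DataLoader(train_b, batch_size=128, shuffle=True, num_workers=2)
```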
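
The search-stage hyperparameters quoted in the "Experiment Setup" row translate into a short optimizer configuration: SGD with momentum 0.9 and weight decay 0.0003, a cosine learning-rate schedule decaying from 0.025 to 0.001 over the 100 search epochs, and architecture logits α initialized from a standard Gaussian scaled by 0.001 before the softmax. The sketch below again assumes PyTorch; the stand-in backbone, the number of candidate operations, and the α update rule are placeholders, since the excerpt does not specify them.

```python
# Hedged sketch of the quoted search-stage settings (assumed PyTorch; the
# backbone, num_ops, and the alpha update rule are placeholders, not the
# paper's HOGA search space).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),  # stand-in backbone
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(64, 10))

num_ops = 8                                         # placeholder count of candidate ops
alpha = nn.Parameter(1e-3 * torch.randn(num_ops))   # standard Gaussian times 0.001

# SGD with momentum 0.9, weight decay 0.0003, and a cosine learning rate
# decaying from 0.025 to 0.001 over the 100-epoch architecture search.
w_optimizer = torch.optim.SGD(model.parameters(), lr=0.025,
                              momentum=0.9, weight_decay=0.0003)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    w_optimizer, T_max=100, eta_min=0.001)

# Mixing weights over candidate attention operations are softmax(alpha);
# the optimizer for alpha itself is not specified in the excerpt.
mixing_weights = torch.softmax(alpha, dim=0)
```

In a DARTS-style bi-level search, each epoch would then alternate gradient steps on the network weights (using loader_a from the previous sketch) and on α (using loader_b).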