Auto Learning Attention
Authors: Benteng Ma, Jing Zhang, Yong Xia, Dacheng Tao
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that the obtained HOGA generalizes well on various backbones and outperforms previous hand-crafted attentions for many vision tasks, including image classification on the CIFAR100 and ImageNet datasets, object detection, and human keypoint detection on the COCO dataset. Code is available at https://github.com/btma48/AutoLA. |
| Researcher Affiliation | Academia | (1) Northwestern Polytechnical University, China; (2) The University of Sydney, Australia; (3) Research & Development Institute of Northwestern Polytechnical University, Shenzhen |
| Pseudocode | No | The paper describes algorithmic approaches and uses mathematical equations, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/btma48/AutoLA. |
| Open Datasets | Yes | Four benchmark datasets, including CIFAR10 [38], CIFAR100 [38], ImageNet ILSVRC2012 [39], and COCO [40], are used for this study. |
| Dataset Splits | Yes | Given a specific dataset d, which is split into a training partition d_train and a validation partition d_val, the searching algorithm estimates the model h_{α,θ} ∈ H_α... We randomly split the training set of CIFAR10 into two parts evenly, one for tuning the network parameters (denoted train A) and the other for tuning the attention architecture (denoted train B). (A code sketch of this split appears after the table.) |
| Hardware Specification | No | The paper mentions that the search is conducted "within 1 GPU day on a modern GPU" but does not specify the model or manufacturer of the GPU, or any other hardware components. |
| Software Dependencies | No | The paper mentions software components like "SGD optimizer" and frameworks related to "CNN architectures" and "DARTS", but it does not provide specific version numbers for any of them (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | In the training stage, we set the order of HOGA to K = 4 to achieve a trade-off between accuracy and complexity. We randomly split the training set of CIFAR10 into two parts evenly... The architecture search procedure is conducted for a total of 100 epochs with a batch size of 128. When training the network weights ω, we adopt the SGD optimizer with a momentum of 0.9 and a weight decay of 0.0003, and the cosine learning rate policy that decays from 0.025 to 0.001 [43]. The initial value of α before softmax is sampled from a standard Gaussian and multiplied by 0.001. In the evaluation stage on CIFAR10, the standard test set is used, the entire training set is used for training, and the network is trained from scratch for 500 epochs with a batch size of 256. ... When testing on CIFAR100 and ImageNet, the base channel number of the network is set to 64. (The search-stage settings are sketched in code after the table.) |
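
To make the split described in the "Dataset Splits" row concrete, the sketch below divides the official CIFAR10 training set evenly into a "train A" half for the network weights and a "train B" half for the attention architecture. This is a minimal illustration under assumed PyTorch/torchvision tooling, not the authors' released code; the transforms, data root, worker count, and variable names are placeholders.

```python
# Minimal sketch (assumed PyTorch/torchvision setup, not the authors' code):
# split the 50,000 CIFAR10 training images evenly into "train A" (network
# weights) and "train B" (attention architecture parameters).
import torch
from torch.utils.data import DataLoader, Subset
import torchvision
import torchvision.transforms as T

transform = T.Compose([T.RandomCrop(32, padding=4),   # assumed augmentation
                       T.RandomHorizontalFlip(),
                       T.ToTensor()])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)

indices = torch.randperm(len(train_set)).tolist()     # random, even split
split = len(train_set) // 2
train_a = Subset(train_set, indices[:split])          # tunes network weights
train_b = Subset(train_set, indices[split:])          # tunes attention architecture

# Batch size 128 matches the architecture-search setting quoted above.
loader_a = DataLoader(train_a, batch_size=128, shuffle=True, num_workers=2)
loader_b = DataLoader(train_b, batch_size=128, shuffle=True, num_workers=2)
```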
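
The search-stage hyperparameters quoted in the "Experiment Setup" row translate into a short optimizer configuration: SGD with momentum 0.9 and weight decay 0.0003, a cosine learning-rate schedule decaying from 0.025 to 0.001 over the 100 search epochs, and architecture logits α initialized from a standard Gaussian scaled by 0.001 before the softmax. The sketch below again assumes PyTorch; the stand-in backbone, the number of candidate operations, and the α update rule are placeholders, since the excerpt does not specify them.

```python
# Hedged sketch of the quoted search-stage settings (assumed PyTorch; the
# backbone, num_ops, and the alpha update rule are placeholders, not the
# paper's HOGA search space).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),  # stand-in backbone
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(64, 10))

num_ops = 8                                         # placeholder count of candidate ops
alpha = nn.Parameter(1e-3 * torch.randn(num_ops))   # standard Gaussian times 0.001

# SGD with momentum 0.9, weight decay 0.0003, and a cosine learning rate
# decaying from 0.025 to 0.001 over the 100-epoch architecture search.
w_optimizer = torch.optim.SGD(model.parameters(), lr=0.025,
                              momentum=0.9, weight_decay=0.0003)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    w_optimizer, T_max=100, eta_min=0.001)

# Mixing weights over candidate attention operations are softmax(alpha);
# the optimizer for alpha itself is not specified in the excerpt.
mixing_weights = torch.softmax(alpha, dim=0)
```

In a DARTS-style bi-level search, each epoch would then alternate gradient steps on the network weights (using loader_a from the previous sketch) and on α (using loader_b).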