Active Token Mixer

Authors: Guoqiang Wei, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that ATMNet is generally applicable and comprehensively surpasses different families of SOTA vision backbones by a clear margin on a broad range of vision tasks, including visual recognition and dense prediction tasks.
Researcher Affiliation | Collaboration | Guoqiang Wei¹*, Zhizheng Zhang², Cuiling Lan², Yan Lu², Zhibo Chen¹ (¹University of Science and Technology of China; ²Microsoft Research Asia). wgq7441@mail.ustc.edu.cn, {zhizzhang, culan, yanlu}@microsoft.com, chenzhibo@ustc.edu.cn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/microsoft/ActiveMLP.
Open Datasets | Yes | We train our models on the ImageNet-1K dataset (Deng et al. 2009) from scratch. We evaluate the potential of ATMNet on the challenging semantic segmentation task on ADE20K (Zhou et al. 2019). We further evaluate the performance of our ATMNet on the object detection task on the COCO (Lin et al. 2014) dataset.
Dataset Splits | Yes | We train our models on the ImageNet-1K dataset (Deng et al. 2009) from scratch. All models are trained with input size of 224×224 for 300 epochs with the batch size of 1024. The ATMNet-L is finetuned with input size of 384×384 for 30 epochs. The results on top of UperNet and Semantic FPN are shown in Table 2. For different model scales, ATMNet outperforms all previous methods with comparable computation costs. The largest ATMNet-L with Semantic FPN outperforms previous state-of-the-art Twins-L by +1.4 mIoU with -23% parameters and -16% FLOPs. ATMNet-L also achieves the new state-of-the-art (51.1 ms mIoU) with UperNet, which surpasses the representative network Swin-B by +1.4 mIoU with -10% parameters.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | All models are trained with input size of 224×224 for 300 epochs with the batch size of 1024. The ATMNet-L is finetuned with input size of 384×384 for 30 epochs.
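
For reference, the training recipe quoted in the Dataset Splits and Experiment Setup rows can be captured in a short configuration sketch. This is a minimal illustration, not the authors' code: the TrainConfig dataclass and its field names are hypothetical, and only the numeric values (input sizes, epoch counts, batch size, dataset) come from the paper. The finetuning batch size is not reported and is left unset.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrainConfig:
    """Hypothetical container for the training recipe reported in the paper.

    Field names are illustrative; only the numeric values are taken
    from the paper's text.
    """
    dataset: str
    input_size: int            # square input resolution, in pixels
    epochs: int
    batch_size: Optional[int]  # None where the paper does not report it

# ImageNet-1K pretraining from scratch, as stated:
# 224x224 inputs, 300 epochs, batch size 1024.
pretrain = TrainConfig(dataset="ImageNet-1K", input_size=224,
                       epochs=300, batch_size=1024)

# ATMNet-L finetuning, as stated: 384x384 inputs for 30 epochs.
# The finetuning batch size is not given in the quoted text.
finetune_atmnet_l = TrainConfig(dataset="ImageNet-1K", input_size=384,
                                epochs=30, batch_size=None)

if __name__ == "__main__":
    print(pretrain)
    print(finetune_atmnet_l)
```

Any hyperparameters beyond these (optimizer, learning-rate schedule, augmentation) are not stated in the quoted text; the released code at https://github.com/microsoft/ActiveMLP would be the authoritative source for them.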