Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
EcoFormer: Energy-Saving Attention with Linear Complexity
Authors: Jing Liu, Zizheng Pan, Haoyu He, Jianfei Cai, Bohan Zhuang
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on both vision and language tasks show that EcoFormer consistently achieves comparable performance with standard attentions while consuming much fewer resources. |
| Researcher Affiliation | Academia | Department of Data Science & AI, Monash University, Australia |
| Pseudocode | No | The paper describes the proposed method in prose and through diagrams, but it does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/ziplab/EcoFormer. |
| Open Datasets | Yes | To investigate the effectiveness of the proposed method, we conduct experiments on ImageNet-1K [30], a large-scale image classification dataset that contains 1.2M training images from 1K categories and 50K validation images. |
| Dataset Splits | Yes | ImageNet-1K [30], a large-scale image classification dataset that contains 1.2M training images from 1K categories and 50K validation images. |
| Hardware Specification | Yes | All models in this experiment are trained on 8 V100 GPUs with a total batch size of 256. ... Moreover, we report the on-chip energy consumption according to Table 1 and the throughput with a mini-batch size of 32 on a single NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and states that implementations are based on released code from other papers, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | All training images are resized to 256 × 256, and 224 × 224 patches are randomly cropped from an image or its horizontal flip, with the per-pixel mean subtracted. ... Next, we finetune each model on ImageNet-1K with 100 epochs. ... All models in this experiment are trained on 8 V100 GPUs with a total batch size of 256. We set the initial learning rate to 2.5 × 10⁻⁵ for PVTv2 and 1.25 × 10⁻⁴ for Twins. We use AdamW optimizer [40] with a cosine decay learning rate scheduler. |
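
The fine-tuning hyperparameters quoted in the Experiment Setup row can be collected into a small sketch for anyone attempting a rerun. This is an illustration only, not the authors' code: the function names are hypothetical, and the schedule assumes a per-epoch cosine decay with no warmup and a final learning rate of zero, none of which the quoted text confirms. The per-GPU batch size assumes the total batch of 256 is split evenly across the 8 GPUs.

```python
import math

# Values quoted in the Experiment Setup row of the table.
NUM_GPUS = 8
TOTAL_BATCH_SIZE = 256
EPOCHS = 100
BASE_LR = {"PVTv2": 2.5e-5, "Twins": 1.25e-4}


def per_gpu_batch_size(total=TOTAL_BATCH_SIZE, gpus=NUM_GPUS):
    """Assumption: the total batch is divided evenly across GPUs."""
    if total % gpus != 0:
        raise ValueError("total batch size must be divisible by the GPU count")
    return total // gpus


def cosine_lr(base_lr, epoch, total_epochs=EPOCHS, min_lr=0.0):
    """Cosine-decayed learning rate at a given epoch.

    Assumptions (not stated in the source): the rate is updated once per
    epoch, there is no warmup phase, and it decays to min_lr = 0.
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("per-GPU batch size:", per_gpu_batch_size())
    for name, lr in BASE_LR.items():
        schedule = [cosine_lr(lr, e) for e in (0, 50, 100)]
        print(name, schedule)
```

Under these assumptions the learning rate starts at the quoted initial value, reaches half of it at epoch 50, and ends at zero after epoch 100. The released repository may use a different warmup length or minimum rate, so it should be checked before relying on this schedule.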