Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module

Authors: Yixing Xu, Chao Li, Dong Li, Xiao Sheng, Fan Jiang, Lu Tian, Ashish Sirasao, Emad Barsoum

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that we can apply our method to a wide range of state-of-the-art vision transformer models irrespective of how they modify their self-attention part and the overall architecture, and reduce FLOPs and parameters without compromising classification accuracy on the ImageNet dataset."
Researcher Affiliation | Industry | "Advanced Micro Devices, Inc., Beijing, China. Correspondence to: Yixing Xu <yixing.xu@amd.com>."
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement or link regarding open-source code for the described methodology.
Open Datasets | Yes | "We conduct experiments on the ImageNet-1k dataset for image classification and then ablate different parts of IFFN through ablation studies. Experiments on object detection and semantic segmentation are shown in the Appendix B and C. ... object detection on the COCO 2017 dataset, ... semantic segmentation task on the ADE20K dataset"
Dataset Splits | Yes | "We empirically verify the effectiveness of the proposed IFFN module on the ImageNet-1k dataset which contains 1.28M training images from 1000 different classes and 50K validation images. ... object detection on the COCO 2017 dataset, which contains 118K training images, 5K validation images and 20K test-dev images. ... semantic segmentation task on the ADE20K dataset, which contains 20K training images, 2K validation images and 3K test images from 150 different semantic categories."
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models) used to run the experiments.
Software Dependencies | No | The paper does not provide software dependencies with version numbers (e.g., "Python 3.8, PyTorch 1.9, CUDA 11.1").
Experiment Setup | No | "The training strategies are exactly the same as the original methods."
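For context, the module the paper modifies is the standard feedforward network (FFN) inside a Vision Transformer block: a linear expansion, a GELU non-linearity, and a linear projection back. The sketch below shows only this baseline FFN in NumPy with illustrative toy dimensions; it does not reproduce the paper's IFFN design, which is not specified in this assessment.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def ffn(x, w1, b1, w2, b2):
    # Standard ViT feedforward block: expand -> non-linearity -> project back.
    return gelu(x @ w1 + b1) @ w2 + b2

# Toy sizes (illustrative, not from the paper): token dim d=4,
# hidden dim h=4*d, the conventional 4x expansion ratio.
rng = np.random.default_rng(0)
d, h = 4, 16
x = rng.standard_normal((2, d))                  # 2 tokens
w1, b1 = rng.standard_normal((d, h)), np.zeros(h)
w2, b2 = rng.standard_normal((h, d)), np.zeros(d)

out = ffn(x, w1, b1, w2, b2)
print(out.shape)  # (2, 4): output keeps the token dimension
```

The 4x hidden expansion is where most FFN FLOPs and parameters live, which is why a method that strengthens the non-linearity can afford to shrink these matrices, matching the reported FLOP and parameter reductions.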