Bridging the Divide: Reconsidering Softmax and Linear Attention

Authors: Dongchen Han, Yifan Pu, Zhuofan Xia, Yizeng Han, Xuran Pan, Xiu Li, Jiwen Lu, Shiji Song, Gao Huang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct empirical verification to fully validate the importance of these two properties and the effectiveness of our methods.
Researcher Affiliation | Academia | Tsinghua University
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/LeapLabTHU/InLine.
Open Datasets | Yes | ImageNet-1K [5] recognition dataset contains 1.28M training images and 50K validation images with a total of 1,000 classes. ... COCO [18] object detection and instance segmentation dataset ... ADE20K [44] is a well-established benchmark for semantic segmentation
Dataset Splits | Yes | ImageNet-1K [5] recognition dataset contains 1.28M training images and 50K validation images with a total of 1,000 classes. ... COCO [18] object detection and instance segmentation dataset has 118K training and 5K validation images. ... ADE20K [44] is a well-established benchmark for semantic segmentation which encompasses 20K training images, 2K validation images and 150 semantic categories.
Hardware Specification | Yes | Runtime and FPS are tested on an RTX 3090 GPU.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers such as Python 3.8 or PyTorch 1.9).
Experiment Setup | Yes | We use AdamW [21] optimizer to train all our models from scratch for 300 epochs, employing cosine learning rate decay with 20 epochs of linear warm-up. The initial learning rate is 1×10⁻³, and the weight decay is 0.05. Augmentation and regularization strategies consist of RandAugment [4], Mixup [42], CutMix [41], and random erasing [43].
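The training recipe quoted in the last row (300 epochs, cosine learning-rate decay with 20 epochs of linear warm-up from a base rate of 1×10⁻³) can be sketched as a per-epoch schedule function. This is a minimal illustration of that schedule only; the function name and the `min_lr` floor are assumptions, not details from the paper.

```python
import math

def lr_at_epoch(epoch, base_lr=1e-3, warmup_epochs=20, total_epochs=300, min_lr=0.0):
    """Cosine learning-rate decay with linear warm-up (illustrative sketch).

    During warm-up, the rate rises linearly to base_lr; afterwards it follows
    a half-cosine from base_lr down to min_lr at the final epoch.
    """
    if epoch < warmup_epochs:
        # Linear warm-up: reaches base_lr at the last warm-up epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warm-up schedule completed, in [0, 1).
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In practice this kind of schedule is usually expressed through a framework scheduler (e.g. a cosine-annealing scheduler wrapped with warm-up) rather than hand-rolled, but the closed form above makes the stated hyperparameters concrete.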