Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences

Authors: Zicheng Liu, Siyuan Li, Li Wang, Zedong Wang, Yunfan Liu, Stan Z. Li

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experiments: To assess CHELA, we performed tests on five standard sequence modeling tasks involving different types of data, and we compared them with the latest cutting-edge models for each task. All experiments were realized based on NVIDIA A100-80G and Pytorch.
Researcher Affiliation | Academia | 1AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China. Correspondence to: Stan Z. Li <stan.zq.li@westlake.edu.cn>.
Pseudocode | No | The paper describes methods using mathematical equations and block diagrams but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information (a specific repository link, an explicit code-release statement, or code in supplementary materials) for the described methodology.
Open Datasets | Yes | We conducted experiments to evaluate sequence models using the Long Range Arena (LRA) benchmark. This benchmark, introduced by (Tay et al., 2020b)...
Dataset Splits | Yes | For all tasks, we closely follow Tay et al. (2020b) for details such as data preprocessing, data split, etc. Within this dataset, the typical training and testing split is maintained, reserving 10% of the training set for validation purposes. (A split sketch is given after this table.)
Hardware Specification | Yes | All experiments were realized based on NVIDIA A100-80G and Pytorch.
Software Dependencies | No | The paper mentions PyTorch as a software dependency but does not specify a version number.
Experiment Setup | Yes | The hyper-parameters of CHELA models on these tasks are listed in Table 7. Other training hyperparameters, including optimizer, learning rate scheduler, and architecture, are presented in Table 8.
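
The Dataset Splits row above describes the convention of keeping the standard train/test split while reserving 10% of the training set for validation. The following is a minimal sketch of that convention in PyTorch (the library named in the paper); the placeholder dataset, sizes, and seed are illustrative assumptions, not details taken from the paper or from the LRA benchmark code.

```python
# Illustrative sketch only: hold out 10% of a training set for validation,
# keeping the original test split untouched. The dataset below is a dummy
# stand-in, not an actual LRA task.
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder "training set" of 10,000 byte-level sequences of length 1,024.
train_full = TensorDataset(
    torch.randint(0, 256, (10_000, 1_024)),  # inputs
    torch.randint(0, 2, (10_000,)),          # labels
)

val_size = int(0.1 * len(train_full))        # reserve 10% for validation
train_size = len(train_full) - val_size
train_set, val_set = random_split(
    train_full,
    [train_size, val_size],
    generator=torch.Generator().manual_seed(0),  # fixed seed for reproducibility
)

print(len(train_set), len(val_set))  # 9000 1000
```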