Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences

Authors: Zicheng Liu, Siyuan Li, Li Wang, Zedong Wang, Yunfan Liu, Stan Z. Li

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experiments: To assess CHELA, we performed tests on five standard sequence modeling tasks involving different types of data, and we compared them with the latest cutting-edge models for each task. All experiments were realized based on NVIDIA A100-80G and Pytorch.
Researcher Affiliation | Academia | 1AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China. Correspondence to: Stan Z. Li <stan.zq.li@westlake.edu.cn>.
Pseudocode | No | The paper describes methods using mathematical equations and block diagrams but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information (a specific repository link, an explicit code-release statement, or code in supplementary materials) for the described methodology.
Open Datasets | Yes | We conducted experiments to evaluate sequence models using the Long Range Arena (LRA) benchmark. This benchmark, introduced by (Tay et al., 2020b)...
Dataset Splits | Yes | For all tasks, we closely follow Tay et al. (2020b) for details such as data preprocessing, data split, etc. Within this dataset, the typical training and testing split is maintained, reserving 10% of the training set for validation purposes. (A split sketch is given after this table.)
Hardware Specification | Yes | All experiments were realized based on NVIDIA A100-80G and Pytorch.
Software Dependencies | No | The paper mentions PyTorch as a software dependency but does not specify a version number.
Experiment Setup | Yes | The hyper-parameters of CHELA models on these tasks are listed in Table 7. Other training hyperparameters, including optimizer, learning rate scheduler, and architecture, are presented in Table 8.
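
The Dataset Splits row above describes the convention of keeping the standard train/test split while reserving 10% of the training set for validation. The following is a minimal sketch of that convention in PyTorch (the library named in the paper); the placeholder dataset, sizes, and seed are illustrative assumptions, not details taken from the paper or from the LRA benchmark code.

```python
# Illustrative sketch only: hold out 10% of a training set for validation,
# keeping the original test split untouched. The dataset below is a dummy
# stand-in, not an actual LRA task.
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder "training set" of 10,000 byte-level sequences of length 1,024.
train_full = TensorDataset(
    torch.randint(0, 256, (10_000, 1_024)),  # inputs
    torch.randint(0, 2, (10_000,)),          # labels
)

val_size = int(0.1 * len(train_full))        # reserve 10% for validation
train_size = len(train_full) - val_size
train_set, val_set = random_split(
    train_full,
    [train_size, val_size],
    generator=torch.Generator().manual_seed(0),  # fixed seed for reproducibility
)

print(len(train_set), len(val_set))  # 9000 1000
```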