Sparse Modular Activation for Efficient Sequence Modeling
Authors: Liliang Ren, Yang Liu, Shuohang Wang, Yichong Xu, Chenguang Zhu, ChengXiang Zhai
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments to show that SeqBoat has significantly better quality-efficiency trade-off than state-of-the-art hybrid models on a wide range of tasks, including Long Range Arena (LRA) [TDA+20], speech classification [War18] and language modeling [Hut06]. |
| Researcher Affiliation | Collaboration | Liliang Ren¹, Yang Liu², Shuohang Wang², Yichong Xu², Chenguang Zhu², ChengXiang Zhai¹; ¹University of Illinois at Urbana-Champaign, ²Microsoft |
| Pseudocode | Yes | The PyTorch-like [PGM+19] code snippets of the compress and extract operators are provided in Appendix A.1, with efficient support for batched sequences using the scatter operation. Listing 1: PyTorch-like code snippet for the Compress operator. Listing 2: PyTorch-like code snippet for the Extract operator. (A hedged sketch of such operators follows this table.) |
| Open Source Code | Yes | Our code is publicly available at https://github.com/renll/SeqBoat. |
| Open Datasets | Yes | We conduct comprehensive experiments to show that SeqBoat has significantly better quality-efficiency trade-off than state-of-the-art hybrid models on a wide range of tasks, including Long Range Arena (LRA) [TDA+20], speech classification [War18] and language modeling [Hut06]. |
| Dataset Splits | Yes | We conduct comprehensive experiments to show that SeqBoat has significantly better quality-efficiency trade-off than state-of-the-art hybrid models on a wide range of tasks, including Long Range Arena (LRA) [TDA+20], speech classification [War18] and language modeling [Hut06]. We measure the mean and the standard deviation (plotted as error bars) of the activation time on 100 sequences randomly sampled from the validation set of each task. |
| Hardware Specification | Yes | All the experiments are conducted on a mixed cluster with 8 NVIDIA V100 32GB GPUs and 2 NVIDIA A5000 24GB GPUs. |
| Software Dependencies | No | The PyTorch-like [PGM+19] code snippets of the compress and extract operators are provided in Appendix A.1, with efficient support for batched sequences using the scatter operation. For the Long Range Arena (LRA) and Speech Command tasks, we use the AdamW [LH18] optimizer. For language modeling tasks, we use the RAdam [LJH+19] optimizer. No specific version numbers for PyTorch or the optimizers are provided. (A hedged optimizer-setup sketch follows this table.) |
| Experiment Setup | Yes | Table 4: Hyper-parameter settings of our SeqBoat model for the LRA benchmark and the Speech Command (SC) dataset. DP is the dropout rate, BSZ is batch size, LR is learning rate, WD is weight decay, and Pre-N is pre-normalization. Table 5: Hyper-parameters of our SeqBoat model for language modeling. |
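
The "Pseudocode" row points to the paper's Listings 1 and 2 (PyTorch-like Compress and Extract operators), which are not reproduced in this report. Purely as a rough illustration, the sketch below shows one way such batched compress/extract operators can be written with scatter and gather; the function names, signatures, and left-aligned zero-padding convention are assumptions of this report, not the authors' code.

```python
import torch


def compress(x, mask):
    """Pack the activated positions (mask == True) of each sequence into a
    left-aligned, zero-padded tensor.

    x:    (B, L, D) input sequence
    mask: (B, L) boolean activation mask
    Returns the packed (B, L_max, D) tensor and the per-position target
    indices, which `extract` reuses to restore the original layout.
    """
    B, L, D = x.shape
    # Slot of each activated token inside its packed sequence (0, 1, 2, ...).
    idx = (torch.cumsum(mask.long(), dim=1) - 1).clamp(min=0)   # (B, L)
    L_max = int(mask.sum(dim=1).max().item())                   # longest packed length
    packed = x.new_zeros(B, max(L_max, 1), D)
    # Zero out non-activated tokens so they contribute nothing, then
    # scatter-add every token into its packed slot.
    src = x * mask.unsqueeze(-1)
    packed.scatter_add_(1, idx.unsqueeze(-1).expand(-1, -1, D), src)
    return packed, idx


def extract(y, idx, mask, L):
    """Place processed packed tokens back at their original positions;
    non-activated positions are filled with zeros."""
    B, _, D = y.shape
    out = y.new_zeros(B, L, D)
    gathered = torch.gather(y, 1, idx.unsqueeze(-1).expand(-1, -1, D))
    return torch.where(mask.unsqueeze(-1), gathered, out)
```

A toy round trip under these assumptions:

```python
x = torch.randn(2, 6, 4)
mask = torch.tensor([[1, 0, 1, 1, 0, 0],
                     [0, 1, 0, 0, 0, 1]], dtype=torch.bool)
packed, idx = compress(x, mask)          # packed has shape (2, 3, 4)
y = packed * 2.0                         # stand-in for the activated sub-module
restored = extract(y, idx, mask, L=6)    # zeros at non-activated positions
```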
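
The "Software Dependencies" row names the optimizers but no library versions. As a hedged sketch only, the snippet below shows how that setup might look in stock PyTorch; the model and the learning-rate/weight-decay values are placeholders, not the paper's settings (those are listed in its Tables 4 and 5).

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 128)  # placeholder, not a SeqBoat model

# LRA / Speech Commands: AdamW, as stated in the paper (placeholder hyper-parameters).
opt_lra_sc = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

# Language modeling: RAdam (available as torch.optim.RAdam since PyTorch 1.10).
opt_lm = torch.optim.RAdam(model.parameters(), lr=1e-3, weight_decay=0.01)
```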