LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory
Authors: Zicheng Liu, Li Wang, Siyuan Li, Zedong Wang, Haitao Lin, Stan Z. Li
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on the Long Range Arena benchmark, autoregressive language modeling, and image and speech classification demonstrate the effectiveness of LongVQ. |
| Researcher Affiliation | Academia | Zicheng Liu¹,², Li Wang², Siyuan Li¹,², Zedong Wang², Haitao Lin¹,² and Stan Z. Li². ¹Zhejiang University, Hangzhou, China; ²AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing code or links to a code repository. |
| Open Datasets | Yes | We benchmarked LongVQ on five popular datasets, including Long Range Arena (LRA) [Tay et al., 2020b]... the image dataset sCIFAR [Krizhevsky et al., 2009], the natural language datasets WikiText-103 [Merity et al., 2016] and enwik8, and the speech data Speech Commands [Warden, 2018]. |
| Dataset Splits | Yes | The CIFAR-10 dataset's standard train and test split is used, and 10% of the training set is withheld as the validation set. (A split sketch follows the table.) |
| Hardware Specification | Yes | All experiments were realized based on NVIDIA A100-80G and Pytorch. |
| Software Dependencies | No | The paper mentions 'Pytorch' but does not specify a version number or other software dependencies with versions. |
| Experiment Setup | Yes | We used float32 parameters, with bfloat16 precision for most computations. We adopt AdamW as the optimizer with a gradient clip of 0.1. The codebook commit coefficient was always γ = 0.0001, and the codebook EMA rate was always 0.99. All models were trained with a global batch size of 128 sequences. (A training-setup sketch follows the table.) |
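The dataset-split row can be made concrete with a short sketch. This is a minimal reconstruction rather than the authors' code: it assumes torchvision's CIFAR-10 loader and `torch.utils.data.random_split`, and the random seed is an arbitrary placeholder because the paper does not state how the 10% validation subset was drawn.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Standard CIFAR-10 train/test split from torchvision.
train_full = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
test_set = datasets.CIFAR10(root="./data", train=False, download=True,
                            transform=transforms.ToTensor())

# Withhold 10% of the 50,000 training images as a validation set (5,000 images).
val_size = len(train_full) // 10
train_set, val_set = random_split(
    train_full,
    [len(train_full) - val_size, val_size],
    generator=torch.Generator().manual_seed(0),  # seed is an arbitrary choice, not from the paper
)
```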
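The experiment-setup row quotes optimizer, precision, and vector-quantization hyperparameters but no code. The sketch below is a hedged illustration of how the reported values (AdamW, gradient clipping at 0.1, bfloat16 compute over float32 parameters, commitment coefficient γ = 0.0001, codebook EMA rate 0.99, batch size 128) could be wired together in PyTorch. The `EMACodebook`, the encoder, and the classifier head are hypothetical stand-ins, not LongVQ's actual architecture, and the learning rate is an assumption since it is not quoted above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EMACodebook(nn.Module):
    """Illustrative EMA-updated codebook with a commitment loss (not LongVQ itself)."""
    def __init__(self, num_codes=64, dim=128, ema_rate=0.99, commit_coef=1e-4):
        super().__init__()
        self.ema_rate, self.commit_coef = ema_rate, commit_coef
        self.register_buffer("codebook", torch.randn(num_codes, dim))
        self.register_buffer("cluster_size", torch.zeros(num_codes))
        self.register_buffer("ema_embed", self.codebook.clone())

    def forward(self, x):                        # x: (batch, dim)
        x = x.float()                            # keep codebook statistics in float32
        dists = torch.cdist(x, self.codebook)    # distance to every code vector
        idx = dists.argmin(dim=-1)               # nearest-code assignment
        quantized = self.codebook[idx]
        if self.training:                        # EMA codebook update, rate 0.99
            onehot = F.one_hot(idx, self.codebook.size(0)).float()
            self.cluster_size.mul_(self.ema_rate).add_(onehot.sum(0), alpha=1 - self.ema_rate)
            self.ema_embed.mul_(self.ema_rate).add_(onehot.t() @ x, alpha=1 - self.ema_rate)
            self.codebook.copy_(self.ema_embed / (self.cluster_size.unsqueeze(1) + 1e-5))
        commit_loss = self.commit_coef * F.mse_loss(x, quantized.detach())  # gamma = 1e-4
        # Straight-through estimator so gradients still reach the encoder.
        return x + (quantized - x).detach(), commit_loss

device = "cuda" if torch.cuda.is_available() else "cpu"
encoder = nn.Linear(128, 128).to(device)         # hypothetical encoder stand-in
head = nn.Linear(128, 10).to(device)             # hypothetical classifier head
codebook = EMACodebook().to(device)
params = list(encoder.parameters()) + list(head.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-3)   # learning rate is an assumption

# Toy stand-in for a real loader; the paper reports a global batch of 128 sequences.
loader = [(torch.randn(128, 128), torch.randint(0, 10, (128,))) for _ in range(2)]

for inputs, targets in loader:
    inputs, targets = inputs.to(device), targets.to(device)
    # float32 parameters with bfloat16 autocast for most computations.
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        feats, commit_loss = codebook(encoder(inputs))
        loss = F.cross_entropy(head(feats), targets) + commit_loss
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, 0.1)  # "gradient clip of 0.1", read as norm clipping
    optimizer.step()
```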