Toeplitz Neural Network for Sequence Modeling

Authors: Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, Yiran Zhong

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on autoregressive and bidirectional language modeling, image modeling, and the challenging Long-Range Arena benchmark show that our method achieves better performance than its competitors in most downstream tasks while being significantly faster.
Researcher Affiliation | Collaboration | Shanghai AI Laboratory; SenseTime Research; Australian National University; Northwestern Polytechnical University; The University of Hong Kong
Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/OpenNLPLab/Tnn.
Open Datasets | Yes | We evaluate our methods on the WikiText-103 dataset (Merity et al., 2017) for autoregressive language modeling and the input length extrapolation ability, and the GLUE benchmark (Wang et al., 2018) for bidirectional language modeling. We also validate the accuracy and efficiency of our methods in handling long-range dependencies on the Long-Range Arena benchmark (Tay et al., 2020). To demonstrate the robustness of our model, we implement our model in the DeiT (Touvron et al., 2021) structure and compare its performance with the vanilla DeiT (Touvron et al., 2021) on ImageNet-1K (Deng et al., 2009) for image classification. (A dataset-loading sketch follows this table.)
Dataset Splits | Yes | For the autoregressive language modeling, all models are trained on the WikiText-103 dataset (Merity et al., 2017)... We use perplexity (PPL) as the evaluation metric. Table 2: Performance comparison of autoregressive language modeling on the WikiText-103 dataset; the table reports separate 'PPL (val)' and 'PPL (test)' columns. (A short perplexity sketch follows this table.)
Hardware Specification | Yes | Timing is conducted on an Nvidia A6000 GPU with 48 GB of GPU memory.
Software Dependencies | No | The paper mentions 'PyTorch' but does not give a version number for it or for any other key software dependency.
Experiment Setup | Yes | Table 12: Detailed training configurations used in our experiments. The table specifies 'Total batch size', 'Number of updates/epochs', 'Warmup steps/epochs', 'Peak learning rate', 'Learning rate scheduler', 'Optimizer', 'Adam ε', 'Adam (β1, β2)', 'Weight decay', and 'Gradient clipping'. (A training-loop sketch using these fields follows this table.)
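
Note on the 'Open Datasets' row: the paper does not describe how the corpora were obtained or preprocessed, so the following is only a minimal sketch of pulling two of the named public datasets (WikiText-103 and GLUE) with the Hugging Face datasets library. This is an assumption for illustration, not the authors' data pipeline; the Long-Range Arena and ImageNet-1K data are distributed separately.

    # Sketch (assumption): fetching two of the public datasets named above with
    # the Hugging Face `datasets` library. Not the authors' pipeline.
    from datasets import load_dataset

    # WikiText-103 for autoregressive language modeling.
    wikitext = load_dataset("wikitext", "wikitext-103-raw-v1")
    print(wikitext["train"][0]["text"][:100])

    # One GLUE task (CoLA) as used for bidirectional language modeling evaluation.
    cola = load_dataset("glue", "cola")
    print(cola["validation"][0])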
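
On the metric quoted in the 'Dataset Splits' row: perplexity is the exponential of the mean token-level cross-entropy. A self-contained sketch of that computation is given below; it is illustrative only and not the authors' evaluation code.

    # Perplexity = exp(mean cross-entropy over predicted tokens).
    import math
    import torch
    import torch.nn.functional as F

    def perplexity(logits: torch.Tensor, targets: torch.Tensor) -> float:
        # logits:  (batch, seq_len, vocab_size) unnormalized scores
        # targets: (batch, seq_len) gold token ids
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            targets.reshape(-1),
            reduction="mean",
        )
        return math.exp(loss.item())

    # Tiny usage example with random data.
    logits = torch.randn(2, 8, 100)
    targets = torch.randint(0, 100, (2, 8))
    print(perplexity(logits, targets))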
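
The 'Experiment Setup' row lists the fields of the paper's Table 12. The sketch below shows how such fields typically map onto a PyTorch training loop; every numeric value is a placeholder (the actual, task-specific values are in Table 12), and the dummy model merely stands in for the TNN from the repository above.

    # Sketch only: placeholder hyperparameters and a dummy model/loss. Maps the
    # Table 12 fields (optimizer, Adam betas/eps, weight decay, warmup steps,
    # LR scheduler, gradient clipping) onto a generic PyTorch loop.
    import torch
    from torch import nn

    model = nn.Linear(16, 16)  # stands in for the TNN model

    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=5e-4,             # "Peak learning rate" (placeholder)
        betas=(0.9, 0.98),   # "Adam (β1, β2)" (placeholder)
        eps=1e-8,            # "Adam ε" (placeholder)
        weight_decay=0.01,   # "Weight decay" (placeholder)
    )

    # "Warmup steps" + "Learning rate scheduler": linear warmup followed by
    # inverse-sqrt decay is one common choice; the paper specifies its own
    # schedule per task.
    warmup_steps = 4000
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer,
        lambda step: min(max(step, 1) / warmup_steps,
                         (warmup_steps / max(step, 1)) ** 0.5),
    )

    for step in range(10):              # "Number of updates" is far larger in practice
        x = torch.randn(8, 16)          # batch of 8, purely illustrative
        loss = model(x).pow(2).mean()   # dummy loss standing in for LM cross-entropy
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # "Gradient clipping"
        optimizer.step()
        scheduler.step()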