A Tensorized Transformer for Language Modeling
Authors: Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Ming Zhou, Dawei Song
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test and verify the proposed attention method on three language modeling tasks (i.e., PTB, WikiText-103 and One-Billion) and a neural machine translation task (i.e., WMT-2016 English-German). Multi-linear attention can not only largely compress the model parameters but also obtain performance improvements, compared with a number of language modeling approaches, such as Transformer, Transformer-XL, and Transformer with tensor train decomposition. (An illustrative sketch of this multi-linear attention is given after the table.) |
| Researcher Affiliation | Collaboration | Xindian Ma¹, Peng Zhang¹, Shuai Zhang¹, Nan Duan², Yuexian Hou¹, Dawei Song³, Ming Zhou². ¹College of Intelligence and Computing, Tianjin University, Tianjin, China; ²Microsoft Research Asia, Beijing, China; ³School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for running experiments has been released, and the key code which is about our method can be found in Supplementary Materials F. (Footnote link: https://github.com/szhangtju/The-compression-of-Transformer) |
| Open Datasets | Yes | We chose three datasets in the order of small (i.e., PTB), medium (i.e., WikiText-103) and large (i.e., One-Billion). . . . In this task, we have trained the Transformer model [35] on WMT 2016 English-German dataset [31]. |
| Dataset Splits | Yes | PTB has 929k training tokens, 73k validation words, and 82k test words. |
| Hardware Specification | No | Other details (such as hyperparameters and Hardware) can be found in Supplementary Materials E. The main paper does not explicitly describe specific hardware details. |
| Software Dependencies | No | Other details (such as hyperparameters and Hardware) can be found in Supplementary Materials E. The main paper does not explicitly list specific software dependencies with version numbers. It mentions 'SentencePiece' (a footnoted tool), but this is a preprocessing tool, not a versioned software environment. |
| Experiment Setup | No | Other details (such as hyperparameters and Hardware) can be found in Supplementary Materials E. The main text of the paper does not contain specific experimental setup details. |
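
The paper's key technical claim, quoted in the Research Type row above, is that multi-linear attention built on Block-Term tensor decomposition compresses Transformer parameters while keeping or improving language modeling performance. The sketch below is only an illustration of that idea, not the authors' implementation (which is in the linked repository): the function names, the toy shapes, the diagonal-core simplification, the softmax initialization of the cores, and the omission of the paper's Split/Concat reconstruction and output projection are all assumptions made here for brevity.

```python
# Minimal, hedged PyTorch sketch of multi-linear (tensorized) attention:
# h "single-block" attentions, each a Tucker-style contraction of shared
# Q/K/V factor matrices with a small diagonal core tensor, are averaged.
# Mapping back to an (N, d) output (the paper's Split/Concat plus output
# projection) is deliberately omitted here.
import torch

def single_block_attention(q, k, v, core_diag):
    """Tucker contraction G x1 Q x2 K x3 V with a diagonal 3-way core.

    q, k, v   : (N, d) factor matrices, shared across all heads
    core_diag : (d,)   trainable diagonal entries of the core tensor
    returns   : (N, N, N) third-order attention tensor
    """
    # With a diagonal core, the full Tucker contraction collapses to a
    # single einsum over the shared latent index d.
    return torch.einsum('d,id,jd,kd->ijk', core_diag, q, k, v)

def multi_linear_attention(q, k, v, core_diags):
    """Average h single-block attentions that share the same Q/K/V factors."""
    blocks = [single_block_attention(q, k, v, g) for g in core_diags]
    return torch.stack(blocks, dim=0).mean(dim=0)

if __name__ == "__main__":
    N, d, h = 16, 32, 4                      # toy sizes for illustration only
    q, k, v = (torch.randn(N, d) for _ in range(3))
    cores = [torch.softmax(torch.randn(d), dim=0) for _ in range(h)]
    out = multi_linear_attention(q, k, v, cores)
    print(out.shape)                         # torch.Size([16, 16, 16])

    # Rough intuition for the compression claim: the h heads share one set
    # of Q/K/V factors and add only h*d core parameters, instead of h
    # separate d x d projection triples as in standard multi-head attention.
    standard = 3 * h * d * d
    tensorized = 3 * d * d + h * d
    print(standard, tensorized)              # 12288 3200 for these sizes
```

The parameter comparison at the end is only meant to make the compression intuition concrete (shared Q/K/V factors plus a small per-head core versus separate per-head projections); the compression ratios actually reported in the paper are measured on the full models.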