A Tensorized Transformer for Language Modeling

Authors: Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Ming Zhou, Dawei Song

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test and verify the proposed attention method on three language modeling tasks (i.e., PTB, WikiText-103, and One-Billion) and a neural machine translation task (i.e., WMT-2016 English-German). Multi-linear attention can not only largely compress the model parameters but also obtain performance improvements, compared with a number of language modeling approaches, such as Transformer, Transformer-XL, and Transformer with tensor-train decomposition. (An illustrative sketch of the underlying single-block attention appears after this table.)
Researcher Affiliation | Collaboration | Xindian Ma (1), Peng Zhang (1), Shuai Zhang (1), Nan Duan (2), Yuexian Hou (1), Dawei Song (3), Ming Zhou (2). Affiliations: (1) College of Intelligence and Computing, Tianjin University, Tianjin, China; (2) Microsoft Research Asia, Beijing, China; (3) School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code for running experiments has been released, and the key code for our method can be found in Supplementary Materials F. ... (Footnote 3: https://github.com/szhangtju/The-compression-of-Transformer)
Open Datasets | Yes | We chose three datasets in the order of small (i.e., PTB), medium (i.e., WikiText-103), and large (i.e., One-Billion). ... In this task, we have trained the Transformer model [35] on the WMT 2016 English-German dataset [31].
Dataset Splits | Yes | PTB has 929k training tokens, 73k validation words, and 82k test words.
Hardware Specification | No | Other details (such as hyperparameters and hardware) can be found in Supplementary Materials E. The main paper does not explicitly describe the specific hardware used.
Software Dependencies | No | Other details (such as hyperparameters and hardware) can be found in Supplementary Materials E. The main paper does not explicitly list specific software dependencies with version numbers. It mentions SentencePiece (footnote 4), but this is a single tool, not a specification of the overall software environment.
Experiment Setup | No | Other details (such as hyperparameters and hardware) can be found in Supplementary Materials E. The main text of the paper does not contain specific experimental setup details.
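
For context on the compression mechanism the paper claims to verify, below is a minimal sketch of the single-block (Tucker-decomposition) attention that underlies multi-linear attention, written in NumPy. It is not the authors' released implementation (see the repository linked above); the diagonal core, the softmax normalization of its weights, and the toy shapes are assumptions made purely for illustration.

    # Minimal sketch of single-block (Tucker-style) attention with a diagonal
    # core tensor. Shapes, the softmax normalization of the core weights, and
    # the toy inputs are illustrative assumptions, not the paper's exact setup.
    import numpy as np

    def single_block_attention(Q, K, V, g):
        """Tucker product of a diagonal core with factor matrices Q, K, V.

        Q, K, V : (n, d) factor matrices (e.g. query/key/value projections).
        g       : (d,) diagonal entries of the core tensor (trainable weights).
        Returns a 3rd-order tensor T of shape (n, n, n) with
            T[a, b, c] = sum_r g[r] * Q[a, r] * K[b, r] * V[c, r].
        """
        w = np.exp(g) / np.exp(g).sum()  # assumed: core weights normalized to sum to 1
        return np.einsum('r,ar,br,cr->abc', w, Q, K, V)

    if __name__ == '__main__':
        rng = np.random.default_rng(0)
        n, d = 4, 8                       # toy sequence length and head dimension
        Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
        g = rng.standard_normal(d)
        T = single_block_attention(Q, K, V, g)
        print(T.shape)                    # -> (4, 4, 4)

In the paper's full multi-linear attention, several such blocks share the same Q, K, and V factor matrices and differ only in their core weights; the block outputs are averaged and reshaped back to a matrix before an output projection, which is where the claimed parameter savings over standard multi-head attention come from.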