A Tensorized Transformer for Language Modeling

Authors: Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Ming Zhou, Dawei Song

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test and verify the proposed attention method on three language modeling tasks (i.e., PTB, WikiText-103, and One-Billion) and a neural machine translation task (i.e., WMT-2016 English-German). Multi-linear attention can not only largely compress the model parameters but also obtain performance improvements, compared with a number of language modeling approaches, such as Transformer, Transformer-XL, and Transformer with tensor-train decomposition. (An illustrative sketch of the underlying single-block attention appears after this table.)
Researcher Affiliation | Collaboration | Xindian Ma (1), Peng Zhang (1), Shuai Zhang (1), Nan Duan (2), Yuexian Hou (1), Dawei Song (3), Ming Zhou (2). Affiliations: (1) College of Intelligence and Computing, Tianjin University, Tianjin, China; (2) Microsoft Research Asia, Beijing, China; (3) School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code for running experiments has been released, and the key code for our method can be found in Supplementary Materials F. ... (Footnote 3: https://github.com/szhangtju/The-compression-of-Transformer)
Open Datasets | Yes | We chose three datasets in the order of small (i.e., PTB), medium (i.e., WikiText-103), and large (i.e., One-Billion). ... In this task, we have trained the Transformer model [35] on the WMT 2016 English-German dataset [31].
Dataset Splits | Yes | PTB has 929k training tokens, 73k validation words, and 82k test words.
Hardware Specification | No | Other details (such as hyperparameters and hardware) can be found in Supplementary Materials E. The main paper does not explicitly describe the specific hardware used.
Software Dependencies | No | Other details (such as hyperparameters and hardware) can be found in Supplementary Materials E. The main paper does not explicitly list specific software dependencies with version numbers. It mentions SentencePiece (footnote 4), but this is a single tool, not a specification of the overall software environment.
Experiment Setup | No | Other details (such as hyperparameters and hardware) can be found in Supplementary Materials E. The main text of the paper does not contain specific experimental setup details.
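
For context on the compression mechanism the paper claims to verify, below is a minimal sketch of the single-block (Tucker-decomposition) attention that underlies multi-linear attention, written in NumPy. It is not the authors' released implementation (see the repository linked above); the diagonal core, the softmax normalization of its weights, and the toy shapes are assumptions made purely for illustration.

    # Minimal sketch of single-block (Tucker-style) attention with a diagonal
    # core tensor. Shapes, the softmax normalization of the core weights, and
    # the toy inputs are illustrative assumptions, not the paper's exact setup.
    import numpy as np

    def single_block_attention(Q, K, V, g):
        """Tucker product of a diagonal core with factor matrices Q, K, V.

        Q, K, V : (n, d) factor matrices (e.g. query/key/value projections).
        g       : (d,) diagonal entries of the core tensor (trainable weights).
        Returns a 3rd-order tensor T of shape (n, n, n) with
            T[a, b, c] = sum_r g[r] * Q[a, r] * K[b, r] * V[c, r].
        """
        w = np.exp(g) / np.exp(g).sum()  # assumed: core weights normalized to sum to 1
        return np.einsum('r,ar,br,cr->abc', w, Q, K, V)

    if __name__ == '__main__':
        rng = np.random.default_rng(0)
        n, d = 4, 8                       # toy sequence length and head dimension
        Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
        g = rng.standard_normal(d)
        T = single_block_attention(Q, K, V, g)
        print(T.shape)                    # -> (4, 4, 4)

In the paper's full multi-linear attention, several such blocks share the same Q, K, and V factor matrices and differ only in their core weights; the block outputs are averaged and reshaped back to a matrix before an output projection, which is where the claimed parameter savings over standard multi-head attention come from.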