MorphTE: Injecting Morphology in Tensorized Embeddings

Authors: Guobing Gan, Peng Zhang, Sunzhu Li, Xiuqing Lu, Benyou Wang

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on tasks such as machine translation and question answering. Experimental results on four translation datasets of different languages show that MorphTE can compress word embedding parameters by about 20 times without performance loss and significantly outperforms related embedding compression methods. We conducted comparative experiments on machine translation, retrieval-based question answering, and natural language inference tasks.
Researcher Affiliation | Academia | (1) College of Intelligence and Computing, Tianjin University, Tianjin, China; (2) School of Data Science, The Chinese University of Hong Kong, Shenzhen, China; (3) Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China
Pseudocode | No | The paper includes mathematical formulas and a workflow diagram (Figure 3), but it does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code available at URL: https://github.com/bigganbing/Fairseq_MorphTE
Open Datasets | Yes | For machine translation tasks, we chose the IWSLT 14 German-to-English (De-En) dataset [7], and the English-to-Italian (En-It), English-to-Spanish (En-Es), and English-to-Russian (En-Ru) datasets of OPUS-100 [45].
Dataset Splits | No | The paper mentions datasets such as IWSLT 14 De-En (160K sentence pairs) and the OPUS-100 datasets (1M sentence pairs) and states training for '30 epochs with the early stopping' (for the QA and NLI tasks), which implies the use of a validation set. However, it does not provide explicit training/validation/test split percentages or sample counts for any of the datasets used.
Hardware Specification | Yes | The De-En model is trained with a batch size of 4096 tokens on an NVIDIA Tesla V100 GPU; the En-It, En-Es, and En-Ru models are trained with a batch size of 32768 tokens on 2 NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using 'Fairseq [28]' for the Transformer implementation but does not provide specific version numbers for Fairseq or any other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For the De-En dataset, the Transformer consists of a 6-layer encoder and 6-layer decoder with 512 embedding size and 1024 feed-forward network (FFN) size. It is trained with a batch size of 4096 tokens... For the En-It, En-Es, and En-Ru tasks, the FFN size is increased to 2048. They are trained with a batch size of 32768 tokens... For the question answering and NLI tasks, the word embedding size is set to 512, and we trained them for 30 epochs with early stopping. Unless otherwise specified, the hyperparameter order of MorphTE is 3 in our experiments.
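
For context on the ~20x compression figure and the order-3 hyperparameter noted in the table: the idea is to compose each word vector from a few small morpheme vectors rather than storing a full |V| x d lookup table. Below is a minimal, hedged sketch of an order-3 Kronecker-product (tensorized) embedding in PyTorch. The vocabulary and morpheme-table sizes, the rank-1 composition, and the name `word_embedding` are illustrative assumptions, not the authors' exact MorphTE implementation; the released code linked above is authoritative.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not the paper's exact configuration):
d = 512                       # target word embedding size, as in the reported setup
n = 3                         # tensor order (the paper uses order 3)
q = round(d ** (1 / n))       # factor size per morpheme vector: q**n == d (8**3 = 512)

word_vocab_size = 10_000      # hypothetical word vocabulary size
morph_vocab_size = 2_000      # hypothetical shared morpheme vocabulary size

# Small shared embedding table over morphemes instead of a full |V| x d word table.
morph_embed = nn.Embedding(morph_vocab_size, q)


def word_embedding(morpheme_ids: torch.Tensor) -> torch.Tensor:
    """Compose a d-dim word vector from n morpheme vectors via Kronecker products."""
    factors = morph_embed(morpheme_ids)      # shape (n, q)
    vec = factors[0]
    for f in factors[1:]:
        vec = torch.kron(vec, f)             # size grows q -> q**2 -> q**3 == d
    return vec                               # shape (d,)


# Usage: a word segmented into three (hypothetical) morpheme ids.
vec = word_embedding(torch.tensor([3, 17, 5]))
print(vec.shape)                             # torch.Size([512])

# Parameter comparison for this toy setup:
print(word_vocab_size * d)                   # full lookup table: |V| * d
print(morph_vocab_size * q)                  # shared morpheme table: |M| * q
```

The achievable compression depends on the relative sizes of the word and morpheme vocabularies and on whether several such products are summed (a rank greater than one); the paper reports roughly 20x compression on its translation datasets without performance loss.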