MorphTE: Injecting Morphology in Tensorized Embeddings
Authors: Guobing Gan, Peng Zhang, Sunzhu Li, Xiuqing Lu, Benyou Wang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on tasks such as machine translation and question answering. Experimental results on four translation datasets of different languages show that MorphTE can compress word embedding parameters by about 20 times without performance loss and significantly outperforms related embedding compression methods. We also conducted comparative experiments on machine translation, retrieval-based question answering, and natural language inference tasks. (See the parameter-count sketch after the table for how a roughly 20x figure can arise.) |
| Researcher Affiliation | Academia | (1) College of Intelligence and Computing, Tianjin University, Tianjin, China; (2) School of Data Science, The Chinese University of Hong Kong, Shenzhen, China; (3) Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China |
| Pseudocode | No | The paper includes mathematical formulas and a workflow diagram (Figure 3), but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at URL: https://github.com/bigganbing/Fairseq_MorphTE |
| Open Datasets | Yes | For machine translation tasks, we chose IWSLT 14 German-to-English (De-En) dataset [7], English-to-Italian (En-It), English-to-Spanish (En-Es), and English-to-Russian (En-Ru) datasets of OPUS-100 [45]. |
| Dataset Splits | No | The paper mentions datasets like IWSLT 14 De-En (160K sentence pairs) and OPUS-100 datasets (1M sentence pairs) and states training for '30 epochs with the early stopping' (for QA and NLI tasks), which implies the use of a validation set. However, it does not provide explicit training/validation/test split percentages or sample counts for any of the datasets used. |
| Hardware Specification | Yes | The De-En model is trained with a batch size of 4096 tokens on an NVIDIA Tesla V100 GPU; the En-It, En-Es, and En-Ru models are trained with a batch size of 32768 tokens on 2 NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions using 'Fairseq [28]' for Transformer implementation but does not provide specific version numbers for Fairseq or any other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For the De-En dataset, the Transformer consists of a 6-layer encoder and a 6-layer decoder with 512 embedding size and 1024 feed-forward network (FFN) size, and is trained with a batch size of 4096 tokens. For the En-It, En-Es, and En-Ru tasks, the FFN size is increased to 2048 and the models are trained with a batch size of 32768 tokens. For question answering and NLI tasks, the word embedding size is set to 512, and the models are trained for 30 epochs with early stopping. Unless otherwise specified, the hyperparameter order of MorphTE is 3 in our experiments. (A hedged fairseq-train sketch of the De-En setup follows the table.) |
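
To give a concrete sense of the "about 20 times" compression figure quoted in the Research Type row, here is a minimal back-of-the-envelope sketch in Python. The vocabulary and morpheme counts are illustrative placeholders rather than figures from the paper, and the relation q = d^(1/n) assumes the order-n entangled (tensor-product) construction that MorphTE builds on; the paper's exact ratio depends on its real vocabulary and morpheme statistics and any additional structure in the construction.

```python
# Back-of-the-envelope parameter counts for a standard embedding matrix versus
# an order-n morpheme factorization. Placeholder sizes, not the paper's numbers.

d, n = 512, 3                  # word embedding size and MorphTE order (from the paper)
q = round(d ** (1 / n))        # per-morpheme vector size: 8, since 8 ** 3 == 512 (assumption)

vocab_size = 32_000            # hypothetical word vocabulary
morpheme_vocab = 24_000        # hypothetical morpheme vocabulary (morphemes are shared across words)

dense_params = vocab_size * d           # ordinary |V| x d embedding matrix
morph_params = n * morpheme_vocab * q   # n small |M| x q morpheme embedding tables

print(f"dense embedding params: {dense_params:,}")                    # 16,384,000
print(f"factorized params:      {morph_params:,}")                    # 576,000
print(f"compression ratio:      {dense_params / morph_params:.1f}x")  # ~28x with these placeholders
```

Under these assumptions the dominant factor is d / (n * q) = 512 / 24 ≈ 21, which is already on the order of the roughly 20x the paper reports; the word-to-morpheme vocabulary ratio and any extra structure in the real construction shift the exact figure up or down.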
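
For the Experiment Setup row, the sketch below shows one plausible way the stated De-En hyperparameters (6+6 layers, 512 embedding size, 1024 FFN size, 4096-token batches) could map onto a standard fairseq-train invocation. The data directory and architecture name are assumptions, optimizer and scheduler flags are omitted because this summary does not give them, and the authors' Fairseq_MorphTE fork likely exposes additional MorphTE-specific options not shown here.

```python
import subprocess

# Sketch only: maps the reported IWSLT14 De-En hyperparameters onto standard
# fairseq CLI flags. "data-bin/iwslt14.de-en" is a hypothetical preprocessed
# data directory; optimizer/scheduler settings are intentionally left out.
cmd = [
    "fairseq-train", "data-bin/iwslt14.de-en",
    "--arch", "transformer_iwslt_de_en",           # 6-layer encoder/decoder baseline arch
    "--encoder-layers", "6", "--decoder-layers", "6",
    "--encoder-embed-dim", "512", "--decoder-embed-dim", "512",
    "--encoder-ffn-embed-dim", "1024", "--decoder-ffn-embed-dim", "1024",
    "--max-tokens", "4096",                        # batch size of 4096 tokens
]
subprocess.run(cmd, check=True)
```

For the OPUS-100 pairs, the table indicates the FFN size rises to 2048 and the token batch to 32768 across 2 GPUs; how that batch is split per GPU is not specified here.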