Improving Text Generation with Dynamic Masking and Recovering

Authors: Zhidong Liu, Junhui Li, Muhua Zhu

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on several text generation tasks including machine translation (MT), AMR-to-text generation, and image captioning show that the proposed approach can significantly improve over competitive baselines without using any task-specific techniques.
Researcher Affiliation | Collaboration | Zhidong Liu (1), Junhui Li (1), Muhua Zhu (2); (1) School of Computer Science and Technology, Soochow University, Suzhou, China; (2) Tencent News, Beijing, China
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks (a hedged sketch of what such a masking-and-recovering training step might look like is given after this table).
Open Source Code | No | The paper points to OpenNMT's GitHub repository as the implementation of its Transformer baseline (footnote 4), but it provides no link to, or statement about, an open-source release of the proposed Dynamic Masking and Recovering method itself.
Open Datasets | Yes | For machine translation, we evaluate our approach on two widely used benchmarks: WMT14 English-German (WMT14 EN-DE) and IWSLT14 German-English (IWSLT14 DE-EN). ... Following previous studies on AMR-to-text, we use the benchmark dataset AMR2.0 (LDC2017T10)... For image caption generation, we experiment with the widely used dataset MSCOCO 2014 [Lin et al., 2014].
Dataset Splits | Yes | We use newstest2013 and newstest2014 as the validation and test set, respectively. For IWSLT14 DE-EN, we conduct the same data cleanup and train/dev splitting as [Ott et al., 2019], resulting in 160K parallel sentence pairs for training and 7,284 sentence pairs for development. ... which contains 36,521/1,368/1,371 training/development/testing sentences with corresponding AMR graphs. ... the popular Karpathy splitting [Karpathy and Fei-Fei, 2015] is adopted, which results in 113,287 images for training, 5K images for validation, and 5K images for testing.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using OpenNMT [Klein et al., 2017] as the implementation of the Transformer model (footnote 4 links to its GitHub repository), but it does not specify version numbers for OpenNMT or for other dependencies such as Python, PyTorch, or TensorFlow, which are needed for reproducibility.
Experiment Setup | Yes | For the decoder layers, we set N to 6 and set M to 1 for all experiments, as shown in Figure 1. ... tuned on the respective development sets. ... The sizes of the resulting vocabularies, shared by the source and target language, are 32K and 10K for WMT14 EN-DE and IWSLT14 DE-EN, respectively. ... The dimension sizes of Transformer hidden states, image feature embeddings, and word embeddings are all set to 512. All sentences are truncated to contain at most 16 words during training. (A hedged configuration sketch built from these reported values also follows the table.)
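
Since the paper itself reports no pseudocode, below is a minimal, hypothetical sketch of one dynamic-masking-and-recovering training step, referenced from the Pseudocode row above. It assumes the method randomly replaces a fraction of decoder-input tokens with a [MASK] token and adds an auxiliary loss that recovers those tokens from an intermediate (M-th) decoder layer; the masking ratio, loss weight, and the model interface (encode/decode/generator) are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn.functional as F

MASK_ID = 4        # hypothetical id of a [MASK] token added to the target vocabulary
MASK_RATIO = 0.15  # hypothetical masking ratio; such choices would be tuned on dev sets

def dmr_training_step(model, src, tgt_in, tgt_out, pad_id, recover_weight=1.0):
    # `model` is assumed to expose encode(src) -> memory,
    # decode(tgt_in, memory) -> (intermediate_states, final_states), and
    # generator(states) -> vocabulary logits. This interface is illustrative,
    # not OpenNMT's actual API.

    # 1) Dynamically pick target positions to mask (re-sampled at every step).
    can_mask = tgt_in.ne(pad_id)
    mask = (torch.rand_like(tgt_in, dtype=torch.float) < MASK_RATIO) & can_mask
    masked_tgt_in = tgt_in.masked_fill(mask, MASK_ID)

    # 2) Decode the partially masked target prefix.
    memory = model.encode(src)
    inter_states, final_states = model.decode(masked_tgt_in, memory)

    # 3) Main objective: next-token prediction from the final decoder layer.
    gen_logits = model.generator(final_states)
    gen_loss = F.cross_entropy(
        gen_logits.reshape(-1, gen_logits.size(-1)),
        tgt_out.reshape(-1),
        ignore_index=pad_id,
    )

    # 4) Auxiliary objective: recover the original tokens at the masked
    #    positions from the intermediate (M-th) decoder layer.
    rec_logits = model.generator(inter_states)
    rec_targets = tgt_in.masked_fill(~mask, pad_id)  # non-masked positions ignored
    rec_loss = F.cross_entropy(
        rec_logits.reshape(-1, rec_logits.size(-1)),
        rec_targets.reshape(-1),
        ignore_index=pad_id,
    )

    return gen_loss + recover_weight * rec_loss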
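
The hyper-parameters quoted in the Experiment Setup row can also be collected into a small configuration object for reference. The sketch below records only the values reported above; every field name is chosen for illustration, since the paper does not publish a configuration file.

from dataclasses import dataclass

@dataclass
class DMRConfig:
    decoder_layers: int = 6                # N: total decoder layers
    recover_layer: int = 1                 # M: decoder layer used for recovering
    hidden_size: int = 512                 # Transformer hidden states
    embedding_size: int = 512              # word / image feature embeddings
    vocab_size_wmt14_ende: int = 32_000    # shared source/target vocabulary, WMT14 EN-DE
    vocab_size_iwslt14_deen: int = 10_000  # shared source/target vocabulary, IWSLT14 DE-EN
    max_train_sentence_length: int = 16    # "truncated to contain at most 16 words during training"

config = DMRConfig()
print(config)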