Cross-Lingual Natural Language Generation via Pre-Training

Authors: Zewen Chi, Li Dong, Furu Wei, Wenhui Wang, Xian-Ling Mao, Heyan Huang (pp. 7570-7577)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on question generation and abstractive summarization show that our model outperforms the machine-translation-based pipeline methods for zero-shot cross-lingual generation. Moreover, cross-lingual transfer improves NLG performance of low-resource languages by leveraging rich-resource language data.
Researcher Affiliation | Collaboration | Beijing Institute of Technology; Microsoft Research. {czw, maoxl, hhy63}@bit.edu.cn; {lidong1, fuwei, Wenhui.Wang}@microsoft.com
Pseudocode | No | The paper does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Our implementation and data are available at https://github.com/CZWin32768/xnlg.
Open Datasets | Yes | We use SQuAD 1.1 (Rajpurkar et al. 2016) as the English QG dataset. For Chinese QG, we follow the default data splits of WebQA (Li et al. 2016). We use Wikipedia as the monolingual data for the DAE objective, and MultiUN (Ziemski, Junczys-Dowmunt, and Pouliquen 2016) as the parallel data for the XAE objective.
Dataset Splits | Yes | We use SQuAD 1.1 (Rajpurkar et al. 2016) as the English QG dataset. It is a popular English question answering dataset containing over 100,000 questions and their corresponding annotated passages. Following (Zhao et al. 2018), we regard the original development set as the test set, and sample 5,000 examples from the training data of the two datasets as the development sets. For Chinese QG, we follow the default data splits of WebQA (Li et al. 2016). ... For each language, we sample 500k/5k/5k examples for training/validation/test. (A split-sampling sketch follows this table.)
Hardware Specification | Yes | It takes about 30 hours to run 23,000 steps for the pre-training procedure by using 4 Nvidia Tesla V100-16GB GPUs.
Software Dependencies | No | The paper mentions using "the tokenizer provided by (Chang, Galley, and Manning 2008) for Chinese, and Moses for other languages, respectively" and a "subword vocabulary learned by BPE (Sennrich, Haddow, and Birch 2015)". While these are software-related, no version numbers are given for these tools or for common libraries such as PyTorch or TensorFlow, which would be necessary for full reproducibility. (A hedged preprocessing sketch follows this table.)
Experiment Setup | Yes | We use Adam optimizer with a linear warm-up over the first 4,000 steps and linear decay for later steps, and the learning rate is set to 10^-4. The pre-training batch size is 64, and the sequence length is set to 256. ... For fine-tuning on downstream NLG tasks, we use Adam optimizer with a learning rate of 5 × 10^-6. We set the batch size as 16 and 32 for question generation and abstractive summarization, respectively. When the target language is the same as the language of training data, we fine-tune all parameters. When the target language is different from the language of training data, we fine-tune the Transformer layers of the encoder. ... During decoding, we use beam search with a beam size of 3, and limit the length of the target sequence to 80 tokens. (An optimizer and decoding sketch follows this table.)
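
The Dataset Splits row reports 500k/5k/5k per-language splits and a 5,000-example development set sampled from training data. A minimal sketch of how one might reproduce that sampling is shown below; the paper does not specify a random seed or the exact sampling procedure, so those choices are assumptions.

```python
import random

def sample_splits(examples, n_train, n_dev, n_test, seed=42):
    """Randomly partition `examples` into train/dev/test splits.

    The 500k/5k/5k sizes follow the paper's description; the seed and
    shuffling strategy are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    examples = list(examples)
    rng.shuffle(examples)
    assert len(examples) >= n_train + n_dev + n_test
    train = examples[:n_train]
    dev = examples[n_train:n_train + n_dev]
    test = examples[n_train + n_dev:n_train + n_dev + n_test]
    return train, dev, test

# e.g. per-language splits as reported in the paper:
# train, dev, test = sample_splits(all_examples, 500_000, 5_000, 5_000)
```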
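Because no tool versions are given, reproducing the preprocessing requires assumptions. The sketch below uses sacremoses (a Python port of the Moses tokenizer) and subword-nmt (the reference BPE implementation from Sennrich, Haddow, and Birch 2015); the paper does not confirm these exact packages, the BPE codes path is hypothetical, and the Stanford Chinese tokenizer step is omitted.

```python
# Hypothetical preprocessing pipeline; the paper names Moses and BPE but
# not the concrete packages or versions, so sacremoses and subword-nmt
# are assumptions.
from sacremoses import MosesTokenizer          # pip install sacremoses
from subword_nmt.apply_bpe import BPE          # pip install subword-nmt

def preprocess(line, lang="en", bpe_codes_path="codes.en"):
    """Tokenize with Moses rules, then segment into BPE subwords."""
    tokenizer = MosesTokenizer(lang=lang)
    tokenized = tokenizer.tokenize(line, return_str=True)
    with open(bpe_codes_path, encoding="utf-8") as codes_file:
        bpe = BPE(codes_file)
    return bpe.process_line(tokenized)

# print(preprocess("Cross-lingual generation transfers knowledge across languages."))
```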
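The Experiment Setup row describes Adam with a linear warm-up over 4,000 steps followed by linear decay, a pre-training learning rate of 10^-4, and beam-search decoding with beam size 3 and a maximum target length of 80 tokens. Below is a minimal PyTorch sketch of that schedule; the choice of framework, the total step count (taken from the reported 23,000 pre-training steps), and the default Adam betas are assumptions rather than details confirmed by the paper.

```python
import torch

def build_optimizer_and_scheduler(model, lr=1e-4, warmup_steps=4_000,
                                  total_steps=23_000):
    """Adam with linear warm-up then linear decay, as described above.

    `total_steps` follows the reported 23,000 pre-training steps; other
    hyperparameters (e.g. betas, eps) are PyTorch defaults and are
    assumptions.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)  # linear warm-up
        # linear decay from 1.0 to 0.0 over the remaining steps
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

# Fine-tuning reportedly reuses Adam with lr = 5e-6; decoding uses beam
# search with beam size 3 and target sequences capped at 80 tokens.
```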