Target-Side Input Augmentation for Sequence to Sequence Generation
Authors: Shufang Xie, Ang Lv, Yingce Xia, Lijun Wu, Tao Qin, Tie-Yan Liu, Rui Yan
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments on various sequence generation tasks, including dialog generation, machine translation, and abstractive summarization. |
| Researcher Affiliation | Collaboration | Shufang Xie1, Ang Lv1, Yingce Xia2, Lijun Wu2, Tao Qin2, Tie-Yan Liu2, Rui Yan1 1Gaoling School of Artificial Intelligence, Renmin University of China 2Microsoft Research Asia 1shufangxie@ruc.edu.cn, lvangupup@gmail.com, ruiyan@ruc.edu.cn 2{yingce.xia, lijuwu, taoqin, tyliu}@microsoft.com |
| Pseudocode | No | The paper describes the proposed algorithm in prose and mathematical equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/TARGET-SIDE-DATA-AUG/TSDASG. |
| Open Datasets | Yes | We conduct experiments on two commonly used dialog generation data sets: the Daily Dialog (Li et al., 2017) for single-turn dialog generation and Persona-Chat (Zhang et al., 2018) for multi-turn dialog generation. ... the IWSLT 14 English (EN)-German (DE) data set (Cettolo et al., 2014) ... the WMT 14 EN-DE dataset (Bojar et al., 2014) ... the CNN/DM (Hermann et al., 2015) news summary dataset. |
| Dataset Splits | Yes | We follow the script of Luo et al. (2018) to pre-process the Daily Dialog data, where the dialog is represented as request-response pairs. Then the first 80% pairs are used for training, the next 10% for validation, and the last 10% for test. (A minimal split sketch appears after this table.) |
| Hardware Specification | No | The paper mentions using a "Transformer network architecture" but does not specify any hardware components like GPU models, CPU types, or cloud computing instances used for experiments. |
| Software Dependencies | No | Our implementation is based on the fairseq framework (Ott et al., 2019). We use Transformer (Vaswani et al., 2017) network architecture in all experiments... We use Adam optimizer (Kingma & Ba, 2015)... We compute the BLEU score by the Moses script (Koehn et al., 2007)... We use the files2rouge tool to evaluate... No specific version numbers are provided for these software dependencies, only the names and relevant citations. |
| Experiment Setup | Yes | We use Transformer (Vaswani et al., 2017) network architecture in all experiments with different model sizes, which are adjusted according to the data size. During training, we use Adam optimizer (Kingma & Ba, 2015) with Adam β = (0.9, 0.98) and inverse sqrt learning rate scheduler. Meanwhile, we used label smoothing of value 0.1... We use transformer small configuration for Daily Dialog dataset and transformer base for Persona-Chat dataset, where both the encoder and the decoder consist of six layers. The (Embed Dim, FFN Embed Dim) of those configurations are (512, 1024) and (512, 2048), respectively. Our results are generated by beam search with beam size 5. We compute the BLEU score by the Moses script (Koehn et al., 2007) with the same tokenizer used by previous works. (A hedged sketch of these training settings appears after this table.) |
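
The ordered 80/10/10 split quoted in the Dataset Splits row can be written out in a few lines of Python. This is a minimal sketch that assumes the pre-processed data is already available as a list of (request, response) pairs; it is not the authors' actual pre-processing script.

```python
# Ordered 80/10/10 split: first 80% train, next 10% validation, last 10% test.
# `pairs` is a hypothetical list of (request, response) tuples.
def split_pairs(pairs):
    n = len(pairs)
    train = pairs[: int(0.8 * n)]               # first 80% for training
    valid = pairs[int(0.8 * n): int(0.9 * n)]   # next 10% for validation
    test = pairs[int(0.9 * n):]                 # last 10% for test
    return train, valid, test
```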
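
The hyperparameters quoted in the Experiment Setup row map onto standard optimizer and loss settings. The sketch below restates them in plain PyTorch rather than fairseq: the model shape follows the "transformer small" numbers quoted above (six encoder and decoder layers, Embed Dim 512, FFN Embed Dim 1024), while the number of attention heads, the base learning rate, and the warmup length are illustrative assumptions that are not reported in the table.

```python
import torch
import torch.nn as nn

# "transformer small"-like shape from the table: 6+6 layers, 512/1024 dims.
# nhead=8 is an assumption; the number of heads is not quoted above.
model = nn.Transformer(d_model=512, nhead=8, dim_feedforward=1024,
                       num_encoder_layers=6, num_decoder_layers=6)

# Adam with beta = (0.9, 0.98); the base lr of 5e-4 is an assumed placeholder.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.98))

# Inverse sqrt learning rate schedule: linear warmup, then decay ~ step^-0.5.
WARMUP = 4000  # assumed; the warmup length is not quoted in the table

def inverse_sqrt(step: int) -> float:
    step = max(step, 1)
    return min(step / WARMUP, (WARMUP / step) ** 0.5)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=inverse_sqrt)

# Label smoothing of 0.1, as stated in the Experiment Setup row.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```

Decoding with beam size 5 and BLEU scoring via the Moses script are evaluation-time steps quoted in the same row and are not part of this training sketch.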