Sentence-Wise Smooth Regularization for Sequence to Sequence Learning
Authors: Chengyue Gong, Xu Tan, Di He, Tao Qin (pp. 6449–6456)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three neural machine translation tasks and one text summarization task show that our method outperforms conventional MLE loss on all these tasks and achieves promising BLEU scores on WMT14 English-German and WMT17 Chinese-English translation task. |
| Researcher Affiliation | Collaboration | Peking University; Microsoft Research. {xuta,taoqin}@microsoft.com, {cygong,dihe}@pku.edu.cn |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper discusses third-party open-source code (tensor2tensor, Transformer) but does not provide concrete access to the authors' own source code for the methodology described in this paper. |
| Open Datasets | Yes | We use the concatenation of newstest2012 and newstest2013 as the validation set and newstest2014 as the test set. WMT17 Chinese-English Translation We filter the full dataset of 24M bilingual sentence pairs by removing duplications and get 19M sentence pairs for training. The source and target sentences are encoded using 40K and 37K BPE tokens. We report the results on the official newstest2017 test set and use newsdev2017 as the validation set. IWSLT14 German-English Translation This dataset contains 160K training sentence pairs and 7K validation sentence pairs following (Cettolo et al. 2014). ... Text Summarization We use the English Gigaword Fifth Edition (Graff et al. 2003) corpus for text summarization, which contains 10M news articles with the corresponding headlines. |
| Dataset Splits | Yes | We use the concatenation of newstest2012 and newstest2013 as the validation set and newstest2014 as the test set. For WMT17 Chinese-English task... use newsdev2017 as the validation set. For IWSLT14 German-English task, ...7K validation sentence pairs following (Cettolo et al. 2014). ...for text summarization, which contains ... 190K pairs for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software tools like tensor2tensor, NLTK, and specific BLEU/ROUGE scripts, but does not provide specific version numbers for these software dependencies required for replication. |
| Experiment Setup | Yes | We choose Transformer (Vaswani et al. 2017) as our basic model... Transformer base and Transformer big configurations... 6-layer encoder and a 6-layer decoder, with 512-dimensional and 1024-dimensional hidden representations... the sliding window k of subsequences in our loss is set to 4 and λ is set to 0.6... Adam optimizer with β1 = 0.9, β2 = 0.98, ε = 10⁻⁹. We follow the learning rate schedule in (Vaswani et al. 2017). During inference, ... beam search with beam size 6 and length penalty 1.0... beam size to 6 and length penalty to 1.1. |
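The reported setup can be collected into a single configuration sketch. This is not the authors' code (none was released); the dict below simply gathers the hyperparameters quoted above, and `transformer_lr` is the standard warmup/decay schedule from Vaswani et al. 2017 that the paper says it follows. All names here are illustrative.

```python
def transformer_lr(step, d_model=512, warmup_steps=4000):
    """Learning-rate schedule from 'Attention Is All You Need':
    lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5).
    Rises linearly during warmup, then decays as 1/sqrt(step)."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# Hyperparameters as reported in the paper (dict keys are our own labels).
CONFIG = {
    # Transformer architecture: base (512-dim) and big (1024-dim) variants
    "num_encoder_layers": 6,
    "num_decoder_layers": 6,
    "hidden_dim_base": 512,
    "hidden_dim_big": 1024,
    # sentence-wise smooth regularization
    "window_k": 4,        # sliding window over target subsequences
    "lambda": 0.6,        # weight on the smooth loss term
    # Adam optimizer
    "adam_beta1": 0.9,
    "adam_beta2": 0.98,
    "adam_epsilon": 1e-9,
    # inference (beam search)
    "beam_size": 6,
    "length_penalty": 1.0,  # 1.1 is reported for one task
}

if __name__ == "__main__":
    # Peak learning rate occurs at the end of warmup.
    print(transformer_lr(step=4000))
```

Note that warmup_steps=4000 is an assumption carried over from the Vaswani et al. default; the paper does not restate its warmup value.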