Data-dependent Gaussian Prior Objective for Language Generation
Authors: Zuchao Li, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Zhuosheng Zhang, Hai Zhao
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed method makes effective use of a more detailed prior in the data and has improved performance in typical language generation tasks, including supervised and unsupervised machine translation, text summarization, storytelling, and image captioning. (Abstract) |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science and Engineering, Shanghai Jiao Tong University; (2) Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China; (3) MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; (4) National Institute of Information and Communications Technology (NICT), Kyoto, Japan |
| Pseudocode | No | The paper describes the proposed method using mathematical equations and prose but does not include any structured pseudocode or algorithm blocks. (A hedged sketch of the objective is given after this table.) |
| Open Source Code | No | The paper does not provide an explicit statement or a link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We evaluated the model on several widely used translation tasks: WMT14 English-to-German (EN→DE), English-to-French (EN→FR), and WMT16 English-to-Romanian (EN→RO)... (Section 5.2) The Annotated Gigaword corpus (Napoles et al., 2012) was used as the benchmark... (Section 5.4) ...on the MSCOCO 2014 caption dataset (Lin et al., 2014)... (Section 5.6) |
| Dataset Splits | Yes | The newstest2013 and newstest2014 datasets were used as the dev set and test set, respectively. (A.3) The newstest2012 and newstest2013 datasets were combined for validation and newstest2014 was used as the test set... (A.3) The data include approximately 3.8M training samples, 400,000 validation samples, and 2000 test samples. (A.4) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU specifications) used to run the experiments. |
| Software Dependencies | No | The paper mentions several software tools and libraries (e.g., fastText, Transformer NMT, multi-bleu.pl, ROUGE, SPICE, CIDEr, METEOR) but does not provide specific version numbers for these components, which would be necessary for reproducible software dependencies. |
| Experiment Setup | Yes | During training with our D2GPo, the weight of the KL divergence item λ was set to 0.1, and the softmax temperature was T = 2.0 in all experiments. (A.7) ...we carried out experiments on WMT14 EN-DE with the Transformer-base model as the baseline and set λ as [0, 0.1, 0.2, 0.5, 1.0], T as [1.0, 2.0, 5.0, 10.0]. (A.7) |
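
For orientation, here is a minimal PyTorch sketch of what a D2GPo-style objective looks like; the quotes in the table state only the hyperparameters. This is an illustration under stated assumptions, not the authors' implementation: the names `d2gpo_prior` and `d2gpo_loss` are invented here, the Gaussian kernel is applied directly to embedding distances (the paper builds the prior from the distance-sorted vocabulary order), σ is left as a free parameter, and the `ce + λ·KL` mixing is one common convention. The defaults λ = 0.1 and T = 2.0 follow the quoted setup.

```python
import torch
import torch.nn.functional as F

def d2gpo_prior(embeddings: torch.Tensor, gold: torch.Tensor,
                sigma: float = 1.0, temperature: float = 2.0) -> torch.Tensor:
    """Data-dependent Gaussian prior over the vocabulary for each gold token.

    embeddings: (V, d) fixed pre-trained token embeddings (e.g. fastText);
    gold: (B,) gold token ids. Returns a (B, V) distribution that places
    more mass on tokens whose embeddings lie near the gold token's embedding.
    Assumption: the kernel acts on raw distances, not on the sorted order.
    """
    gold_vecs = embeddings[gold]                          # (B, d)
    dist = torch.cdist(gold_vecs, embeddings)             # (B, V) Euclidean distances
    kernel = torch.exp(-dist.pow(2) / (2 * sigma ** 2))   # Gaussian kernel on distances
    return F.softmax(kernel / temperature, dim=-1)        # temperature-smoothed prior

def d2gpo_loss(logits: torch.Tensor, gold: torch.Tensor,
               embeddings: torch.Tensor, lam: float = 0.1,
               temperature: float = 2.0) -> torch.Tensor:
    """Cross-entropy plus a KL(prior || model) term weighted by `lam`."""
    ce = F.cross_entropy(logits, gold)
    log_p = F.log_softmax(logits, dim=-1)                 # model distribution (log space)
    with torch.no_grad():                                 # prior is fixed, not learned
        q = d2gpo_prior(embeddings, gold, temperature=temperature)
    kl = F.kl_div(log_p, q, reduction="batchmean")        # KL(q || p)
    return ce + lam * kl                                  # mixing convention assumed here
```

At λ = 0 the KL term vanishes and training reduces to plain maximum likelihood, which is consistent with the sweep λ ∈ [0, 0.1, 0.2, 0.5, 1.0] quoted above.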