Evaluating Natural Language Generation via Unbalanced Optimal Transport
Authors: Yimeng Chen, Yanyan Lan, Ruibin Xiong, Liang Pang, Zhiming Ma, Xueqi Cheng
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on WMT18 and WMT19 show that our proposed metrics produce evaluation results more consistent with human judgements than existing intrinsic metrics. |
| Researcher Affiliation | Academia | Yimeng Chen (1,3), Yanyan Lan (1,2), Ruibin Xiong (1,2), Liang Pang (1,2), Zhiming Ma (1,3), Xueqi Cheng (1,2). 1: University of Chinese Academy of Sciences; 2: CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, CAS; 3: Academy of Mathematics and Systems Science, CAS |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | Yes | Code is available at https://github.com/Beastlyprime/lazy-emd |
| Open Datasets | Yes | Our experiments are conducted on WMT18 [Ma et al., 2018] and WMT19 [Ma et al., 2019], two widely used machine translation datasets for evaluating NLG measures. |
| Dataset Splits | No | The paper does not provide training/test/validation splits in the conventional machine learning sense. The penalty parameters are tuned on specific language pairs (et-en, en-zh, en-cs), but the paper does not describe these as validation splits for a single model across all experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions BERTScore v0.2.2 and the Python package POT, but gives no version number for POT, so version numbers are not available for all key ancillary software. |
| Experiment Setup | Yes | The regularization parameter in the Sinkhorn-scaling algorithm is set as 0.009. The penalty parameters are set to be different for three data categories, based on the target language of the translation, i.e., English, Chinese and others. For English, the parameter is set to (0.23, 0.31), which is tuned on et-en in WMT18. For Chinese, the parameter is set as (0.018, 0.97), which is tuned on en-zh in WMT19. For other languages, the parameter is set as (0.009, 0.95), which is tuned on en-cs in WMT19. |
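
The setup row above fixes the Sinkhorn regularization at 0.009 and a target-language-dependent pair of penalty parameters. As a sanity check on those settings, the sketch below shows one way such an unbalanced-OT sentence score could be assembled with the POT package the paper cites. The cosine-distance cost over token embeddings, the uniform token masses, the `uot_score` name, and the mapping of the paper's penalty pair onto POT's `reg_m` argument are all illustrative assumptions, not the authors' method; their actual implementation is in the linked lazy-emd repository.

```python
# Hedged sketch of an unbalanced-OT evaluation score using the parameters
# reported above. Assumes a recent POT release (>= 0.9), where reg_m may be
# a pair of marginal penalties; whether POT's reg_m matches the paper's
# penalty parameterization is an assumption, not confirmed by the paper.
import numpy as np
import ot  # Python Optimal Transport (POT), cited in the paper

def uot_score(hyp_emb, ref_emb, reg=0.009, reg_m=(0.23, 0.31)):
    """Unbalanced-OT cost between two (n_tokens, dim) embedding matrices.

    reg:   entropic regularization of the Sinkhorn scaling (0.009 above).
    reg_m: marginal penalty pair; (0.23, 0.31) is the English setting
           reported in the table.
    """
    # Cosine-distance cost matrix between hypothesis and reference tokens.
    hyp = hyp_emb / (np.linalg.norm(hyp_emb, axis=1, keepdims=True) + 1e-12)
    ref = ref_emb / (np.linalg.norm(ref_emb, axis=1, keepdims=True) + 1e-12)
    M = 1.0 - hyp @ ref.T

    # Uniform token masses; unbalanced OT allows the total masses to
    # mismatch, which is the point of the paper's relaxation.
    a = np.full(hyp.shape[0], 1.0 / hyp.shape[0])
    b = np.full(ref.shape[0], 1.0 / ref.shape[0])

    # Entropic unbalanced Sinkhorn; a lower cost means the hypothesis sits
    # closer to the reference under the transport geometry.
    return float(ot.unbalanced.sinkhorn_unbalanced2(a, b, M, reg, reg_m))

if __name__ == "__main__":
    # Smoke test with random stand-ins for contextual (e.g., BERT) embeddings.
    rng = np.random.default_rng(0)
    print(uot_score(rng.normal(size=(7, 768)), rng.normal(size=(9, 768))))
```

In this reading, the tuned pairs in the table would swap in for `reg_m` depending on the target language, while `reg` stays at 0.009 throughout.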