Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Explainable Evaluation Metrics for Machine Translation
Authors: Christoph Leiter, Piyawat Lertvittayakumjorn, Marina Fomicheva, Wei Zhao, Yang Gao, Steffen Eger
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this concept paper, we identify key properties as well as key goals of explainable machine translation metrics and provide a comprehensive synthesis of recent techniques, relating them to our established goals and properties. In this context, we also discuss the latest state-of-the-art approaches to explainable metrics based on generative models such as ChatGPT and GPT-4. Finally, we contribute a vision of next-generation approaches, including natural language explanations. |
| Researcher Affiliation | Collaboration | Christoph Leiter (Natural Language Learning Group, University of Mannheim, B6 26, 68159 Mannheim, Germany); Piyawat Lertvittayakumjorn (Imperial College London); Marina Fomicheva (University of Sheffield); Wei Zhao (University of Aberdeen; Heidelberg Institute for Theoretical Studies); Yang Gao (Royal Holloway, University of London); Steffen Eger (University of Mannheim) |
| Pseudocode | No | The paper describes various methods and approaches but does not contain any structured pseudocode or algorithm blocks. It is a survey paper summarizing existing techniques. |
| Open Source Code | No | As a survey and conceptual paper, it does not present new methodology that would typically be accompanied by open-source code. There is no statement about releasing code, nor a link to a code repository for the work described in this paper. |
| Open Datasets | No | The paper is a conceptual and survey paper and does not conduct its own experiments. Therefore, no datasets are "used in the experiments" by this paper. It discusses various datasets used by other research papers in the field but does not present its own experimental results requiring a dataset. |
| Dataset Splits | No | The paper is a conceptual and survey paper and does not conduct its own experiments. Therefore, it does not specify any training/test/validation dataset splits. |
| Hardware Specification | No | The paper is a conceptual and survey paper and does not conduct its own experiments. Therefore, no specific hardware details are mentioned for running experiments. |
| Software Dependencies | No | The paper is a conceptual and survey paper and does not conduct its own experiments. Therefore, no specific software dependencies with version numbers are mentioned for replicating experiments. |
| Experiment Setup | No | The paper is a conceptual and survey paper and does not conduct its own experiments. Therefore, it does not provide specific experimental setup details such as hyperparameters or training configurations. |