QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation
Authors: Gonçalo Faria, Sweta Agrawal, António Farinhas, Ricardo Rei, José de Souza, André Martins
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results show that our proposed method leads to high-quality and diverse outputs across multiple language pairs (ENGLISH↔{GERMAN, RUSSIAN}) with two strong decoder-only LLMs (ALMA-7B, TOWER-7B). |
| Researcher Affiliation | Collaboration | Gonçalo R. A. Faria¹, Sweta Agrawal², António Farinhas²˒³, Ricardo Rei⁴, José G. C. de Souza⁴, André F. T. Martins²˒³˒⁴˒⁵; ¹University of Washington, ²Instituto de Telecomunicações, ³Instituto Superior Técnico, Universidade de Lisboa, ⁴Unbabel, ⁵ELLIS Unit Lisbon |
| Pseudocode | Yes | Algorithm 1 Quality-Aware Metropolis Hastings (QUEST) Sampling |
| Open Source Code | Yes | We release the code to replicate our experiments at https://www.questdecoding.com. |
| Open Datasets | Yes | We test our approach on the WMT23 test sets (Kocmi et al., 2023) covering four language pairs, ENGLISH↔{GERMAN, RUSSIAN}. |
| Dataset Splits | No | The paper specifies the use of 'WMT23 test sets' but does not explicitly detail training, validation, and test splits with percentages or counts. It implies the use of a test set. |
| Hardware Specification | Yes | We run our experiments on NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions using 'VLLM (Kwon et al., 2023)' for inference but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | Decoding Configurations: For ancestral sampling, we consider temperature values τ between 0.2 and 1.0, with an equally spaced interval of 0.1. For generations with QUEST, we sample from the proposal distribution using τ = 0.8 and vary the parameter β of the target Gibbs distribution from the following range of values {0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0}. The number of ancestral samples and decoding steps are both set to 128. |
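
The decoding sweep quoted in the Experiment Setup row can be written out as a small configuration sketch. The numeric values come from the quoted text. The variable names, the dictionary layout, and the `decoding_runs` helper are hypothetical and are not taken from the released code.

```python
# Illustrative sketch only: names and structure are assumptions,
# values are those quoted in the Experiment Setup row.

# Ancestral sampling: temperatures from 0.2 to 1.0 in steps of 0.1.
ancestral_temperatures = [round(0.2 + 0.1 * i, 1) for i in range(9)]

# QUEST: fixed proposal temperature, swept beta of the target Gibbs distribution.
quest_proposal_temperature = 0.8
quest_betas = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0]

# Both the number of ancestral samples and the number of QUEST decoding steps.
num_samples_or_steps = 128


def decoding_runs():
    """Return one config dict per decoding run in the sweep."""
    runs = [
        {"method": "ancestral", "temperature": t, "num_samples": num_samples_or_steps}
        for t in ancestral_temperatures
    ]
    runs += [
        {
            "method": "quest",
            "temperature": quest_proposal_temperature,
            "beta": b,
            "num_steps": num_samples_or_steps,
        }
        for b in quest_betas
    ]
    return runs
```

Under these assumptions the sweep has nine ancestral-sampling runs and seven QUEST runs per language pair and model.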