QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation

Authors: Gonçalo Faria, Sweta Agrawal, António Farinhas, Ricardo Rei, José de Souza, André Martins

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The results show that our proposed method leads to high-quality and diverse outputs across multiple language pairs (ENGLISH ↔ {GERMAN, RUSSIAN}) with two strong decoder-only LLMs (ALMA-7B, TOWER-7B).
Researcher Affiliation | Collaboration | Gonçalo R. A. Faria¹, Sweta Agrawal², António Farinhas²,³, Ricardo Rei⁴, José G. C. de Souza⁴, André F. T. Martins²,³,⁴,⁵; ¹University of Washington, ²Instituto de Telecomunicações, ³Instituto Superior Técnico, Universidade de Lisboa, ⁴Unbabel, ⁵ELLIS Unit Lisbon
Pseudocode | Yes | Algorithm 1: Quality-Aware Metropolis-Hastings (QUEST) Sampling (a minimal sketch of the sampling step is given after the table).
Open Source Code | Yes | We release the code to replicate our experiments at https://www.questdecoding.com.
Open Datasets | Yes | We test our approach on the WMT23 test sets (Kocmi et al., 2023) covering four language pairs, ENGLISH ↔ {GERMAN, RUSSIAN}.
Dataset Splits | No | The paper specifies the use of the WMT23 test sets but does not detail training, validation, and test splits with percentages or counts; only the use of a test set is implied.
Hardware Specification | Yes | We run our experiments on NVIDIA RTX A6000 GPUs.
Software Dependencies | No | The paper mentions using vLLM (Kwon et al., 2023) for inference but does not provide version numbers for its software dependencies or libraries.
Experiment Setup | Yes | Decoding Configurations: For ancestral sampling, we consider temperature values τ between 0.2 and 1.0, with an equally spaced interval of 0.1. For generations with QUEST, we sample from the proposal distribution using τ = 0.8 and vary the parameter β of the target Gibbs distribution over the values {0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0}. The number of ancestral samples and decoding steps are both set to 128 (see the configuration sketch after the table).
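
The Pseudocode row cites Algorithm 1 but does not reproduce the listing. Below is a minimal, hypothetical Python sketch of a quality-aware Metropolis-Hastings step: it assumes a Gibbs target proportional to exp(reward(x, y) / β) and a simple independence proposal drawn from the LLM at temperature τ. The names `propose`, `proposal_logprob`, and `reward` are placeholders, and the paper's actual proposal and target may differ in detail; this is a sketch of the general technique, not the released implementation.

```python
import math
import random

def quest_sample(x, propose, proposal_logprob, reward,
                 beta=0.1, tau=0.8, num_steps=128):
    """Quality-aware Metropolis-Hastings over translation hypotheses (sketch)."""
    y = propose(x, tau)                 # draw an initial hypothesis from the LLM
    chain = [y]
    for _ in range(num_steps):
        y_prime = propose(x, tau)       # candidate hypothesis

        # Gibbs target pi(y | x) proportional to exp(reward(x, y) / beta);
        # with an independence proposal q(y | x), the MH log-acceptance ratio
        # is the reward difference scaled by 1/beta plus the proposal
        # correction log q(y | x) - log q(y' | x).
        log_alpha = (reward(x, y_prime) - reward(x, y)) / beta
        log_alpha += proposal_logprob(x, y, tau) - proposal_logprob(x, y_prime, tau)

        # Accept with probability min(1, exp(log_alpha)).
        if random.random() < math.exp(min(0.0, log_alpha)):
            y = y_prime
        chain.append(y)                 # record the current state either way
    return chain
```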
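For concreteness, the decoding grid described in the Experiment Setup row can be written down as plain constants. The variable names below are illustrative and are not taken from the released code.

```python
# Decoding configurations reported in the Experiment Setup row (names are illustrative).
ANCESTRAL_TEMPERATURES = [round(0.2 + 0.1 * i, 1) for i in range(9)]  # 0.2, 0.3, ..., 1.0
QUEST_PROPOSAL_TAU = 0.8                                              # proposal temperature for QUEST
QUEST_BETAS = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0]                  # beta sweep for the Gibbs target
NUM_ANCESTRAL_SAMPLES = 128                                           # ancestral samples per source
NUM_QUEST_STEPS = 128                                                 # MH decoding steps per source
```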