Model-Based Minimum Bayes Risk Decoding for Text Generation

Authors: Yuu Jinnai, Tetsuro Morimura, Ukyo Honda, Kaito Ariu, Kenshi Abe

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that MBMBR outperforms MBR in several text generation tasks, both with encoder-decoder models and with language models.
Researcher Affiliation | Industry | CyberAgent, Tokyo, Japan. Correspondence to: Yuu Jinnai <jinnai_yu@cyberagent.co.jp>.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. (An illustrative MBR-selection sketch is given after this table.)
Open Source Code | Yes | Our code is available at https://github.com/CyberAgentAILab/model-based-mbr.
Open Datasets | Yes | We use the WMT19 dataset (Barrault et al., 2019).
Dataset Splits | No | The paper evaluates on the first 1,000 inputs of each dataset, or on the entire test set where it is smaller (819 inputs for SAMSum), but it does not specify explicit train/validation/test splits (e.g., percentages or exact counts) needed for reproducibility.
Hardware Specification | No | The paper mentions loading models in 8-bit or 4-bit precision to reduce memory consumption, but it does not report the hardware used, such as GPU or CPU models, memory amounts, or cloud instance types. (See the quantized-loading sketch after this table.)
Software Dependencies | No | The paper mentions Hugging Face's Transformers library, the sacreBLEU library, and the evaluate library, but does not provide version numbers for these components. (See the metric-computation sketch after this table.)
Experiment Setup | Yes | The parameters for the sampling methods are set according to Freitag et al. (2023): ϵ = 0.02 for epsilon sampling, k = 10 for top-k sampling, p = 0.9 for nucleus sampling, and a temperature of 1.0 for all algorithms. (See the sampling-configuration sketch after this table.)
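Because the hardware row only states that models were loaded in 8-bit or 4-bit precision, here is a minimal sketch of that kind of quantized loading with Hugging Face Transformers and bitsandbytes. The checkpoint name is a placeholder and not necessarily a model used in the paper.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder checkpoint; the paper's exact language models are not listed in this table.
model_name = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the weights in 8-bit precision to reduce GPU memory consumption
# (requires the bitsandbytes package; use load_in_4bit=True for 4-bit loading).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```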
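The experiment-setup row lists the sampling hyperparameters (ϵ = 0.02, k = 10, p = 0.9, temperature 1.0). The sketch below shows how such settings map onto Transformers' `generate()`; the WMT'19 checkpoint, the input sentence, and the number of returned sequences are illustrative assumptions, and each call enables only one truncation strategy at a time.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder WMT'19 translation checkpoint; not necessarily the paper's model.
model_name = "facebook/wmt19-de-en"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer(["Ein Beispielsatz zum Testen."], return_tensors="pt")

# Shared settings: pure ancestral sampling (no beam search), temperature 1.0,
# and an illustrative number of samples per input.
common = dict(do_sample=True, num_beams=1, temperature=1.0,
              num_return_sequences=8, max_new_tokens=128)

# Epsilon sampling with epsilon = 0.02 (top_k=0 and top_p=1.0 disable other truncation).
eps_samples = model.generate(**inputs, epsilon_cutoff=0.02, top_k=0, top_p=1.0, **common)

# Top-k sampling with k = 10.
topk_samples = model.generate(**inputs, top_k=10, top_p=1.0, **common)

# Nucleus (top-p) sampling with p = 0.9.
topp_samples = model.generate(**inputs, top_k=0, top_p=0.9, **common)

print(tokenizer.batch_decode(eps_samples, skip_special_tokens=True))
```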
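Since the paper provides no pseudocode, the following is a generic, illustrative sketch of sample-based MBR selection over such candidate lists, plus a hypothetical model-probability-weighted variant in the spirit of a model-based estimate. The `utility` and `log_prob` callables are placeholders, and this is not claimed to be the paper's exact MBMBR estimator.

```python
import math


def mbr_select(candidates, references, utility):
    """Standard Monte Carlo MBR: return the candidate with the highest
    average utility against the sampled pseudo-references."""
    return max(
        candidates,
        key=lambda h: sum(utility(h, r) for r in references) / len(references),
    )


def model_weighted_mbr_select(candidates, references, utility, log_prob):
    """Hypothetical model-weighted variant: weight each distinct pseudo-reference
    by its renormalized model probability instead of its sample frequency.
    `log_prob(y)` is assumed to return the model's log-probability of y."""
    uniq = sorted(set(references))
    probs = [math.exp(log_prob(y)) for y in uniq]
    total = sum(probs)
    weights = [p / total for p in probs]
    return max(
        candidates,
        key=lambda h: sum(w * utility(h, y) for y, w in zip(uniq, weights)),
    )
```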
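The software-dependency row names sacreBLEU and the evaluate library without versions. This is a small sketch of the kind of corpus-level BLEU computation those libraries provide; the sentences are illustrative, and any version pinning is an assumption since the paper reports none.

```python
import evaluate

# Load the sacreBLEU metric through Hugging Face's evaluate library.
sacrebleu = evaluate.load("sacrebleu")

predictions = ["The cat sits on the mat."]
references = [["The cat is sitting on the mat."]]  # one list of references per prediction

result = sacrebleu.compute(predictions=predictions, references=references)
print(f"BLEU = {result['score']:.2f}")
```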