Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Model-Based Minimum Bayes Risk Decoding for Text Generation
Authors: Yuu Jinnai, Tetsuro Morimura, Ukyo Honda, Kaito Ariu, Kenshi Abe
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that MBMBR outperforms MBR in several text generation tasks, both with encoder-decoder models and with language models. |
| Researcher Affiliation | Industry | 1Cyber Agent, Tokyo, Japan. Correspondence to: Yuu Jinnai <jinnai EMAIL>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/CyberAgentAILab/model-based-mbr. |
| Open Datasets | Yes | We use the WMT19 dataset (Barrault et al., 2019). |
| Dataset Splits | No | The paper evaluates using the first 1000 inputs of each dataset or the entire test dataset (819 inputs for SAMSum), but it does not specify explicit train/validation/test dataset splits (e.g., percentages or exact counts for each split) for reproducibility. |
| Hardware Specification | No | The paper mentions models loaded with '8-bit precision' or '4-bit precision' to reduce memory consumption, but it does not explicitly provide specific hardware details such as GPU or CPU models, memory amounts, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions Hugging Face's Transformers library, the sacreBLEU library, and the evaluate library, but it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The parameters for the sampling methods are set according to the work of Freitag et al. (2023): ε = 0.02 for epsilon sampling, k = 10 for top-k sampling, and p = 0.9 for nucleus sampling. The temperature is set to 1.0 for all algorithms. |
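To make the MBR-versus-MBMBR distinction concrete, the sketch below implements plain Monte Carlo MBR selection (uniform weights over sampled pseudo-references) and shows how supplying model probabilities for those references yields a model-based weighting. The unigram-F1 utility and the toy candidate strings are illustrative assumptions, not the paper's utility functions or data.

```python
from collections import Counter

def unigram_f1(hyp: str, ref: str) -> float:
    """Toy utility: unigram F1 overlap between two whitespace-tokenized strings."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    prec, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * prec * rec / (prec + rec)

def mbr_decode(candidates, utility=unigram_f1, probs=None):
    """Return the candidate maximizing expected utility over pseudo-references.

    With probs=None each sampled pseudo-reference gets uniform weight 1/n
    (standard Monte Carlo MBR). Passing per-reference model probabilities
    weights each reference by the model's own estimate instead, which is
    the model-based flavor of the objective.
    """
    n = len(candidates)
    weights = probs if probs is not None else [1.0 / n] * n
    best, best_score = None, float("-inf")
    for hyp in candidates:
        score = sum(w * utility(hyp, ref) for w, ref in zip(weights, candidates))
        if score > best_score:
            best, best_score = hyp, score
    return best

cands = ["the cat sat", "the cat sat down", "a dog ran"]
print(mbr_decode(cands))                          # uniform MBR -> 'the cat sat'
print(mbr_decode(cands, probs=[0.2, 0.7, 0.1]))   # model-weighted -> 'the cat sat down'
```

Note how the model weights can shift the selection toward a candidate the model itself considers likely, even when uniform averaging prefers another hypothesis.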