Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models

Authors: Luke Vilnis, Yury Zemlyanskiy, Patrick Murray, Alexandre Tachard Passos, Sumit Sanghai

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach on WMT machine translation, more than halving the standard deviation when estimating expected BLEU score reward, and closing the BLEU score gap between independent sampling and beam search by up to 63%. We perform several experiments on sequence-to-sequence models trained for machine translation (WMT14 English-French and WMT16 English-Romanian) and summarization (CNN Daily Mail) tasks following a setup similar to (Raffel et al., 2020).
Researcher Affiliation | Industry | Work done while all authors were at Google Research. Correspondence to: Luke Vilnis <lvilnis@google.com>, Yury Zemlyanskiy <urikz@google.com>.
Pseudocode | Yes | Algorithm 1: Sampling from a Code Point (a hedged sketch of this procedure follows the table)
Open Source Code | Yes | We release an open-source implementation of our algorithm in the popular T5X transformer library (Roberts et al., 2022). Code is available at https://github.com/google-research/google-research/tree/master/arithmetic_sampling
Open Datasets | Yes | We perform several experiments on sequence-to-sequence models trained for machine translation (WMT14 English-French and WMT16 English-Romanian) and summarization (CNN Daily Mail) tasks following a setup similar to (Raffel et al., 2020).
Dataset Splits | No | The paper uses standard datasets but does not explicitly provide training/validation/test splits; only the test set is mentioned.
Hardware Specification | Yes | To investigate the parallel properties of arithmetic sampling on real hardware, we use the publicly available mt5 XXL model in the T5X library (Roberts et al., 2022), and between 1 and 8 Google Cloud TPU v4 accelerator chips with 32GB memory each, arranged in a 3D toroidal topology of either 1x1, 1x2x1, 2x2x1, or 2x2x2.
Software Dependencies | No | The paper mentions using the T5X library but does not provide specific version numbers for it or any other software dependencies.
Experiment Setup | Yes | We do 260,000 fine-tuning steps with batch size 128. We control diversity using the softmax temperature parameter T = 0.1, 0.2, ..., 0.8. We vary the temperature parameter T = 0.1, 0.2, 0.5 and the sample size from 2 to 64.
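To make the Pseudocode entry above concrete, here is a minimal, illustrative Python sketch of decoding from a single code point and of drawing a batch of evenly spaced code points, based on the paper's description of arithmetic sampling. It is not the released T5X implementation; the `next_token_probs` callback, the function names, and the offset-plus-lattice construction of code points are assumptions made for illustration.

```python
import numpy as np


def sample_from_code_point(next_token_probs, code, eos_id, max_len=64):
    """Decode one sequence from a code point in [0, 1).

    `next_token_probs(prefix)` is a hypothetical callback returning the
    model's next-token probability vector given the tokens decoded so far.
    At each step the unit interval is partitioned into buckets proportional
    to the token probabilities, the token whose bucket contains `code` is
    emitted, and `code` is rescaled into that bucket for the next step.
    """
    prefix = []
    for _ in range(max_len):
        probs = np.asarray(next_token_probs(prefix), dtype=np.float64)
        cdf = np.cumsum(probs)
        token = int(np.searchsorted(cdf, code, side="right"))
        token = min(token, len(probs) - 1)  # guard against rounding near 1.0
        low = cdf[token - 1] if token > 0 else 0.0
        code = (code - low) / max(probs[token], 1e-12)
        prefix.append(token)
        if token == eos_id:
            break
    return prefix


def arithmetic_sample(next_token_probs, num_samples, eos_id, seed=None):
    """Draw `num_samples` sequences from evenly spaced code points.

    A shared uniform offset keeps each individual draw unbiased, while the
    evenly spaced lattice spreads the batch over the sequence distribution,
    giving more diverse samples than independent draws.
    """
    rng = np.random.default_rng(seed)
    offset = rng.uniform()
    codes = (offset + np.arange(num_samples) / num_samples) % 1.0
    return [sample_from_code_point(next_token_probs, c, eos_id) for c in codes]
```

In the paper's setting, the role of `next_token_probs` would be played by the decoder's softmax output at each step, and all `num_samples` decodes can run in parallel because the code points are fixed before decoding begins.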
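The Experiment Setup row sweeps the softmax temperature T. As a quick reference, the snippet below shows the standard temperature-scaled softmax this parameter refers to; it is an illustrative helper, not code from the paper's repository.

```python
import numpy as np


def softmax_with_temperature(logits, temperature):
    """Standard temperature-scaled softmax.

    Lower temperatures (e.g. T = 0.1) concentrate probability mass on the
    highest-scoring tokens and reduce sample diversity; T = 1.0 recovers
    the model's unscaled distribution.
    """
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()
```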