Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models

Authors: Luke Vilnis, Yury Zemlyanskiy, Patrick Murray, Alexandre Tachard Passos, Sumit Sanghai

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach on WMT machine translation, more than halving the standard deviation when estimating expected BLEU score reward, and closing the BLEU score gap between independent sampling and beam search by up to 63%. We perform several experiments on sequence-to-sequence models trained for machine translation (WMT14 English-French and WMT16 English-Romanian) and summarization (CNN Daily Mail) tasks following a setup similar to (Raffel et al., 2020).
Researcher Affiliation | Industry | Work done while all authors were at Google Research. Correspondence to: Luke Vilnis <lvilnis@google.com>, Yury Zemlyanskiy <urikz@google.com>.
Pseudocode | Yes | Algorithm 1: Sampling from a Code Point (a hedged sketch of this procedure follows the table)
Open Source Code | Yes | We release an open-source implementation of our algorithm in the popular T5X transformer library (Roberts et al., 2022). Code is available at https://github.com/google-research/google-research/tree/master/arithmetic_sampling
Open Datasets | Yes | We perform several experiments on sequence-to-sequence models trained for machine translation (WMT14 English-French and WMT16 English-Romanian) and summarization (CNN Daily Mail) tasks following a setup similar to (Raffel et al., 2020).
Dataset Splits | No | The paper uses standard datasets but does not explicitly provide training/validation/test splits; only the test set is mentioned.
Hardware Specification | Yes | To investigate the parallel properties of arithmetic sampling on real hardware, we use the publicly available mt5 XXL model in the T5X library (Roberts et al., 2022), and between 1 and 8 Google Cloud TPU v4 accelerator chips with 32GB memory each, arranged in a 3D toroidal topology of either 1x1, 1x2x1, 2x2x1, or 2x2x2.
Software Dependencies | No | The paper mentions using the T5X library but does not provide specific version numbers for it or any other software dependencies.
Experiment Setup | Yes | We do 260,000 fine-tuning steps with batch size 128. We control diversity using the softmax temperature parameter T = 0.1, 0.2, ..., 0.8. We vary the temperature parameter T = 0.1, 0.2, 0.5 and the sample size from 2 to 64.
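To make the Pseudocode entry above concrete, here is a minimal, illustrative Python sketch of decoding from a single code point and of drawing a batch of evenly spaced code points, based on the paper's description of arithmetic sampling. It is not the released T5X implementation; the `next_token_probs` callback, the function names, and the offset-plus-lattice construction of code points are assumptions made for illustration.

```python
import numpy as np


def sample_from_code_point(next_token_probs, code, eos_id, max_len=64):
    """Decode one sequence from a code point in [0, 1).

    `next_token_probs(prefix)` is a hypothetical callback returning the
    model's next-token probability vector given the tokens decoded so far.
    At each step the unit interval is partitioned into buckets proportional
    to the token probabilities, the token whose bucket contains `code` is
    emitted, and `code` is rescaled into that bucket for the next step.
    """
    prefix = []
    for _ in range(max_len):
        probs = np.asarray(next_token_probs(prefix), dtype=np.float64)
        cdf = np.cumsum(probs)
        token = int(np.searchsorted(cdf, code, side="right"))
        token = min(token, len(probs) - 1)  # guard against rounding near 1.0
        low = cdf[token - 1] if token > 0 else 0.0
        code = (code - low) / max(probs[token], 1e-12)
        prefix.append(token)
        if token == eos_id:
            break
    return prefix


def arithmetic_sample(next_token_probs, num_samples, eos_id, seed=None):
    """Draw `num_samples` sequences from evenly spaced code points.

    A shared uniform offset keeps each individual draw unbiased, while the
    evenly spaced lattice spreads the batch over the sequence distribution,
    giving more diverse samples than independent draws.
    """
    rng = np.random.default_rng(seed)
    offset = rng.uniform()
    codes = (offset + np.arange(num_samples) / num_samples) % 1.0
    return [sample_from_code_point(next_token_probs, c, eos_id) for c in codes]
```

In the paper's setting, the role of `next_token_probs` would be played by the decoder's softmax output at each step, and all `num_samples` decodes can run in parallel because the code points are fixed before decoding begins.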
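The Experiment Setup row sweeps the softmax temperature T. As a quick reference, the snippet below shows the standard temperature-scaled softmax this parameter refers to; it is an illustrative helper, not code from the paper's repository.

```python
import numpy as np


def softmax_with_temperature(logits, temperature):
    """Standard temperature-scaled softmax.

    Lower temperatures (e.g. T = 0.1) concentrate probability mass on the
    highest-scoring tokens and reduce sample diversity; T = 1.0 recovers
    the model's unscaled distribution.
    """
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()
```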