Enabling Arbitrary Translation Objectives with Adaptive Tree Search
Authors: Wang Ling, Wojciech Stokowiec, Domenic Donato, Chris Dyer, Lei Yu, Laurent Sartran, Austin Matthews
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that our adaptive tree search algorithm finds outputs with substantially better model scores compared to beam search in autoregressive models, and compared to reranking techniques in models whose scores do not decompose additively with respect to the words in the output. We also characterise the correlation of several translation model objectives with respect to BLEU. We conduct our experiments on the Chinese-English and Pashto-English tasks from WMT2020 (Barrault et al., 2020), and German-English from WMT2014 (Bojar et al., 2014), following the same training, development and test splits. Table 1 illustrates the translation quality results using BATS and beam search. |
| Researcher Affiliation | Industry | Wang Ling, Talka, Inc., lingwang@talka.ai; Wojciech Stokowiec, Domenic Donato, Laurent Sartran, Lei Yu & Chris Dyer, DeepMind, Ltd., {wstokowiec,domenicd,lsartran,leiyu,cdyer}@deepmind.com; Austin Matthews, Amazon.com, tinaus@amazon.com |
| Pseudocode | No | The paper describes the BATS and MCTS algorithms in detail using prose and mathematical formulas (e.g., Equation 1, Equation 2, Equation 3), but it does not include a distinct pseudocode block or algorithm listing (a generic, hedged MCTS-style selection sketch is given after the table for illustration). |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | We conduct our experiments on the Chinese-English and Pashto-English tasks from WMT2020 (Barrault et al., 2020), and German-English from WMT2014 (Bojar et al., 2014), following the same training, development and test splits. |
| Dataset Splits | Yes | We conduct our experiments on the Chinese-English and Pashto-English tasks from WMT2020 (Barrault et al., 2020), and German-English from WMT2014 (Bojar et al., 2014), following the same training, development and test splits. We choose the checkpoint that yields the highest BLEU in the validation set using beam search with the normalisation constant α = 0.8 and beam size 6 (a hedged length-normalisation sketch follows the table). |
| Hardware Specification | No | The paper mentions general aspects like "computation of a neural network forward step" but does not specify any particular hardware (e.g., GPU models, CPU types, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like the "Transformer-XL architecture" and "sacreBLEU" but does not provide specific version numbers for these or other software dependencies, which would be necessary for reproducibility. |
| Experiment Setup | Yes | Our autoregressive transformer baseline uses the multi-query attention model (Shazeer, 2019). It uses the standard architecture with 6 encoder and decoder layers with 512 hidden units, 2048-sized tied embeddings for both source and target word projections, and 8 attention heads. We tokenize the data with byte-pair encoding (Sennrich et al., 2016) with 32K merges and set a maximum sentence size of Y = 128. For BATS, we simply set hyperparameter C = 1. The translation budget (beam size for beam search and iterations for BATS) is swept by doubling its value starting from 1 to 256 (a hedged configuration sketch follows the table). |
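
On the Pseudocode row: since the paper presents BATS and MCTS only in prose and equations, the sketch below shows what a node-selection step of an MCTS-style decoder can look like. It uses the standard AlphaZero-style PUCT rule with exploration constant C; the class and function names are hypothetical, and this is not the paper's own selection rule (its Equations 1-3 are not reproduced here).

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical search-tree node; field names are illustrative only."""
    prior: float                      # model prior P(a | s) for reaching this child
    visit_count: int = 0              # N(s, a)
    value_sum: float = 0.0            # running sum of backed-up values
    children: dict = field(default_factory=dict)

    def q_value(self) -> float:
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node: Node, c: float = 1.0):
    """Generic PUCT selection, shown only as an example of what pseudocode for
    an MCTS-like decoder could look like; BATS uses its own selection rule."""
    total_visits = sum(ch.visit_count for ch in node.children.values())

    def puct(child: Node) -> float:
        exploration = c * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        return child.q_value() + exploration

    # Return the (action, child) pair with the highest PUCT score.
    return max(node.children.items(), key=lambda kv: puct(kv[1]))
```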
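
On the Dataset Splits row: checkpoint selection uses beam search with a length-normalisation constant α = 0.8 and beam size 6. The sketch below applies a GNMT-style length penalty, which is a common choice for such normalisation; the paper only states the value of α, so the exact penalty form used here is an assumption.

```python
import math

def length_normalised_score(log_prob: float, length: int, alpha: float = 0.8) -> float:
    """Length-normalised hypothesis score for ranking beam-search candidates.
    GNMT-style penalty; the paper reports alpha = 0.8 but not the penalty form,
    so this particular normalisation is an assumption."""
    penalty = ((5.0 + length) / 6.0) ** alpha
    return log_prob / penalty

# Example with two hypothetical hypotheses: (sum of token log-probs, length in tokens).
hypotheses = [(-7.2, 9), (-8.1, 14)]
best = max(hypotheses, key=lambda h: length_normalised_score(*h))
print(best)
```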
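
On the Experiment Setup row: a minimal sketch that collects the reported hyperparameters into a configuration dictionary and reproduces the translation-budget sweep. The field names are hypothetical; only the values come from the quoted setup.

```python
# Hypothetical field names; values are those quoted in the Experiment Setup row.
baseline_config = {
    "encoder_layers": 6,
    "decoder_layers": 6,
    "hidden_units": 512,
    "tied_embedding_size": 2048,     # "2048-sized tied embeddings" as quoted above
    "attention_heads": 8,
    "multi_query_attention": True,   # Shazeer (2019)
    "bpe_merges": 32_000,            # byte-pair encoding, Sennrich et al. (2016)
    "max_sentence_length": 128,
    "bats_hyperparameter_C": 1.0,
}

# Translation budget sweep: "doubling its value starting from 1 to 256"
# (beam size for beam search, number of iterations for BATS).
budgets = [2 ** i for i in range(9)]   # [1, 2, 4, 8, 16, 32, 64, 128, 256]
print(budgets)
```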