Learning Beam Search Policies via Imitation Learning

Authors: Renato Negrinho, Matthew Gormley, Geoffrey J. Gordon

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "Our contributions are: an algorithm for learning beam search policies (Section 4.2) with accompanying regret guarantees (Section 5), a meta-algorithm that captures much of the existing literature (Section 4), and new theoretical results for the early update [6] and LaSO [7] algorithms (Section 5.3)."
Researcher Affiliation | Collaboration | "1. Machine Learning Department, Carnegie Mellon University; 2. Microsoft Research"
Pseudocode | Yes | "Algorithm 1: Beam Search; Algorithm 2: Meta-algorithm"
Open Source Code | No | The paper does not include any statement about releasing code or a link to a code repository.
Open Datasets | No | The paper defines input-output training pairs D = {(x1, y1), . . . , (xm, ym)} as part of its theoretical framework, but does not name any publicly available dataset used for empirical training.
Dataset Splits | No | The paper mentions "return best θt on validation" within its meta-algorithm (Algorithm 2), but, consistent with its theoretical focus, gives no details on empirical validation splits or methodology.
Hardware Specification | No | The paper provides no hardware details such as GPU/CPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions Adam [18] as an online optimization algorithm, but does not specify its version or any other software dependency with a version number.
Experiment Setup | No | The paper includes no experimental setup details such as hyperparameter values, training configurations, or system-level settings, consistent with its theoretical focus.
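For context on the "Pseudocode" row: the paper's Algorithm 1 is a beam search procedure. The following is a minimal generic sketch of beam search, not the paper's algorithm; `expand` and `score` are hypothetical stand-ins for the paper's successor function and learned scoring policy.

```python
def beam_search(root, expand, score, beam_width, max_steps):
    """Generic beam search: keep the `beam_width` highest-scoring
    nodes at each depth, returning the best node found."""
    beam = [root]
    for _ in range(max_steps):
        # Expand every node currently on the beam.
        candidates = [child for node in beam for child in expand(node)]
        if not candidates:  # search space exhausted
            break
        # Prune back down to the top-scoring `beam_width` candidates.
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
    return max(beam, key=score)
```

For example, expanding binary sequences of length 3 with `score=sum` and `beam_width=2` greedily keeps the two highest-sum prefixes at each depth; what the paper studies is learning the scoring function so that such pruning does not discard the target trajectory.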