Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning to Search Better than Your Teacher

Authors: Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, John Langford

ICML 2015

Research Type: Experimental
LLM Response: "5. Experiments. This section shows that LOLS is able to improve upon a suboptimal reference policy and provides empirical evidence to support the analysis in Section 3. We conducted experiments on the following three applications."

Researcher Affiliation: Collaboration
LLM Response: Kai-Wei Chang (University of Illinois at Urbana-Champaign, IL); Akshay Krishnamurthy (Carnegie Mellon University, Pittsburgh, PA); Alekh Agarwal (Microsoft Research, New York, NY); Hal Daumé III (University of Maryland, College Park, MD); John Langford (Microsoft Research, New York, NY).

Pseudocode: Yes
LLM Response: Algorithm 1: Locally Optimal Learning to Search (LOLS); Algorithm 2: Structured Contextual Bandit Learning.

Open Source Code: No
LLM Response: "Our implementation is based on Vowpal Wabbit, a machine learning system that supports online learning and L2S." (Footnote 6: http://hunch.net/~vw/) Vowpal Wabbit is a third-party tool used by the authors, not their own source code for the methodology.

Open Datasets: Yes
LLM Response: "The experiments are conducted on the KDDCup 99 dataset generated from a computer network intrusion detection task." (Footnote 5: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html) "We train on 38k sentences and test on 11k from the Penn Treebank (Marcus et al., 1993)." "We used data from the Penn Treebank Wall Street Journal corpus: the standard data split for training (sections 02-21) and test (section 23)."

Dataset Splits: Yes
LLM Response: "The dataset contains 5 classes, 4,898,431 training and 311,029 test instances." "We train on 38k sentences and test on 11k from the Penn Treebank (Marcus et al., 1993)." "We used data from the Penn Treebank Wall Street Journal corpus: the standard data split for training (sections 02-21) and test (section 23)."

Hardware Specification: No
LLM Response: The paper does not provide hardware details (e.g., GPU/CPU models or memory) for running the experiments; it mentions only the Vowpal Wabbit software system.

Software Dependencies: No
LLM Response: The paper states that the implementation is based on Vowpal Wabbit but gives no version number for it or for any other software dependency.

Experiment Setup: Yes
LLM Response: "For LOLS's mixture policy, we set β = 0.5. For SEARN, we set the mixture parameter to be 1 − (1 − α)^t, where t is the number of rounds and α = 10^-5. Unless stated otherwise, all the learners take 5 passes over the data."
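The mixture parameters quoted in the Experiment Setup row can be made concrete with a short sketch. This is illustrative only, not the authors' Vowpal Wabbit implementation; the function names are hypothetical. LOLS rolls out with the reference policy with probability β = 0.5 (otherwise with the current learned policy), and SEARN's interpolation weight after t rounds is 1 − (1 − α)^t with α = 10^-5.

```python
import random

# Parameter values quoted from the paper's experiment setup (Section 5).
BETA = 0.5    # LOLS mixture policy: P(roll out with the reference policy)
ALPHA = 1e-5  # SEARN interpolation rate

def searn_mixture(t: int) -> float:
    """SEARN mixture parameter after t rounds: 1 - (1 - alpha)^t.

    Grows slowly from ~alpha at t = 1 toward 1 as t increases.
    """
    return 1.0 - (1.0 - ALPHA) ** t

def choose_rollout_policy(reference, learned, rng=random):
    """Pick a roll-out policy for a LOLS-style mixture: with probability
    BETA follow the reference policy, otherwise the current learned
    policy. (Hypothetical helper; Algorithm 1 in the paper defines the
    full training loop around this choice.)
    """
    return reference if rng.random() < BETA else learned

if __name__ == "__main__":
    for t in (1, 10, 100):
        print(f"SEARN mixture after {t} rounds: {searn_mixture(t):.3e}")
```

With α this small, the mixture weight stays tiny for the 5 passes used in the experiments, so SEARN leans almost entirely on the reference policy throughout training.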