Learning to Search Better than Your Teacher

Authors: Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, John Langford

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experiments: This section shows that LOLS is able to improve upon a suboptimal reference policy and provides empirical evidence to support the analysis in Section 3. We conducted experiments on the following three applications.
Researcher Affiliation | Collaboration | Kai-Wei Chang (KCHANG10@ILLINOIS.EDU), University of Illinois at Urbana Champaign, IL; Akshay Krishnamurthy (AKSHAYKR@CS.CMU.EDU), Carnegie Mellon University, Pittsburgh, PA; Alekh Agarwal (ALEKHA@MICROSOFT.COM), Microsoft Research, New York, NY; Hal Daumé III (HAL@UMIACS.UMD.EDU), University of Maryland, College Park, MD; John Langford (JCL@MICROSOFT.COM), Microsoft Research, New York, NY
Pseudocode | Yes | Algorithm 1: Locally Optimal Learning to Search (LOLS); Algorithm 2: Structured Contextual Bandit Learning
Open Source Code | No | Our implementation is based on Vowpal Wabbit⁶, a machine learning system that supports online learning and L2S. Footnote 6 points to http://hunch.net/~vw/. This is a third-party tool used by the authors, not their own source code for the methodology.
Open Datasets | Yes | The experiments are conducted on KDDCup 99 dataset⁵ generated from a computer network intrusion detection task. (Footnote 5: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html) We train on 38k sentences and test on 11k from the Penn Treebank (Marcus et al., 1993). We used data from the Penn Treebank Wall Street Journal corpus: the standard data split for training (sections 02-21) and test (section 23).
Dataset Splits | Yes | The dataset contains 5 classes, 4,898,431 training and 311,029 test instances. We train on 38k sentences and test on 11k from the Penn Treebank (Marcus et al., 1993). We used data from the Penn Treebank Wall Street Journal corpus: the standard data split for training (sections 02-21) and test (section 23).
Hardware Specification | No | The paper does not provide specific hardware details (such as GPU/CPU models or memory) used for running the experiments. It only mentions using 'Vowpal Wabbit', which is a software system.
Software Dependencies | No | The paper mentions 'Vowpal Wabbit' as the base for its implementation but does not specify its version number or any other software dependencies with version numbers.
Experiment Setup | Yes | For LOLS's mixture policy, we set β = 0.5. For SEARN, we set the mixture parameter to be 1 − (1 − α)^t, where t is the number of rounds and α = 10^-5. Unless stated otherwise all the learners take 5 passes over the data.
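
The two mixture settings quoted in the Experiment Setup row can be made concrete with a minimal Python sketch. It is not the authors' code and does not use Vowpal Wabbit. The function names searn_mixture and pick_rollout_policy are hypothetical, and the sketch assumes that β is the probability of choosing the reference policy (rather than the learned policy) for a roll-out. The SEARN function only evaluates the quoted formula 1 − (1 − α)^t; it makes no claim about how SEARN applies that value.

```python
import random


def searn_mixture(t: int, alpha: float = 1e-5) -> float:
    """Evaluate the quoted schedule 1 - (1 - alpha)^t after t rounds."""
    return 1.0 - (1.0 - alpha) ** t


def pick_rollout_policy(rng: random.Random, beta: float = 0.5) -> str:
    """Return "reference" with probability beta, otherwise "learned".

    Assumption: beta weights the reference policy in the roll-out mixture.
    """
    return "reference" if rng.random() < beta else "learned"


if __name__ == "__main__":
    # The schedule starts near 0 and grows toward 1 as t increases.
    for t in (1, 1_000, 100_000):
        print(t, searn_mixture(t))

    # Empirical fraction of reference roll-outs with beta = 0.5.
    rng = random.Random(0)
    draws = [pick_rollout_policy(rng) for _ in range(10_000)]
    print(draws.count("reference") / len(draws))
```

With α = 10^-5 the schedule stays close to 0 for the first few thousand rounds, so under this formula the mixture changes very slowly at the start of training.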