Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning to Search Better than Your Teacher
Authors: Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, John Langford
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments This section shows that LOLS is able to improve upon a suboptimal reference policy and provides empirical evidence to support the analysis in Section 3. We conducted experiments on the following three applications. |
| Researcher Affiliation | Collaboration | Kai-Wei Chang EMAIL University of Illinois at Urbana-Champaign, IL Akshay Krishnamurthy EMAIL Carnegie Mellon University, Pittsburgh, PA Alekh Agarwal EMAIL Microsoft Research, New York, NY Hal Daumé III EMAIL University of Maryland, College Park, MD John Langford EMAIL Microsoft Research, New York, NY |
| Pseudocode | Yes | Algorithm 1: Locally Optimal Learning to Search (LOLS) Algorithm 2: Structured Contextual Bandit Learning |
| Open Source Code | No | Our implementation is based on Vowpal Wabbit6, a machine learning system that supports online learning and L2S. Footnote 6 points to http://hunch.net/~vw/. This is a third-party tool used by the authors, not their own source code for the methodology. |
| Open Datasets | Yes | The experiments are conducted on KDDCup 99 dataset5 generated from a computer network intrusion detection task. (Footnote 5: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html) We train on 38k sentences and test on 11k from the Penn Treebank (Marcus et al., 1993). We used data from the Penn Treebank Wall Street Journal corpus: the standard data split for training (sections 02-21) and test (section 23). |
| Dataset Splits | Yes | The dataset contains 5 classes, 4,898,431 training and 311,029 test instances. We train on 38k sentences and test on 11k from the Penn Treebank (Marcus et al., 1993). We used data from the Penn Treebank Wall Street Journal corpus: the standard data split for training (sections 02-21) and test (section 23). |
| Hardware Specification | No | The paper does not provide specific hardware details (like GPU/CPU models or memory) used for running the experiments. It only mentions using 'Vowpal Wabbit' which is a software system. |
| Software Dependencies | No | The paper mentions 'Vowpal Wabbit' as the base for their implementation but does not specify its version number or any other software dependencies with version numbers. |
| Experiment Setup | Yes | For LOLS's mixture policy, we set β = 0.5. For SEARN, we set the mixture parameter to be 1 − (1 − α)^t, where t is the number of rounds and α = 10^-5. Unless stated otherwise all the learners take 5 passes over the data. |
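The mixture parameters quoted in the Experiment Setup row can be illustrated with a short sketch (the function names are illustrative, not from the paper): LOLS chooses the roll-out policy per trajectory with probability β, while SEARN's schedule 1 − (1 − α)^t gives the weight placed on the learned policy after t rounds.

```python
import random


def lols_rollout_policy(beta, rng=random):
    """LOLS-style choice (sketch): with probability beta, roll out with the
    reference policy; otherwise roll out with the currently learned policy.
    The paper's experiments use beta = 0.5."""
    return "reference" if rng.random() < beta else "learned"


def searn_mixture_weight(t, alpha=1e-5):
    """SEARN-style mixture parameter (sketch): 1 - (1 - alpha)**t, the weight
    on the learned policy after t rounds; starts at 0 and approaches 1.
    The paper's experiments use alpha = 1e-5."""
    return 1.0 - (1.0 - alpha) ** t
```

As a sanity check, `searn_mixture_weight(0)` is 0 (pure reference policy) and the weight increases monotonically with t, so later rounds rely more on the learned policy.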