Learning to Search Better than Your Teacher
Authors: Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, John Langford
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5, Experiments: This section shows that LOLS is able to improve upon a suboptimal reference policy and provides empirical evidence to support the analysis in Section 3. We conducted experiments on the following three applications. |
| Researcher Affiliation | Collaboration | Kai-Wei Chang (KCHANG10@ILLINOIS.EDU), University of Illinois at Urbana-Champaign, IL; Akshay Krishnamurthy (AKSHAYKR@CS.CMU.EDU), Carnegie Mellon University, Pittsburgh, PA; Alekh Agarwal (ALEKHA@MICROSOFT.COM), Microsoft Research, New York, NY; Hal Daumé III (HAL@UMIACS.UMD.EDU), University of Maryland, College Park, MD; John Langford (JCL@MICROSOFT.COM), Microsoft Research, New York, NY |
| Pseudocode | Yes | Algorithm 1: Locally Optimal Learning to Search (LOLS); Algorithm 2: Structured Contextual Bandit Learning |
| Open Source Code | No | Our implementation is based on Vowpal Wabbit (footnote 6), a machine learning system that supports online learning and L2S. Footnote 6 points to http://hunch.net/~vw/. This is a third-party tool used by the authors, not their own source code for the methodology. |
| Open Datasets | Yes | The experiments are conducted on the KDDCup 99 dataset (footnote 5: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html), generated from a computer network intrusion detection task. We train on 38k sentences and test on 11k from the Penn Treebank (Marcus et al., 1993). We used data from the Penn Treebank Wall Street Journal corpus: the standard data split for training (sections 02-21) and test (section 23). |
| Dataset Splits | Yes | The KDDCup 99 dataset contains 5 classes, 4,898,431 training and 311,029 test instances. We train on 38k sentences and test on 11k from the Penn Treebank (Marcus et al., 1993). We used data from the Penn Treebank Wall Street Journal corpus: the standard data split for training (sections 02-21) and test (section 23). |
| Hardware Specification | No | The paper does not provide specific hardware details (such as GPU/CPU models or memory) used for running the experiments. It mentions only Vowpal Wabbit, which is a software system. |
| Software Dependencies | No | The paper mentions 'Vowpal Wabbit' as the base for their implementation but does not specify its version number or any other software dependencies with version numbers. |
| Experiment Setup | Yes | For LOLS's mixture policy, we set $\beta = 0.5$. For SEARN, we set the mixture parameter to be $1 - (1 - \alpha)^t$, where $t$ is the number of rounds and $\alpha = 10^{-5}$. Unless stated otherwise, all the learners take 5 passes over the data. |
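The two mixture settings in the Experiment Setup row can be made concrete with a short Python sketch. It evaluates the reported SEARN schedule $1 - (1 - \alpha)^t$ with $\alpha = 10^{-5}$ and simulates the LOLS roll-out choice with $\beta = 0.5$. This is an illustration under stated assumptions, not the authors' Vowpal Wabbit implementation. The helper names `searn_mixture` and `lols_rollout_policy` are hypothetical. The sketch also assumes the LOLS mixture is drawn once per roll-out: the reference policy with probability $\beta$, otherwise the learned policy.

```python
import math
import random


def searn_mixture(t: int, alpha: float = 1e-5) -> float:
    """SEARN mixture parameter after t rounds: 1 - (1 - alpha)^t (alpha = 1e-5 as reported)."""
    return 1.0 - (1.0 - alpha) ** t


def lols_rollout_policy(rng: random.Random, beta: float = 0.5) -> str:
    """Hypothetical helper: choose the roll-out policy for a single roll-out.

    Assumption: with probability beta the reference policy is used,
    otherwise the currently learned policy; drawn once per roll-out.
    """
    return "reference" if rng.random() < beta else "learned"


if __name__ == "__main__":
    # With alpha = 1e-5 the SEARN mixture parameter grows very slowly with t.
    for t in (1, 5, 1_000, 100_000):
        print(f"t={t:>7}: {searn_mixture(t):.6f}")
    # For small alpha, 1 - (1 - alpha)^t is close to 1 - exp(-alpha * t).
    print("1 - exp(-1) =", round(1 - math.exp(-1), 6))

    # Empirical check that beta = 0.5 selects the reference policy about half the time.
    rng = random.Random(0)
    picks = [lols_rollout_policy(rng) for _ in range(10_000)]
    print("reference fraction:", picks.count("reference") / len(picks))
```

The printout shows that with so small an $\alpha$, the SEARN mixture parameter stays near zero for a modest number of rounds. It only approaches $1 - e^{-1} \approx 0.632$ once $t$ reaches about $10^5$.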