Efficient Policy Learning from Surrogate-Loss Classification Reductions
Authors: Andrew Bennett, Nathan Kallus
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically over a wide range of scenarios that our methodology indeed leads to greater efficiency, with lower MSE in estimating the optimal policy parameters under correct specification. Furthermore, we demonstrate that in practice, both with and without correct specification, our methodology tends to learn policies with lower regret, particularly in the low-data regime. |
| Researcher Affiliation | Academia | Andrew Bennett¹, Nathan Kallus¹. ¹Cornell University and Cornell Tech, New York. Correspondence to: Andrew Bennett <awb222@cornell.edu>. |
| Pseudocode | No | The paper describes the algorithms verbally and mathematically but does not include a pseudocode block or a clearly labeled algorithm. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Section 5.2 (Jobs Case Study): We next consider an application to a dataset derived from a large-scale experiment comparing different programs offered to unemployed individuals in France (Behaghel et al., 2014). |
| Dataset Splits | Yes | Of the training data, 20% was set aside for training nuisances, and an additional 20% was set aside as validation data for early stopping. (An illustrative sketch of this split follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments; it only mentions that experiments were conducted. |
| Software Dependencies | No | The paper mentions using Python and various learning algorithms (e.g., linear regression, logistic regression, neural networks) but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | a linear policy class, where g_θ(x) = θᵀx + θ₀, and a flexible policy class where g_θ(x) is given by a fully-connected neural network with a single hidden layer of size 50 and leaky ReLU activations. (Illustrative sketches of both policy classes follow the table.) |
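
For concreteness, here is a minimal sketch of the training-data split quoted in the Dataset Splits row: 20% of the training data held out for nuisance estimation and a further 20% for early-stopping validation. It assumes scikit-learn's `train_test_split` is acceptable for the partitioning; the function and variable names are illustrative, not from the paper.

```python
# Minimal sketch of the split described in the Dataset Splits row.
# Assumption: "an additional 20%" means 20% of the ORIGINAL training data,
# i.e., 0.20 / 0.80 = 0.25 of what remains after the nuisance split.
from sklearn.model_selection import train_test_split

def split_training_data(X, y, seed=0):
    # Carve off 20% of the training data for fitting nuisance models.
    X_rest, X_nuis, y_rest, y_nuis = train_test_split(
        X, y, test_size=0.20, random_state=seed)
    # Take 25% of the remainder (= 20% of the original) as validation
    # data for early stopping; the rest is used for policy learning.
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.25, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_nuis, y_nuis)
```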
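
The Experiment Setup row describes two policy score classes. The sketch below expresses them in PyTorch; the paper does not name a framework, so this choice, along with the hypothetical helper names `make_linear_policy` and `make_flexible_policy`, is an assumption made for illustration.

```python
# Sketch of the two policy score classes from the Experiment Setup row.
# A deterministic policy can be read off as the sign of the score g_theta(x).
import torch
import torch.nn as nn

def make_linear_policy(d):
    # Linear policy class: g_theta(x) = theta^T x + theta_0.
    return nn.Linear(d, 1)  # weight plays the role of theta, bias of theta_0

def make_flexible_policy(d):
    # Flexible policy class: one fully-connected hidden layer of size 50
    # with leaky-ReLU activations, as stated in the experiment setup.
    return nn.Sequential(
        nn.Linear(d, 50),
        nn.LeakyReLU(),
        nn.Linear(50, 1),
    )

# Usage: the treatment decision is the sign of the score.
g = make_flexible_policy(d=10)
x = torch.randn(4, 10)           # a small batch of covariate vectors
action = torch.sign(g(x))        # +1 / -1 treatment assignment per unit
```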