Balanced Policy Evaluation and Learning
Authors: Nathan Kallus
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that this approach markedly outperforms existing ones both in evaluation and learning, which is unsurprising given the wider support of balance-based weights. We establish extensive theoretical consistency guarantees and regret bounds that support this empirical success. Our empirical results show the stark benefit of this approach while our main theoretical results (Thm. 6, Cor. 7) establish vanishing regret bounds. |
| Researcher Affiliation | Academia | Nathan Kallus Cornell University and Cornell Tech kallus@cornell.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper provides no concrete access to source code for the described methodology: no repository link, no explicit code-release statement, and no mention of code in supplementary materials. |
| Open Datasets | Yes | Next, we consider two UCI multi-class classification datasets [30], Glass (n = 214, d = 9, m = 6) and Ecoli (n = 336, d = 7, m = 8). [30] M. Lichman. UCI machine learning repository, 2013. URL http://archive.ics.uci.edu/ |
| Dataset Splits | No | Example 3 states 'And we split the data 75-25 into training and test sample,' but the paper does not describe a separate validation split or say how validation, if any, was handled. (A hedged load-and-split sketch appears below the table.) |
| Hardware Specification | No | The paper does not specify the hardware used for its experiments (e.g., CPU/GPU models, clock speeds, or memory amounts). |
| Software Dependencies | Yes | In practice, we solve these using Gurobi 7.0. (A hedged gurobipy sketch of a QP of this general shape appears below the table.) |
| Experiment Setup | Yes | using untuned parameters (rather than fit by marginal likelihood): the standard (s = 1) Mahalanobis RBF kernel for each K_t, ‖f‖² = Σ_{t=1}^m ‖f_t‖²_{K_t}, and Σ = I; and we fit µ̂ using m separate gradient-boosted tree models (sklearn defaults). (A hedged sketch of this kernel-and-regression setup appears below the table.) |
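
For context on the data handling the paper reports, here is a minimal sketch of loading one of the cited UCI datasets and making the quoted 75-25 train/test split. The OpenML mirror name (`glass`) and the use of `fetch_openml`/`train_test_split` are assumptions for illustration; the paper only cites the UCI repository and does not give loading code.

```python
# Minimal sketch: load the UCI Glass data via its OpenML mirror (an
# assumption; the paper cites the UCI repository directly) and reproduce
# the reported 75-25 train/test split. No validation split is made,
# matching what the paper describes.
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

glass = fetch_openml(name="glass", version=1, as_frame=False)  # n=214, d=9, m=6
X, y = glass.data, glass.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0  # random_state is our choice, not the paper's
)
print(X_train.shape, X_test.shape)
```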
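The paper reports solving its optimization problems with Gurobi 7.0. As a rough illustration of a kernel-balance quadratic program of that general shape in gurobipy's modern matrix API, here is a sketch; the objective (a kernel quadratic form plus a uniform variance penalty, over nonnegative weights that sum to n) is a simplified stand-in for the paper's worst-case-CMSE objective, and `Q`, `lam`, and the constraints are assumptions.

```python
# Sketch of a balance-style QP in gurobipy. Q is a PSD kernel-derived
# matrix and lam a variance-penalty weight -- both placeholders, not the
# paper's exact objective.
import numpy as np
import gurobipy as gp
from gurobipy import GRB

n = 50
rng = np.random.default_rng(0)
A = rng.normal(size=(n, n))
Q = A @ A.T / n          # placeholder PSD matrix standing in for the kernel term
lam = 1.0                # placeholder variance penalty (cf. the untuned Sigma = I)

model = gp.Model("balance_weights")
W = model.addMVar(n, lb=0.0, name="W")            # nonnegative weights
model.addConstr(W.sum() == n, name="normalize")   # weights average to 1
model.setObjective(W @ Q @ W + lam * (W @ W), GRB.MINIMIZE)
model.optimize()

weights = W.X  # balanced weights to plug into the weighted estimator
```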
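The Experiment Setup row quotes two untuned ingredients: an s = 1 Mahalanobis RBF kernel and m per-class gradient-boosted models for µ̂ with sklearn defaults. Below is a hedged sketch of both; the exact bandwidth convention and the per-class fitting loop are assumptions based on the quoted description, not code from the paper.

```python
# Sketch of the quoted untuned setup: a Mahalanobis RBF kernel with s = 1
# and one default gradient-boosted regressor per class for mu-hat. The
# bandwidth convention exp(-d^2 / (2 s^2)) is our assumption.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def mahalanobis_rbf_kernel(X, Z, s=1.0):
    """K(x, z) = exp(-(x - z)^T Cov^{-1} (x - z) / (2 s^2)), Cov estimated from X."""
    VI = np.linalg.pinv(np.cov(X, rowvar=False))      # inverse sample covariance
    diff = X[:, None, :] - Z[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)  # squared Mahalanobis distances
    return np.exp(-d2 / (2.0 * s ** 2))

def fit_mu_hat(X, T, Y, m):
    """One default-parameter GBM per observed class t = 0..m-1, as quoted."""
    models = []
    for t in range(m):
        mask = (T == t)
        gbm = GradientBoostingRegressor()             # sklearn defaults
        gbm.fit(X[mask], Y[mask])
        models.append(gbm)
    return models
```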