Balanced Policy Evaluation and Learning

Authors: Nathan Kallus

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that this approach markedly outperforms existing ones both in evaluation and learning, which is unsurprising given the wider support of balance-based weights. We establish extensive theoretical consistency guarantees and regret bounds that support this empirical success. Our empirical results show the stark benefit of this approach while our main theoretical results (Thm. 6, Cor. 7) establish vanishing regret bounds.
Researcher Affiliation | Academia | Nathan Kallus, Cornell University and Cornell Tech, kallus@cornell.edu
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide any concrete access information for open-source code (e.g., repository links, explicit code release statements, or mention of code in supplementary materials) for the methodology described.
Open Datasets | Yes | Next, we consider two UCI multi-class classification datasets [30], Glass (n = 214, d = 9, m = 6) and Ecoli (n = 336, d = 7, m = 8). Reference [30]: M. Lichman. UCI machine learning repository, 2013. URL http://archive.ics.uci.edu/
Dataset Splits | No | Example 3 states 'And we split the data 75-25 into training and test sample.' but the paper does not describe a separate validation split, or how one was handled if it was implied.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | Yes | In practice, we solve these using Gurobi 7.0.
Experiment Setup | Yes | 'using untuned parameters (rather than fit by marginal likelihood) using the standard (s = 1) Mahalanobis RBF kernel for $K_t$, $\|f\|^2 = \sum_{t=1}^{m} \|f_t\|_{K_t}^2$, and $\Lambda = I$.' and 'we fit $\hat{\mu}$ using m separate gradient-boosted tree models (sklearn defaults).'
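As an illustration of the kernel named in the Experiment Setup entry, the following is a minimal sketch of a Mahalanobis RBF Gram matrix with scale s = 1. It is not the author's code, which is not released. The function name `mahalanobis_rbf_kernel`, the exact scaling `exp(-d^2 / s^2)`, the use of the sample covariance with a pseudo-inverse, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def mahalanobis_rbf_kernel(X, s=1.0):
    """Gram matrix with K[i, j] = exp(-d_M(x_i, x_j)^2 / s^2).

    d_M is the Mahalanobis distance under the sample covariance of X.
    The scaling by s^2 (rather than 2 * s^2) is an assumption.
    """
    X = np.asarray(X, dtype=float)
    # Sample covariance of the covariates (columns are features).
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Pseudo-inverse guards against a singular covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    # Pairwise differences, shape (n, n, d).
    diff = X[:, None, :] - X[None, :, :]
    # Squared Mahalanobis distance for every pair, shape (n, n).
    sq_dist = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    return np.exp(-sq_dist / s**2)


# Synthetic covariates, for illustration only (not the Glass or Ecoli data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
K = mahalanobis_rbf_kernel(X, s=1.0)
```

Because the distance is computed under the inverse sample covariance, the kernel is invariant to invertible linear rescaling of the covariates, which is one plausible reason to leave s untuned at 1.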