Adaptive Estimator Selection for Off-Policy Evaluation
Authors: Yi Su, Pavithra Srinath, Akshay Krishnamurthy
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also perform comprehensive experiments, demonstrating the empirical efficacy of our approach and comparing with related approaches. |
| Researcher Affiliation | Collaboration | ¹Cornell University, Ithaca, NY; ²Microsoft Research, New York, NY. |
| Pseudocode | No | The paper describes the SLOPE procedure in prose within the text, but it does not provide a clearly labeled 'Pseudocode' or 'Algorithm' block or figure. |
| Open Source Code | Yes | Code for this section is publicly available at https://github.com/VowpalWabbit/slope-experiments. Code for this section is available at https://github.com/clvoloshin/OPE-tools. |
| Open Datasets | Yes | We use four RL environments: Mountain Car, Gridworld, Graph, and Graph-POMDP (abbreviated MC, GW, Graph, and PO-Graph). All four environments are from Voloshin et al. (2019). |
| Dataset Splits | No | The paper describes conducting experiments with a certain number of replicates and generating data for each condition (e.g., 'We perform 30 replicates of each condition'), but it does not specify explicit training, validation, or test dataset splits in percentages or absolute counts for a fixed dataset. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU or CPU models, memory, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software libraries, frameworks, or dependencies used in the experiments (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We consider 7 different choices of geometrically spaced bandwidths $H := \{2^{i} : i \in [7]\}$. For SLOPE, we simplify the implementation by replacing the confidence function in (2) with twice the empirical standard deviation of the corresponding estimate. We also manually enforce monotonicity of this confidence function. |
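
The simplification described in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code. The function names, the `(M, n)` array layout, the ordering of estimators from widest to narrowest confidence width, the running-minimum way of enforcing monotonicity, and the interval-intersection selection rule are all assumptions made for illustration. Only the "twice the empirical standard deviation" width and the fact that monotonicity is enforced come from the quoted text.

```python
import numpy as np


def confidence_widths(per_sample_terms):
    """Twice the empirical standard deviation of each estimate.

    per_sample_terms: array of shape (M, n). Row i holds the n per-sample
    terms whose mean is the i-th estimate (hypothetical layout).
    """
    terms = np.asarray(per_sample_terms, dtype=float)
    n = terms.shape[1]
    # Standard deviation of a sample mean: sample std divided by sqrt(n).
    std_of_mean = terms.std(axis=1, ddof=1) / np.sqrt(n)
    return 2.0 * std_of_mean


def enforce_monotone(widths):
    """Make the widths non-increasing along the estimator ordering.

    Assumption: estimators are ordered from widest to narrowest interval,
    and monotonicity is enforced with a running minimum.
    """
    return np.minimum.accumulate(np.asarray(widths, dtype=float))


def select_index(estimates, widths):
    """Assumed Lepski-style rule: the largest index i such that the
    intervals [estimate - width, estimate + width] for indices 0..i
    share at least one common point."""
    estimates = np.asarray(estimates, dtype=float)
    widths = np.asarray(widths, dtype=float)
    running_lo = np.maximum.accumulate(estimates - widths)
    running_hi = np.minimum.accumulate(estimates + widths)
    nonempty = running_lo <= running_hi
    return int(np.nonzero(nonempty)[0].max())


# Toy inputs, invented purely for illustration.
toy_estimates = [1.0, 1.1, 1.05, 3.0]
toy_widths = enforce_monotone([0.5, 0.3, 0.2, 0.1])
chosen = select_index(toy_estimates, toy_widths)
```

With these toy inputs the first three intervals overlap and the fourth does not, so `chosen` is `2`. In an actual experiment, each row of `per_sample_terms` would correspond to one bandwidth from the set listed in the table.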