High-Confidence Off-Policy (or Counterfactual) Variance Estimation

Authors: Yash Chandak, Shiv Shankar, Philip S. Thomas
Pages: 6939-6947

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental Study: Inspired by real-world applications where OVE and HCOVE can be useful, we validate our proposed estimators empirically on two domains motivated by real-world applications. Here, we only provide a brief description about the experimental setup and the main results. Appendix G contains additional experimental details. Figure 3: Experimental results using 100 trials.
Researcher Affiliation | Academia | Yash Chandak, Shiv Shankar, Philip S. Thomas; University of Massachusetts; {ychandak, sshankar, pthomas}@cs.umass.edu
Pseudocode | Yes | Algorithm 1: Variance-Reduced Off-Policy Variance Estimator
Open Source Code | No | The paper cites an “open-source implementation (Xie 2019) of the FDA approved Type-1 Diabetes Mellitus simulator (T1DMS)”, but does not provide a link or statement about the authors’ own source code for the methodology described in the paper.
Open Datasets | Yes | Diabetes treatment: This domain is based on an open-source implementation (Xie 2019) of the FDA approved Type-1 Diabetes Mellitus simulator (T1DMS) (Man et al. 2014) for treatment of Type-1 Diabetes... Gridworld: We also consider a standard 4 × 4 Gridworld with stochastic transitions.
Dataset Splits | No | The paper mentions using trajectories and varying the number of trajectories in its experiments, but it does not provide specific details on how these trajectories were split into training, validation, or test sets, nor does it refer to predefined splits with citations.
Hardware Specification | No | The paper does not provide any specific details regarding the hardware used to conduct the experiments, such as CPU or GPU models, memory, or cloud computing specifications.
Software Dependencies | Yes | Simglucose v0.2.1 (2018). URL https://github.com/jxx123/simglucose.
Experiment Setup | No | The paper includes an “Experimental Study” section that describes the domains used but lacks specific details regarding hyperparameters (e.g., learning rate, batch size) or system-level training settings. It refers to “Appendix G” for additional details, but these are not present in the main text.