Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning
Authors: Alberto Maria Metelli, Alessio Russo, Marcello Restelli
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide numerical simulations on both synthetic examples and contextual bandits, in comparison with off-policy evaluation and learning baselines. |
| Researcher Affiliation | Academia | Alberto Maria Metelli, DEIB, Politecnico di Milano, albertomaria.metelli@polimi.it; Alessio Russo, DEIB, Politecnico di Milano, alessio.russo@polimi.it; Marcello Restelli, DEIB, Politecnico di Milano, marcello.restelli@polimi.it |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is provided at https://github.com/albertometelli/subgaussian-is. |
| Open Datasets | Yes | We consider 11 UCI [13] multi-class classification datasets (see Table 9 in Appendix B.1.2). |
| Dataset Splits | Yes | Each dataset is split into a training set $D_{\text{train}}$ and an evaluation $D_{\text{eval}}$ with proportions 30% and 70%. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python version, library versions) that would be needed for reproducibility. |
| Experiment Setup | Yes | Specifically, we consider a Gaussian behavioral policy $Q = \mathcal{N}(\mu_Q, \sigma_Q^2)$ and a Gaussian target policy $P = \mathcal{N}(\mu_P, \sigma_P^2)$. We generate $n$ i.i.d. samples from $Q$ and we estimate the expectation of function $f(y) = 100\cos(2\pi y)$ under $P$. We select $\mu_Q = 0$, $\mu_P = 0.5$, $\sigma_Q^2 = 1$ and $\sigma_P^2 = 1.9$ [...] The behavioral policy is obtained as: $\pi_b(a \mid x) = \alpha_b + \frac{1-\alpha_b}{K}$ if $a = C(x)$ and $\pi_b(a \mid x) = \frac{1-\alpha_b}{K}$ otherwise, where $\alpha_b \in [0,1]$. The target policy $\pi_e$ is obtained as the behavioral one by training another classifier on $D_{\text{train}}$ and using $\alpha_e \in [0,1]$. |
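
The two setups quoted in the Experiment Setup row can be illustrated with a minimal NumPy sketch. It is not taken from the authors' repository. It uses the plain (vanilla) importance sampling estimator as a baseline, not the subgaussian estimator proposed in the paper. The function names (`plain_is_estimate`, `closed_form_value`, `mixture_policy`), the seed, and the sample size are assumptions made for illustration. It also assumes that $K$ denotes the number of actions and that $C(x)$ is the label predicted by the classifier.

```python
import numpy as np


def gaussian_pdf(y, mu, var):
    """Density of N(mu, var) evaluated at y."""
    return np.exp(-((y - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def f(y):
    """Test function from the synthetic setup: f(y) = 100 cos(2 pi y)."""
    return 100.0 * np.cos(2.0 * np.pi * y)


def plain_is_estimate(n, mu_q=0.0, var_q=1.0, mu_p=0.5, var_p=1.9, seed=0):
    """Vanilla importance sampling estimate of E_P[f] from n samples of Q.

    Baseline only: this is NOT the paper's subgaussian estimator.
    """
    rng = np.random.default_rng(seed)  # seed is an illustrative assumption
    y = rng.normal(mu_q, np.sqrt(var_q), size=n)
    # Importance weights p(y) / q(y); unbounded here because var_p > var_q.
    w = gaussian_pdf(y, mu_p, var_p) / gaussian_pdf(y, mu_q, var_q)
    return float(np.mean(w * f(y)))


def closed_form_value(mu_p=0.5, var_p=1.9):
    """Exact E_P[f], using E[cos(aY)] = cos(a mu) exp(-a^2 var / 2), a = 2 pi."""
    return float(100.0 * np.cos(2.0 * np.pi * mu_p) * np.exp(-2.0 * np.pi**2 * var_p))


def mixture_policy(predicted_actions, n_actions, alpha):
    """Action probabilities: alpha + (1 - alpha)/K on C(x), (1 - alpha)/K elsewhere."""
    predicted_actions = np.asarray(predicted_actions, dtype=int)
    n = predicted_actions.shape[0]
    probs = np.full((n, n_actions), (1.0 - alpha) / n_actions)
    probs[np.arange(n), predicted_actions] += alpha
    return probs


if __name__ == "__main__":
    print("plain IS estimate:", plain_is_estimate(n=1000))
    print("closed-form value:", closed_form_value())
    print(mixture_policy([0, 2, 1], n_actions=3, alpha=0.4))
```

With the quoted parameters the target variance exceeds the behavioral variance, so the importance weights grow without bound in the tails. This is the heavy-tailed regime in which a plain estimator can fluctuate strongly from run to run.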