Towards Robust Off-Policy Learning for Runtime Uncertainty
Authors: Da Xu, Yuting Ye, Chuanwei Ruan, Bo Yang
Pages: 10101-10109
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive simulation studies to examine the effectiveness of the proposed approach. We also conduct real-world online testings on an e-commerce platform, where our approach compares favorably to standard offline learning. |
| Researcher Affiliation | Collaboration | 1 Walmart Labs; 2 University of California, Berkeley; 3 Instacart; 4 LinkedIn |
| Pseudocode | Yes | Algorithm 1: Robust Off-policy Learning with DR |
| Open Source Code | No | The paper does not include any statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use the same benchmark datasets from the UCI repository as in (Dudík, Langford, and Li 2011; Vlassis et al. 2019) |
| Dataset Splits | Yes | do the train-validation-test split detailed in the appendix |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of "standard Regression Tree" but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The model family of the RM estimator (and the RM part of DR), as well as the bounding functions $\hat{f}_a$ and $\hat{g}_a$, are given by the standard Regression Tree. The tuning and other implementation details are left in the appendix. ... We design the logging policy as: $\pi_0(a \mid x) \propto \theta_a^\top x$ for all $a = 1, \ldots, k$, where $\theta_a$ are sampled i.i.d. from the standard multivariate normal distribution. ... we add noise to $\pi$ and obtain the uncertainty-injected policy from which the feedback data is actually generated: $\tilde{\pi}(a \mid x) := \pi(a \mid x)\, U_{a,x}(\alpha) / \sum_{a'} \pi(a' \mid x)\, U_{a',x}(\alpha)$, where $U_{a,x}(\alpha)$ is sampled from the truncated normal distribution with unit variance and mean $\gamma_a^\top x$, where $\gamma_a$ is also sampled from standard multivariate normal distributions. We set the truncation interval to be $[0, \exp(\alpha)]$. |
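
The uncertainty-injection step quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code (the paper releases none). The function names, the `pi_tilde` variable, the example policy, the inverse-CDF sampler, the closed interval endpoints, and the fallback for an all-zero draw are all assumptions made for this sketch. Only the structure comes from the quoted text: each action probability is multiplied by a unit-variance truncated-normal draw with mean given by the inner product of `gamma_a` and `x`, truncated between 0 and exp(alpha), and the result is renormalized over actions.

```python
import math
import random
from statistics import NormalDist


def sample_truncated_normal(mean, upper, rng, lower=0.0):
    """Draw from N(mean, 1) restricted to [lower, upper] via inverse-CDF sampling."""
    dist = NormalDist(mu=mean, sigma=1.0)
    p_lo, p_hi = dist.cdf(lower), dist.cdf(upper)
    u = p_lo + (p_hi - p_lo) * rng.random()
    # Keep u strictly inside (0, 1) so inv_cdf is defined even in extreme tails.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    # Clamp to guard against floating-point overshoot at the boundaries.
    return min(max(dist.inv_cdf(u), lower), upper)


def inject_uncertainty(pi, means, alpha, rng):
    """Multiply each pi(a|x) by a noise draw U_{a,x}(alpha) and renormalize."""
    upper = math.exp(alpha)
    weights = [p * sample_truncated_normal(m, upper, rng) for p, m in zip(pi, means)]
    total = sum(weights)
    if total <= 0.0:
        # Assumption of this sketch: fall back to the unperturbed policy
        # if every noise draw is numerically zero.
        return list(pi)
    return [w / total for w in weights]


rng = random.Random(0)
x = [0.5, -1.0, 2.0]                      # hypothetical context vector
pi = [0.1, 0.2, 0.3, 0.4]                 # hypothetical policy pi(. | x) over k = 4 actions
gamma = [[rng.gauss(0.0, 1.0) for _ in x] for _ in pi]  # gamma_a ~ N(0, I)
means = [sum(g_i * x_i for g_i, x_i in zip(g, x)) for g in gamma]
pi_tilde = inject_uncertainty(pi, means, alpha=1.0, rng=rng)
print(pi_tilde)
```

Larger values of `alpha` widen the truncation interval, so the noise factors can differ more across actions and `pi_tilde` can drift further from `pi`.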