Policy Evaluation Using the Ω-Return
Authors: Philip S. Thomas, Scott Niekum, Georgios Theocharous, George Konidaris
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical studies that suggest that it [the Ω-return] is superior to the λ-return and γ-return for a variety of problems. We propose a method for approximating the Ω-return, and show that it outperforms the λ and γ-returns on a range of off-policy evaluation problems. Figures 1(g), 2(g), 3(g), and 4(g) show the mean squared error (MSE) of value estimates when using various methods. |
| Researcher Affiliation | Collaboration | Philip S. Thomas (University of Massachusetts Amherst; Carnegie Mellon University); Scott Niekum (University of Texas at Austin); Georgios Theocharous (Adobe Research); George Konidaris (Duke University) |
| Pseudocode | Yes | Pseudocode for approximating the Ω-return is provided in Algorithm 1. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the described methodology or links to a code repository. |
| Open Datasets | No | The paper mentions domain names like '5 × 5 gridworld', 'mountain car domain', 'digital marketing problem', and 'DAS1' but does not provide concrete access information (specific links, DOIs, repository names, or formal citations with authors/year) for these datasets. |
| Dataset Splits | No | The paper mentions varying numbers of trajectories used for estimating the covariance matrix (e.g., '5 trajectories', '10,000 trajectories') but does not specify explicit train/validation/test dataset splits, percentages, or cross-validation setup for reproducing the experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU/GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or solvers) used for the experiments. |
| Experiment Setup | Yes | We select the k_1 and k_2 that minimize the mean squared error between Ω̂(i, i) and v_i, and set v_+ and v_L directly from the data. |
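
The parameter selection quoted in the Experiment Setup row can be illustrated with a small grid search. This is only a sketch under stated assumptions: this summary does not give the paper's parametric form for the diagonal of Ω̂, so `toy_model` below is a hypothetical stand-in, and `fit_k1_k2`, the grids, and the synthetic variances `v` are likewise illustrative names and values, not taken from the paper. The terms v_+ and v_L are left out of the toy model because their role is not described here.

```python
import itertools

import numpy as np


def fit_k1_k2(v, model, k1_grid, k2_grid):
    """Grid-search (k1, k2) minimizing the MSE between model(i, k1, k2) and v.

    v[i - 1] plays the role of v_i, the target for the i-th diagonal entry.
    `model` is any callable returning the predicted diagonal entries; its
    form is an assumption of this sketch, not the paper's definition.
    """
    v = np.asarray(v, dtype=float)
    i = np.arange(1, len(v) + 1)
    best = None
    for k1, k2 in itertools.product(k1_grid, k2_grid):
        mse = float(np.mean((model(i, k1, k2) - v) ** 2))
        if best is None or mse < best[2]:
            best = (k1, k2, mse)
    return best


def toy_model(i, k1, k2):
    # HYPOTHETICAL parametric form, chosen only so the example runs.
    return k1 * (1.0 - k2 ** i)


# Synthetic targets generated from known parameters (not real data).
v = toy_model(np.arange(1, 11), 2.0, 0.5)
k1, k2, mse = fit_k1_k2(v, toy_model, [1.0, 2.0, 3.0], [0.25, 0.5, 0.75])
print(k1, k2, mse)
```

A grid search is used here for transparency; any scalar optimizer could replace it once the actual parametric form from the paper is substituted for `toy_model`.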