Distributional Offline Policy Evaluation with Predictive Error Guarantees
Authors: Runzhe Wu, Masatoshi Uehara, Wen Sun
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we demonstrate the performance of FLE with two generative models, Gaussian mixture models and diffusion models. For the multi-dimensional reward setting, FLE with diffusion models is capable of estimating the complicated distribution of the return of a test policy. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Cornell University, Ithaca, NY, USA. |
| Pseudocode | Yes | Algorithm 1 Fitted Likelihood Estimation (FLE) for finite-horizon MDPs. A hedged sketch of this loop appears below the table. |
| Open Source Code | Yes | We release our code at https://github.com/ziqian2000/Fitted-Likelihood-Estimation. |
| Open Datasets | No | The paper describes the 'combination lock environment' and states that 'The offline dataset is generated uniformly', implying the authors generated their own dataset from a described environment rather than using a pre-existing public dataset with explicit access information (link, DOI, or formal citation). Although the environment itself has been used in prior work, the specific dataset generated for these experiments is not provided. |
| Dataset Splits | No | The paper splits the data into subsets for the convenience of analysis of its iterative algorithm ('For finite-horizon MDPs, we randomly and evenly split D into H subsets, D_1, ..., D_H, for the convenience of analysis. Each subset contains n/H samples. For infinite-horizon MDPs, we split it into T subsets in the same way.'), but it does not describe standard train/validation/test splits used for model evaluation. A sketch of this even split appears below the table. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper mentions 'neural network' architectures and that 'Our implementation is based on DDPM (Ho et al., 2020)', but it does not list specific software or library names with their version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The paper provides extensive detail on the experimental setup, including: 'we set ϵ = 1/7' for the test policy, reward functions r₊ ∼ N(1, 0.1²) and r₋ ∼ N(−1, 0.1²), and method-specific settings such as 'The categorical algorithm discretizes the range [−1.5, 1.5] using 100 atoms' (see the snippet below the table). Appendix E is dedicated to 'Experiment Details', including 'Table 3. Hyperparameters for the combination lock environment', 'Table 4. Shared hyperparameters', 'Table 5. Hyperparameters for the categorical algorithm', 'Table 6. Hyperparameters for quantile Algorithm', 'Table 7. Hyperparameters for Diff-FLE', and 'Table 8. Hyperparameters for GMM-FLE', which list numerous hyperparameters such as batch size, learning rates, number of iterations, and specific settings for each model. |
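
To make the Pseudocode row concrete, here is a minimal Python sketch of one plausible reading of the fitted-likelihood loop in Algorithm 1: a backward recursion that fits a conditional return distribution by maximum likelihood at each horizon step. It is an illustration under assumptions, not the authors' implementation: `splits`, `policy`, and `fit_mle` are hypothetical stand-ins for the D_1, ..., D_H subsets, the evaluated policy, and whatever conditional density estimator is used (e.g., a Gaussian mixture or diffusion model).

```python
def fitted_likelihood_estimation(splits, policy, fit_mle, H):
    """Sketch of Fitted Likelihood Estimation (FLE) for finite-horizon MDPs.

    splits : list of H datasets, each a list of (s, a, r, s_next) tuples
             (the D_1, ..., D_H subsets described in the paper).
    policy : callable mapping a state to the evaluated policy's action.
    fit_mle: callable fitting a conditional density p(y | s, a) by maximum
             likelihood; returns a model exposing .sample(s, a).
    """
    model_next = None                  # step H+1: return-to-go is identically 0
    models = [None] * (H + 1)
    for h in range(H, 0, -1):          # iterate backwards over the horizon
        xs, ys = [], []
        for (s, a, r, s_next) in splits[h - 1]:
            # Bootstrap a return sample from the step-(h+1) model, if any.
            z = 0.0 if model_next is None else model_next.sample(s_next, policy(s_next))
            xs.append((s, a))
            ys.append(r + z)           # target: reward plus sampled return-to-go
        model_next = fit_mle(xs, ys)   # MLE fit of the step-h return distribution
        models[h] = model_next
    return models[1]                   # return-distribution model at the first step
```

The recursion mirrors fitted Q-evaluation, except that each step fits a full conditional distribution by MLE rather than a scalar regression.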
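The Dataset Splits row refers to an analysis-driven split rather than train/validation/test sets. A minimal sketch of that split, assuming (as in the paper) that n is divisible by H:

```python
import random

def split_dataset(D, H, seed=0):
    """Randomly and evenly split an offline dataset D into H subsets
    (the D_1, ..., D_H of the paper), each with n/H samples."""
    data = list(D)
    random.Random(seed).shuffle(data)  # random assignment of samples to subsets
    k = len(data) // H
    return [data[i * k:(i + 1) * k] for i in range(H)]
```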
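Two setup details quoted in the Experiment Setup row translate directly into code. The snippet below is an assumed NumPy rendering, not the authors' script: it builds the 100-atom support on [−1.5, 1.5] for the categorical baseline and draws from the two reward distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Categorical baseline: discretize [-1.5, 1.5] into 100 evenly spaced atoms.
atoms = np.linspace(-1.5, 1.5, num=100)

# Reward noise as described: r_plus ~ N(1, 0.1^2), r_minus ~ N(-1, 0.1^2).
r_plus = rng.normal(loc=1.0, scale=0.1)
r_minus = rng.normal(loc=-1.0, scale=0.1)
```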