Distributional Offline Policy Evaluation with Predictive Error Guarantees
Authors: Runzhe Wu, Masatoshi Uehara, Wen Sun
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we demonstrate the performance of FLE with two generative models, Gaussian mixture models and diffusion models. For the multi-dimensional reward setting, FLE with diffusion models is capable of estimating the complicated distribution of the return of a test policy. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Cornell University, Ithaca, NY, USA. |
| Pseudocode | Yes | Algorithm 1 Fitted Likelihood Estimation (FLE) for finite-horizon MDPs. A hedged sketch of this loop appears below the table. |
| Open Source Code | Yes | We release our code at https://github.com/ziqian2000/Fitted-Likelihood-Estimation. |
| Open Datasets | No | The paper describes the 'combination lock environment' and states that 'The offline dataset is generated uniformly', implying the authors generated their own dataset from a described environment rather than using a pre-existing public dataset with explicit access information (link, DOI, or formal citation). Although the environment itself has been used in prior work, the specific dataset generated for these experiments is not provided. |
| Dataset Splits | No | The paper splits the data into subsets for the convenience of analysis of its iterative algorithm ('For finite-horizon MDPs, we randomly and evenly split D into H subsets, D_1, ..., D_H, for the convenience of analysis. Each subset contains n/H samples. For infinite-horizon MDPs, we split it into T subsets in the same way.'), but it does not describe standard train/validation/test splits used for model evaluation. A sketch of this even split appears below the table. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper mentions 'neural network' architectures and that 'Our implementation is based on DDPM (Ho et al., 2020)', but it does not list specific software or library names with their version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The paper provides extensive detail on the experimental setup, including: 'we set ϵ = 1/7' for the test policy, reward functions r₊ ∼ N(1, 0.1²) and r₋ ∼ N(−1, 0.1²), and method-specific settings such as 'The categorical algorithm discretizes the range [−1.5, 1.5] using 100 atoms' (see the snippet below the table). Appendix E is dedicated to 'Experiment Details', including 'Table 3. Hyperparameters for the combination lock environment', 'Table 4. Shared hyperparameters', 'Table 5. Hyperparameters for the categorical algorithm', 'Table 6. Hyperparameters for quantile Algorithm', 'Table 7. Hyperparameters for Diff-FLE', and 'Table 8. Hyperparameters for GMM-FLE', which list numerous hyperparameters such as batch size, learning rates, number of iterations, and specific settings for each model. |
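
To make the Pseudocode row concrete, here is a minimal Python sketch of one plausible reading of the fitted-likelihood loop in Algorithm 1: a backward recursion that fits a conditional return distribution by maximum likelihood at each horizon step. It is an illustration under assumptions, not the authors' implementation: `splits`, `policy`, and `fit_mle` are hypothetical stand-ins for the D_1, ..., D_H subsets, the evaluated policy, and whatever conditional density estimator is used (e.g., a Gaussian mixture or diffusion model).

```python
def fitted_likelihood_estimation(splits, policy, fit_mle, H):
    """Sketch of Fitted Likelihood Estimation (FLE) for finite-horizon MDPs.

    splits : list of H datasets, each a list of (s, a, r, s_next) tuples
             (the D_1, ..., D_H subsets described in the paper).
    policy : callable mapping a state to the evaluated policy's action.
    fit_mle: callable fitting a conditional density p(y | s, a) by maximum
             likelihood; returns a model exposing .sample(s, a).
    """
    model_next = None                  # step H+1: return-to-go is identically 0
    models = [None] * (H + 1)
    for h in range(H, 0, -1):          # iterate backwards over the horizon
        xs, ys = [], []
        for (s, a, r, s_next) in splits[h - 1]:
            # Bootstrap a return sample from the step-(h+1) model, if any.
            z = 0.0 if model_next is None else model_next.sample(s_next, policy(s_next))
            xs.append((s, a))
            ys.append(r + z)           # target: reward plus sampled return-to-go
        model_next = fit_mle(xs, ys)   # MLE fit of the step-h return distribution
        models[h] = model_next
    return models[1]                   # return-distribution model at the first step
```

The recursion mirrors fitted Q-evaluation, except that each step fits a full conditional distribution by MLE rather than a scalar regression.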
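The Dataset Splits row refers to an analysis-driven split rather than train/validation/test sets. A minimal sketch of that split, assuming (as in the paper) that n is divisible by H:

```python
import random

def split_dataset(D, H, seed=0):
    """Randomly and evenly split an offline dataset D into H subsets
    (the D_1, ..., D_H of the paper), each with n/H samples."""
    data = list(D)
    random.Random(seed).shuffle(data)  # random assignment of samples to subsets
    k = len(data) // H
    return [data[i * k:(i + 1) * k] for i in range(H)]
```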
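Two setup details quoted in the Experiment Setup row translate directly into code. The snippet below is an assumed NumPy rendering, not the authors' script: it builds the 100-atom support on [−1.5, 1.5] for the categorical baseline and draws from the two reward distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Categorical baseline: discretize [-1.5, 1.5] into 100 evenly spaced atoms.
atoms = np.linspace(-1.5, 1.5, num=100)

# Reward noise as described: r_plus ~ N(1, 0.1^2), r_minus ~ N(-1, 0.1^2).
r_plus = rng.normal(loc=1.0, scale=0.1)
r_minus = rng.normal(loc=-1.0, scale=0.1)
```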