Importance Sampling Policy Evaluation with an Estimated Behavior Policy

Authors: Josiah Hanna, Scott Niekum, Peter Stone

ICML 2019

Each entry below gives the reproducibility variable, its result, and the LLM response supporting that result.
Research Type: Experimental
LLM Response: "5. Empirical Results: We present an empirical study of the RIS estimator across several policy evaluation tasks. Our experiments are designed to answer the following questions: ..." Figure captions cited as evidence: Figure 2 (Gridworld policy evaluation results), Figure 3 (off-policy evaluation in the Single Path MDP for various n), Figure 4 (linear dynamical system results), and Figure 5 (Figures 5(a) and 5(b) compare different neural network architectures).
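For context, a minimal sketch of the RIS idea that these experiments evaluate: estimate the behavior policy from the same data D used for importance sampling, then plug that estimate into the ordinary IS weights. The count-based estimator conditioned only on the current state is one simple variant, and all names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import defaultdict

def ris_estimate(trajectories, pi_e):
    """Regression importance sampling (sketch): estimate the behavior
    policy from the observed data D, then use it in the IS weights.

    trajectories: list of [(state, action, reward), ...] episodes
    pi_e: evaluation policy, pi_e(action, state) -> probability
    """
    # Count-based maximum-likelihood estimate of the behavior policy,
    # conditioned only on the current state (an assumed simple variant
    # suitable for discrete states and actions).
    sa_counts = defaultdict(float)
    s_counts = defaultdict(float)
    for episode in trajectories:
        for s, a, _ in episode:
            sa_counts[(s, a)] += 1.0
            s_counts[s] += 1.0
    pi_b_hat = lambda a, s: sa_counts[(s, a)] / s_counts[s]

    # Ordinary full-trajectory importance sampling, but with the
    # *estimated* behavior policy in the denominator.
    estimates = []
    for episode in trajectories:
        weight = 1.0
        ret = 0.0
        for s, a, r in episode:
            weight *= pi_e(a, s) / pi_b_hat(a, s)
            ret += r
        estimates.append(weight * ret)
    return np.mean(estimates)
```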
Researcher Affiliation: Academia
LLM Response: "The University of Texas at Austin, Austin, Texas, USA."
Pseudocode: No
LLM Response: The paper describes its methods through mathematical equations and prose but does not include any clearly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code: Yes
LLM Response: Code is provided at https://github.com/LARG/regression-importance-sampling.
Open Datasets: No
LLM Response: The paper mentions domains such as Gridworld, Single Path, Linear Dynamical System, and the OpenAI Gym Hopper and HalfCheetah environments. While some of these are well-known environments, the paper does not provide concrete access information (links, DOIs, or specific citations with authors and year) for the datasets used or generated in these environments.
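For reference, the continuous-control domains named above are standard Gym environments, typically instantiated as sketched below. The version suffixes and the 4-tuple step API (pre-0.26 Gym) are assumptions, since the report notes the paper does not pin versions.

```python
import gym

# Environment IDs are assumptions: the paper names "Hopper" and
# "Half Cheetah" without specifying an exact Gym version suffix.
hopper = gym.make("Hopper-v2")
half_cheetah = gym.make("HalfCheetah-v2")

obs = hopper.reset()
obs, reward, done, info = hopper.step(hopper.action_space.sample())
```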
Dataset Splits: No
LLM Response: The paper mentions the use of a validation set and validation loss curves (e.g., "A practical solution is to use a validation set distinct from D to select an appropriate policy class and appropriate regularization criteria for RIS.", "the network with a single layer of hidden units has 25% less MSE than the two hidden layer network. This last observation motivates our final experiment.", and "The top plot shows the training and validation loss curves."). However, it does not specify how the splits were constructed (e.g., percentages, sample counts, or an explicit splitting methodology), which would be needed for full reproducibility.
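To make the gap concrete: the quoted passage suggests holding out part of D to select the policy class, but leaves the split unspecified. A sketch of one plausible trajectory-level split follows; the 80/20 ratio and fixed seed are assumptions, not values from the paper.

```python
import random

def split_trajectories(D, val_fraction=0.2, seed=0):
    """Hold out a validation set of whole trajectories from D.
    The 20% fraction and fixed seed are illustrative assumptions;
    the paper does not report how its validation set was formed.
    """
    D = list(D)
    random.Random(seed).shuffle(D)
    n_val = int(len(D) * val_fraction)
    return D[n_val:], D[:n_val]  # train, validation

# Usage: select the RIS policy class by comparing validation loss
# across candidate policy classes trained on the training portion.
# train, val = split_trajectories(D)
```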
Hardware Specification: No
LLM Response: The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU/GPU models or memory). It mentions only "neural network policies" trained with "gradient descent", with no hardware specifications.
Software Dependencies: No
LLM Response: The paper mentions OpenAI Gym, Roboschool versions, and neural network policies trained with gradient descent, but it does not give version numbers for any of the software libraries or dependencies used (e.g., PyTorch/TensorFlow, OpenAI Gym, or Python), which would be necessary for full reproducibility.
Experiment Setup: No
LLM Response: The paper gives some high-level architecture details (e.g., "neural network policies with 2 layers of 64 tanh hidden units") and mentions training by gradient descent. However, it omits critical setup details such as learning rates, batch sizes, the optimizer used, the number of training epochs, and other concrete hyperparameter values and system-level configurations that would allow full reproduction of the experiments.
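The one architectural detail the paper does give, "2 layers of 64 tanh hidden units", pins down the network but not the training setup. A PyTorch sketch of such a policy follows; the Gaussian output head, the input/output sizes, and the optimizer and learning rate are assumptions filling exactly the gaps this row flags.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Matches the paper's stated architecture: two hidden layers of
    64 tanh units. The Gaussian head and layer sizes beyond that are
    assumptions for a continuous-control task."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, act_dim),  # mean of a Gaussian policy
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        mean = self.body(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())

# The optimizer and learning rate are unspecified in the paper;
# Adam with lr=1e-3 is a placeholder assumption, as are the
# Hopper-like observation and action dimensions.
policy = PolicyNet(obs_dim=11, act_dim=3)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
```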