Offline Evaluation of Online Reinforcement Learning Algorithms

Authors: Travis Mandel, Yun-En Liu, Emma Brunskill, Zoran Popović

AAAI 2016

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our experiments, including those that use data from a real educational domain, show these methods have different tradeoffs. |
| Researcher Affiliation | Collaboration | (1) Center for Game Science, Computer Science & Engineering, University of Washington, Seattle, WA; (2) Enlearn™, Seattle, WA; (3) School of Computer Science, Carnegie Mellon University, Pittsburgh, PA |
| Pseudocode | Yes | Algorithm 1 Queue-based Evaluator; Algorithm 2 Per-State Rejection Sampling Evaluator; Algorithm 3 Per-Episode Rejection Sampling Evaluator |
| Open Source Code | Yes | For details see the appendix (available at http://grail.cs.washington.edu/projects/nonstationaryeval). |
| Open Datasets | Yes | We collected a dataset of 11,550 players collected from a child-focused educational website, collected using a semi-uniform sampling policy. [Also mentions] Six Arms (Strehl and Littman 2004). |
| Dataset Splits | No | The paper mentions using a dataset for evaluation but does not specify explicit training, validation, or test splits by percentages or sample counts for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Posterior Sampling Reinforcement Learning (PSRL)' but does not provide specific version numbers for any software or libraries used in the experiments. |
| Experiment Setup | Yes | Here, we show results evaluating Posterior Sampling Reinforcement Learning (PSRL) ... The standard version of PSRL creates one deterministic policy each episode based on a single posterior sample; however, we can sample the posterior multiple times to create multiple policies and randomly choose between them at each step, which allows us to test our evaluators with more or less revealed randomness. ... PSRL run with 10 posterior samples. |
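
The multi-sample PSRL variant described under Experiment Setup can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a small tabular MDP, a Dirichlet posterior over transitions, rewards treated as known, and a uniform random choice among the sampled policies at each step. All function names and parameters here are hypothetical.

```python
import random


def sample_transition_model(counts, rng):
    """Draw P[s][a] from independent Dirichlet posteriors.

    counts[s][a] is a list of positive Dirichlet parameters (assumed:
    prior pseudo-counts plus observed transition counts). A Dirichlet
    draw is obtained by normalizing independent gamma draws.
    """
    model = []
    for per_state in counts:
        row = []
        for alpha in per_state:
            draws = [rng.gammavariate(a, 1.0) for a in alpha]
            total = sum(draws)
            row.append([d / total for d in draws])
        model.append(row)
    return model


def solve_finite_horizon(P, R, horizon):
    """Backward induction; returns policy[t][s] = greedy action index."""
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    policy = [None] * horizon
    for t in reversed(range(horizon)):
        Q = [
            [R[s][a] + sum(p * v for p, v in zip(P[s][a], V))
             for a in range(n_actions)]
            for s in range(n_states)
        ]
        policy[t] = [Q[s].index(max(Q[s])) for s in range(n_states)]
        V = [max(Q[s]) for s in range(n_states)]
    return policy


def sample_policies(counts, R, horizon, num_samples, rng):
    """One deterministic policy per posterior sample (once per episode)."""
    return [
        solve_finite_horizon(sample_transition_model(counts, rng), R, horizon)
        for _ in range(num_samples)
    ]


def choose_action(policies, state, t, rng):
    """At each step, pick one sampled policy uniformly (an assumption)."""
    return rng.choice(policies)[t][state]


# Usage: 2 states, 2 actions, 10 posterior samples as in the setup above.
rng = random.Random(0)
counts = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
R = [[0.0, 1.0], [0.0, 1.0]]
policies = sample_policies(counts, R, horizon=3, num_samples=10, rng=rng)
action = choose_action(policies, state=0, t=0, rng=rng)
```

With `num_samples=1` this reduces to the standard one-policy-per-episode form. Larger values make the per-step action choice stochastic, which is the "revealed randomness" the quoted setup refers to.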