Offline Evaluation of Online Reinforcement Learning Algorithms
Authors: Travis Mandel, Yun-En Liu, Emma Brunskill, Zoran Popović
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments, including those that use data from a real educational domain, show these methods have different tradeoffs. |
| Researcher Affiliation | Collaboration | Center for Game Science, Computer Science & Engineering, University of Washington, Seattle, WA; Enlearn™, Seattle, WA; School of Computer Science, Carnegie Mellon University, Pittsburgh, PA |
| Pseudocode | Yes | Algorithm 1 Queue-based Evaluator; Algorithm 2 Per-State Rejection Sampling Evaluator; Algorithm 3 Per-Episode Rejection Sampling Evaluator |
| Open Source Code | Yes | For details see the appendix (available at http://grail.cs.washington.edu/projects/nonstationaryeval). |
| Open Datasets | Yes | We collected a dataset of 11,550 players from a child-focused educational website, using a semi-uniform sampling policy. [Also mentions] Six Arms (Strehl and Littman 2004). |
| Dataset Splits | No | The paper mentions using a dataset for evaluation but does not specify explicit training, validation, or test splits by percentages or sample counts for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Posterior Sampling Reinforcement Learning (PSRL)' but does not provide specific version numbers for any software or libraries used in the experiments. |
| Experiment Setup | Yes | Here, we show results evaluating Posterior Sampling Reinforcement Learning (PSRL) ... The standard version of PSRL creates one deterministic policy each episode based on a single posterior sample; however, we can sample the posterior multiple times to create multiple policies and randomly choose between them at each step, which allows us to test our evaluators with more or less revealed randomness. ... PSRL run with 10 posterior samples. |
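The queue-based evaluator named in the pseudocode row (Algorithm 1) can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes logged transitions of the form `(state, action, reward, next_state)` and an agent object exposing `act` and `observe` methods (both names are assumptions). Logged samples for each `(state, action)` pair are held in queues, and evaluation advances only while a logged sample matches the agent's chosen action.

```python
from collections import defaultdict, deque


def queue_based_evaluate(transitions, agent, start_state, max_steps=1000):
    """Replay logged transitions against a learning agent.

    The agent only advances while a logged sample matches the
    (state, action) it chose; when no matching sample remains,
    evaluation stops rather than fabricating an outcome.
    """
    # Bucket logged samples by the (state, action) pair they came from.
    queues = defaultdict(deque)
    for (s, a, r, s_next) in transitions:
        queues[(s, a)].append((r, s_next))

    s, total_reward = start_state, 0.0
    for _ in range(max_steps):
        a = agent.act(s)
        if not queues[(s, a)]:
            break  # no matching logged sample; cannot continue
        r, s_next = queues[(s, a)].popleft()
        agent.observe(s, a, r, s_next)  # let the agent learn online
        total_reward += r
        s = s_next
    return total_reward
```

The key design point is that the agent interacts with logged data as if it were a live environment, so online-learning behavior (updating after every step) is preserved.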
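The per-episode rejection sampling evaluator (Algorithm 3) can likewise be sketched. This is a minimal, hypothetical version: it assumes the behavior policy's action probabilities `pi_b(s, a)` are known for every logged step, and that `M` upper-bounds the per-episode likelihood ratio so the acceptance probability stays in [0, 1].

```python
import random


def per_episode_rejection_sample(episodes, pi_e, pi_b, M, rng=None):
    """Accept each logged episode with probability
    prod_t pi_e(a_t | s_t) / (M * prod_t pi_b(a_t | s_t)),
    so that accepted episodes are distributed as if they had been
    generated by running the evaluation policy pi_e.
    """
    rng = rng or random.Random(0)
    accepted = []
    for ep in episodes:  # ep: list of (state, action, reward) steps
        ratio = 1.0
        for (s, a, _r) in ep:
            ratio *= pi_e(s, a) / pi_b(s, a)
        if rng.random() < ratio / M:
            accepted.append(ep)
    return accepted
```

A usage note: with a uniform two-action behavior policy and an evaluation policy that always plays action 0, episodes containing action 0 have ratio 2 and are always kept (with M = 2), while episodes containing action 1 have ratio 0 and are always rejected.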
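The experiment-setup row describes running PSRL with multiple posterior samples so the resulting policy reveals more randomness to the evaluators. A hypothetical sketch of that idea, using a Bernoulli bandit with Beta posteriors purely for illustration (the paper's domains are episodic MDPs, and all names here are assumptions):

```python
import random


def psrl_mixture_policy(successes, failures, k, rng=None):
    """Sample the posterior k times; each sample yields one greedy arm.

    Acting then means choosing uniformly among these k greedy arms at
    each step. With k = 1 this is standard (deterministic-per-episode)
    posterior sampling; larger k exposes more policy randomness.
    """
    rng = rng or random.Random(0)
    greedy_arms = []
    for _ in range(k):
        # One posterior sample per arm: Beta(successes + 1, failures + 1).
        sampled_means = [rng.betavariate(s + 1, f + 1)
                         for s, f in zip(successes, failures)]
        greedy_arms.append(max(range(len(sampled_means)),
                               key=lambda i: sampled_means[i]))
    # The returned policy picks one of the k greedy arms uniformly.
    return lambda: rng.choice(greedy_arms)
```

The design trade-off this illustrates: a stochastic mixture over posterior samples gives rejection-sampling evaluators more chances to match logged actions, at the cost of acting less greedily with respect to any single posterior draw.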