Online Reinforcement Learning with Uncertain Episode Lengths

Authors: Debmalya Mandal, Goran Radanovic, Jiarui Gan, Adish Singla, Rupak Majumdar

AAAI 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Finally, we compare our learning algorithms with existing value-iteration based episodic RL algorithms on a grid-world environment." "We evaluated the performance of our algorithm on the Taxi environment, a 5×5 grid-world environment introduced by (Dietterich 2000)." |
| Researcher Affiliation | Academia | ¹Max Planck Institute for Software Systems, ²University of Oxford. dmandal@mpi-sws.org, gradanovic@mpi-sws.org, jiarui.gan@cs.ox.ac.uk, adishs@mpi-sws.org, rupak@mpi-sws.org |
| Pseudocode | Yes | "Algorithm 1: UCB-VI Generalized" and "Algorithm 2: Estimating Unknown Discount Factor" |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | "We evaluated the performance of our algorithm on the Taxi environment, a 5×5 grid-world environment introduced by (Dietterich 2000)." |
| Dataset Splits | No | The paper mentions evaluating performance on the Taxi environment for 100 episodes but does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers needed to replicate the experiments. |
| Experiment Setup | Yes | "We considered 100 episodes and each episode length was generated uniformly at random from the following distributions." "For the geometric discounting, we show γ = 0.9, 0.95 and 0.975." |
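The geometric-discounting setup quoted above corresponds to random episode lengths: with discount factor γ, a per-step continuation probability of γ (i.e., termination probability 1 − γ) yields episode lengths that are Geometric(1 − γ) distributed, with expected length 1/(1 − γ). A minimal sketch of sampling such lengths for the reported settings (this is an illustration of the setup, not the authors' released code; function and variable names are assumptions):

```python
import numpy as np

def sample_episode_lengths(gamma, num_episodes=100, seed=None):
    """Sample episode lengths under geometric discounting.

    Each step, the episode continues with probability gamma, so the
    length is Geometric(1 - gamma): number of steps until termination.
    """
    rng = np.random.default_rng(seed)
    # numpy's geometric() counts trials up to and including the first
    # "success" (termination), so samples are >= 1 step per episode.
    return rng.geometric(p=1.0 - gamma, size=num_episodes)

for gamma in (0.9, 0.95, 0.975):
    lengths = sample_episode_lengths(gamma, num_episodes=100, seed=0)
    # Expected length is 1 / (1 - gamma): 10, 20, and 40 respectively.
    print(gamma, lengths.mean())
```

With only 100 episodes the empirical mean fluctuates around 1/(1 − γ), which is why uncertainty over the effective horizon matters in this regime.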