Online Reinforcement Learning with Uncertain Episode Lengths
Authors: Debmalya Mandal, Goran Radanovic, Jiarui Gan, Adish Singla, Rupak Majumdar
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we compare our learning algorithms with existing value-iteration based episodic RL algorithms on a grid-world environment. Experiments: We evaluated the performance of our algorithm on the Taxi environment, a 5 × 5 grid-world environment introduced by (Dietterich 2000). |
| Researcher Affiliation | Academia | ¹Max Planck Institute for Software Systems, ²University of Oxford; dmandal@mpi-sws.org, gradanovic@mpi-sws.org, jiarui.gan@cs.ox.ac.uk, adishs@mpi-sws.org, rupak@mpi-sws.org |
| Pseudocode | Yes | ALGORITHM 1: UCB-VI Generalized and ALGORITHM 2: Estimating Unknown Discount Factor |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We evaluated the performance of our algorithm on the Taxi environment, a 5 × 5 grid-world environment introduced by (Dietterich 2000). |
| Dataset Splits | No | The paper mentions evaluating performance on the Taxi environment for 100 episodes but does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are provided in the paper. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers needed to replicate the experiment. |
| Experiment Setup | Yes | We considered 100 episodes and each episode length was generated uniformly at random from the following distributions. For the geometric discounting, we show γ = 0.9, 0.95 and 0.975. |
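
The Experiment Setup row can be illustrated with a minimal sketch of how the 100 episode lengths might be drawn in the geometric-discounting case. It assumes that discount factor γ corresponds to an episode length H with P(H = h) = (1 − γ)γ^(h−1), so the expected length is 1/(1 − γ); the summary above does not state this distribution explicitly. The function names `sample_episode_length` and `sample_lengths` and the seed are hypothetical, not taken from the paper.

```python
import math
import random


def sample_episode_length(gamma, rng):
    """Draw H >= 1 with P(H = h) = (1 - gamma) * gamma**(h - 1) (assumed model)."""
    # Inverse-transform sampling. 1 - rng.random() lies in (0, 1], so log is defined.
    u = 1.0 - rng.random()
    return 1 + int(math.log(u) / math.log(gamma))


def sample_lengths(gamma, num_episodes=100, seed=0):
    """Draw one length per episode; 100 episodes matches the reported setup."""
    rng = random.Random(seed)
    return [sample_episode_length(gamma, rng) for _ in range(num_episodes)]


if __name__ == "__main__":
    # Discount factors reported in the Experiment Setup row.
    for gamma in (0.9, 0.95, 0.975):
        lengths = sample_lengths(gamma)
        empirical_mean = sum(lengths) / len(lengths)
        print(f"gamma={gamma}: mean length {empirical_mean:.1f}, "
              f"expected {1 / (1 - gamma):.1f}")
```

With only 100 episodes the empirical mean can deviate noticeably from 1/(1 − γ), which is worth keeping in mind when comparing runs across the three discount factors.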