Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes

Authors: Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we report numerical simulations supporting our theoretical findings and showing how TUCRL overcomes the limitations of the state of the art. [...] In this section, we present experiments to validate the theoretical findings of Sec. 3. We compare TUCRL against UCRL and SCAL. We first consider the taxi problem [24] implemented in OpenAI Gym [25].
Researcher Affiliation | Collaboration | Ronan Fruit (SequeL Team, Inria Lille, ronan.fruit@inria.fr); Matteo Pirotta (SequeL Team, Inria Lille, matteo.pirotta@inria.fr); Alessandro Lazaric (Facebook AI Research, lazaric@fb.com)
Pseudocode | Yes | Figure 2: TUCRL algorithm.
Open Source Code | No | The paper mentions "The code is available on GitHub" only in the context of the taxi problem, i.e., for the environment. It does not explicitly state that the source code of the proposed algorithm (TUCRL) is publicly available.
Open Datasets | Yes | We first consider the taxi problem [24] implemented in OpenAI Gym [25].
Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages, sample counts, or specific split methods).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using OpenAI Gym but does not specify version numbers for any software dependencies, libraries, or programming languages used in the implementation.
Experiment Setup | Yes | Confidence intervals $\beta_{r,k}$ and $\beta_{p,k}$ are shrunk by a factor of 0.05 and 0.01 for the three-state domain and taxi, respectively. [...] In practice, we set $\rho_t = \frac{49\, b_{t,\delta}}{t}$, so that the condition to remove transitions reduces to $N_k(s,a) > \sqrt{t_k/(SA)}$.
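The reduced removal rule and the reported shrink factors are simple enough to state in code. The following Python sketch is not taken from the paper. It only checks the inequality N_k(s,a) > sqrt(t_k/(SA)) and scales a confidence width by 0.05 or 0.01. It assumes UCRL-style notation: N_k(s,a) is the number of visits to the pair (s, a) before episode k, t_k is the time step at which episode k starts, and S and A are the numbers of states and actions. The function names and dictionary keys are hypothetical.

```python
import math

# Shrink factors reported for the experiments; the dictionary keys are
# hypothetical labels, not identifiers from the paper.
SHRINK_FACTOR = {"three-state": 0.05, "taxi": 0.01}


def shrunk_width(beta: float, domain: str) -> float:
    """Scale a confidence-interval width (beta_r or beta_p) by the domain's factor."""
    return SHRINK_FACTOR[domain] * beta


def removal_threshold(t_k: int, num_states: int, num_actions: int) -> float:
    """Visit-count threshold sqrt(t_k / (S * A)) at the start of episode k."""
    return math.sqrt(t_k / (num_states * num_actions))


def should_remove_transitions(n_k: int, t_k: int, num_states: int, num_actions: int) -> bool:
    """Reduced condition N_k(s, a) > sqrt(t_k / (S * A)); the inequality is strict."""
    return n_k > removal_threshold(t_k, num_states, num_actions)


if __name__ == "__main__":
    # Hypothetical sizes and time step, chosen so the threshold is exactly 20.
    S, A, t_k = 3, 2, 2400
    print(removal_threshold(t_k, S, A))               # 20.0
    print(should_remove_transitions(25, t_k, S, A))   # True
    print(should_remove_transitions(20, t_k, S, A))   # False
    print(shrunk_width(2.0, "taxi"))                  # 0.02
```

With the hypothetical values S = 3, A = 2 and t_k = 2400, the threshold is sqrt(400) = 20. A pair visited 25 times therefore meets the condition, but a pair visited exactly 20 times does not, because the inequality is strict.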