reproducibilityindex.ai

Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes

Authors: Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

NeurIPS 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we report numerical simulations supporting our theoretical findings and showing how TUCRL overcomes the limitations of the state-of-the-art. In this section, we present experiments to validate the theoretical findings of Sec. 3. We compare TUCRL against UCRL and SCAL. We first consider the taxi problem [24] implemented in OpenAI Gym [25].
Researcher Affiliation	Collaboration	Ronan Fruit (Sequel Team, Inria Lille, ronan.fruit@inria.fr); Matteo Pirotta (Sequel Team, Inria Lille, matteo.pirotta@inria.fr); Alessandro Lazaric (Facebook AI Research, lazaric@fb.com)
Pseudocode	Yes	Figure 2: TUCRL algorithm.
Open Source Code	No	The paper mentions "The code is available on GitHub" in the context of the taxi problem, which is an environment. It does not explicitly state that the source code for their proposed algorithm (TUCRL) is publicly available.
Open Datasets	Yes	We first consider the taxi problem [24] implemented in OpenAI Gym [25].
Dataset Splits	No	The paper does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages, sample counts, or specific split methods).
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies	No	The paper mentions using "OpenAI Gym" but does not specify version numbers for any software dependencies, libraries, or programming languages used in their implementation.
Experiment Setup	Yes	Confidence intervals β_{r,k} and β_{p,k} are shrunk by a factor 0.05 and 0.01 for the three-states domain and taxi, respectively. [...] In practice, we set ρ_t = (49 b_{t,δ} / 3) √(SA / t), so that the condition to remove transition reduces to N_k^±(s, a) > √(t_k / SA).
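
The transition-removal condition quoted in the Experiment Setup row can be illustrated with a short Python sketch. This is not the authors' code: the function names are hypothetical, the visit count for a state-action pair is passed in directly, and the example assumes the 500 states and 6 actions of the Gym taxi environment.

```python
import math


def removal_threshold(t_k: int, n_states: int, n_actions: int) -> float:
    """Return sqrt(t_k / (S * A)), the visit-count threshold from the quoted condition."""
    return math.sqrt(t_k / (n_states * n_actions))


def exceeds_removal_threshold(n_visits: int, t_k: int,
                              n_states: int, n_actions: int) -> bool:
    """True when the visit count of (s, a) is strictly above the threshold."""
    return n_visits > removal_threshold(t_k, n_states, n_actions)


if __name__ == "__main__":
    # Assumed sizes for the taxi domain: S = 500 states, A = 6 actions.
    S, A = 500, 6
    t_k = 12_000  # hypothetical time step at the start of episode k
    print(removal_threshold(t_k, S, A))             # sqrt(12000 / 3000) = 2.0
    print(exceeds_removal_threshold(3, t_k, S, A))  # 3 > 2.0
    print(exceeds_removal_threshold(2, t_k, S, A))  # 2 > 2.0 is False (strict)
```

Because the threshold grows only as the square root of t_k, a state-action pair needs relatively few visits before the condition is met, and the inequality is strict, so a count exactly equal to the threshold does not satisfy it.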