Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes
Authors: Ronan Fruit, Matteo Pirotta, Alessandro Lazaric
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we report numerical simulations supporting our theoretical findings and showing how TUCRL overcomes the limitations of the state-of-the-art. In this section, we present experiments to validate the theoretical findings of Sec. 3. We compare TUCRL against UCRL and SCAL. We first consider the taxi problem [24] implemented in OpenAI Gym [25]. |
| Researcher Affiliation | Collaboration | Ronan Fruit (Sequel Team, Inria Lille, ronan.fruit@inria.fr); Matteo Pirotta (Sequel Team, Inria Lille, matteo.pirotta@inria.fr); Alessandro Lazaric (Facebook AI Research, lazaric@fb.com) |
| Pseudocode | Yes | Figure 2: TUCRL algorithm. |
| Open Source Code | No | The paper mentions "The code is available on GitHub" in the context of the taxi problem, which is an environment. It does not explicitly state that the source code for the proposed algorithm (TUCRL) is publicly available. |
| Open Datasets | Yes | We first consider the taxi problem [24] implemented in OpenAI Gym [25]. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages, sample counts, or specific split methods). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using "OpenAI Gym" but does not specify version numbers for any software dependencies, libraries, or programming languages used in the implementation. |
| Experiment Setup | Yes | Confidence intervals $\beta_{r,k}$ and $\beta_{p,k}$ are shrunk by a factor 0.05 and 0.01 for the three-states domain and taxi, respectively. [...] In practice, we set $\rho_t = 49\, b_{t,\delta} \ldots$, so that the condition to remove transition reduces to $N_k(s, a) > \sqrt{t_k / SA}$. |
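
The two settings quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "shrunk by a factor" means multiplying a confidence-interval width by that factor, and that SA denotes the number of states times the number of actions. The function names, dictionary keys, and example numbers are hypothetical.

```python
import math

# Shrink factors quoted in the setup row; the keys are hypothetical labels.
SHRINK_FACTOR = {"three_states": 0.05, "taxi": 0.01}


def shrink_width(width: float, domain: str) -> float:
    """Scale a confidence-interval width by the domain's shrink factor.

    Assumption: "shrunk by a factor c" is read as width * c.
    """
    return SHRINK_FACTOR[domain] * width


def should_truncate(n_visits: int, t_k: int, num_states: int, num_actions: int) -> bool:
    """Check the quoted condition N_k(s, a) > sqrt(t_k / (S * A)).

    Assumption: SA is the product of the state and action counts.
    """
    return n_visits > math.sqrt(t_k / (num_states * num_actions))


if __name__ == "__main__":
    # Illustrative numbers only: S = 10, A = 4, t_k = 4000, so the threshold is 10.
    print(should_truncate(11, 4000, 10, 4))
    print(should_truncate(10, 4000, 10, 4))
    print(shrink_width(2.0, "taxi"))
```

With these illustrative values the threshold is $\sqrt{4000 / 40} = 10$, so a pair visited 11 times satisfies the condition while a pair visited 10 times does not.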