How to Combine Tree-Search Methods in Reinforcement Learning

Authors: Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

AAAI 2019

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | The contribution of this work is primarily theoretical, but in Section 8 we also present experimental results on a toy domain. In this section, we empirically study NC-hm-PI (Section 5) and hm-PI (Section 6) in the exact and approximate cases.
Researcher Affiliation | Academia | Yonathan Efroni (Technion, Israel); Gal Dalal (Technion, Israel); Bruno Scherrer (INRIA, Villers les Nancy, France); Shie Mannor (Technion, Israel)
Pseudocode | Yes | Algorithm 1 (h-PI), Algorithm 2 (hm-PI), Algorithm 3 (hλ-PI); a generic sketch of the h-PI idea is given after the table.
Open Source Code | No | The paper does not provide any links to, or explicit statements about, the availability of source code for the described methodology.
Open Datasets | No | The paper states 'We conducted our simulations on a simple N × N deterministic grid-world problem with γ = 0.97, as was done in (Efroni et al. 2018a).' This is a custom experimental setup, and no access information (link, DOI, or a specific citation to a dataset resource) is provided for this 'toy domain' dataset.
Dataset Splits | No | The paper describes experiments in a grid-world reinforcement learning setup, but it does not specify training, validation, or test splits in terms of percentages or sample counts.
Hardware Specification | No | The paper describes the experimental setup in Section 8 but does not provide details of the hardware used, such as CPU or GPU models or memory specifications.
Software Dependencies | No | The paper discusses various algorithms and theoretical concepts but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or other libraries) needed to reproduce the experiments.
Experiment Setup | Yes | We conducted our simulations on a simple N × N deterministic grid-world problem with γ = 0.97, as was done in (Efroni et al. 2018a). The action set is {up, down, right, left, stay}. In each experiment, a reward r_g = 1 was placed in a random state, while in all other states the reward was drawn uniformly from [−0.1, 0.1]. In the considered problem there is no terminal state. Also, the entries of the initial value function are drawn from N(0, 1). We counted the total number of queries to the simulator until convergence, defined as ||v* − v_k||∞ ≤ 10^−7. A hedged code reconstruction of this setup is sketched below the table.
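The grid-world quoted in the Experiment Setup row can be reconstructed roughly as follows. This is a minimal sketch under stated assumptions: the grid size N, the state indexing, the handling of moves that would leave the grid (clipped here so the agent stays in place), and the reward depending only on the state are not specified in the excerpt and are hypothetical choices; the function name build_gridworld is mine.

```python
import numpy as np

def build_gridworld(N=10, gamma=0.97, seed=0):
    """Deterministic N x N grid world with gamma = 0.97, actions
    {up, down, right, left, stay}, reward 1 in one random state and
    U[-0.1, 0.1] elsewhere, and no terminal state (assumed details noted above)."""
    rng = np.random.default_rng(seed)
    S, A = N * N, 5
    moves = [(-1, 0), (1, 0), (0, 1), (0, -1), (0, 0)]  # up, down, right, left, stay
    P = np.zeros((A, S, S))  # one deterministic transition kernel per action
    for s in range(S):
        row, col = divmod(s, N)
        for a, (dr, dc) in enumerate(moves):
            nr = min(max(row + dr, 0), N - 1)  # assumption: walls clip movement
            nc = min(max(col + dc, 0), N - 1)
            P[a, s, nr * N + nc] = 1.0
    reward = rng.uniform(-0.1, 0.1, size=S)   # rewards drawn uniformly from [-0.1, 0.1]
    reward[rng.integers(S)] = 1.0             # goal reward r_g = 1 in a random state
    R = np.tile(reward, (A, 1))               # assumption: reward depends on the state only
    v0 = rng.standard_normal(S)               # initial value function entries ~ N(0, 1)
    return P, R, gamma, v0
```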
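For the Pseudocode row, the following is a generic tabular sketch of the h-step-lookahead ("h-greedy") policy-iteration idea behind Algorithm 1 (h-PI); it is not the authors' exact pseudocode. It stops when successive value iterates agree rather than on the paper's criterion ||v* − v_k||∞ ≤ 10^−7 (which needs v* to be computed separately), and it does not count simulator queries, since those details depend on the tree-search implementation.

```python
import numpy as np

def bellman_optimal(P, R, gamma, v):
    """One optimal Bellman backup: (T v)(s) = max_a [R(a, s) + gamma * sum_s' P(a, s, s') v(s')]."""
    return (R + gamma * P @ v).max(axis=0)

def h_greedy_policy(P, R, gamma, v, h):
    """First action of an optimal h-horizon policy whose terminal value is v."""
    for _ in range(h - 1):                 # back up the terminal value h-1 times
        v = bellman_optimal(P, R, gamma, v)
    return (R + gamma * P @ v).argmax(axis=0)

def evaluate(P, R, gamma, pi):
    """Exact policy evaluation: v_pi = (I - gamma * P_pi)^{-1} r_pi."""
    S = P.shape[1]
    P_pi = P[pi, np.arange(S), :]
    r_pi = R[pi, np.arange(S)]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def h_pi(P, R, gamma, v0, h, tol=1e-7, max_iter=10_000):
    """Alternate h-greedy improvement with full policy evaluation until the values stabilize."""
    v = v0.copy()
    for _ in range(max_iter):
        pi = h_greedy_policy(P, R, gamma, v, h)
        v_new = evaluate(P, R, gamma, pi)
        if np.max(np.abs(v_new - v)) < tol:
            return pi, v_new
        v = v_new
    return pi, v
```

Combined with the builder above, P, R, gamma, v0 = build_gridworld() followed by pi, v = h_pi(P, R, gamma, v0, h=3) would run the method end to end; a larger h makes each improvement step a deeper lookahead at the cost of more backups (and, with a simulator, more queries) per iteration.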