How to Combine Tree-Search Methods in Reinforcement Learning
Authors: Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor (pp. 3494-3501)
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The contribution of this work is primarily theoretical, but in Section 8 we also present experimental results on a toy domain. In this section, we empirically study NC-hm-PI (Section 5) and hm-PI (Section 6) in the exact and approximate cases. |
| Researcher Affiliation | Academia | Yonathan Efroni Technion, Israel Gal Dalal Technion, Israel Bruno Scherrer INRIA, Villers les Nancy, France Shie Mannor Technion, Israel |
| Pseudocode | Yes | Algorithm 1 h-PI, Algorithm 2 hm-PI, Algorithm 3 hλ-PI |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about the availability of its own source code for the described methodology. |
| Open Datasets | No | The paper states 'We conducted our simulations on a simple N × N deterministic grid-world problem with γ = 0.97, as was done in (Efroni et al. 2018a).' This is a custom experimental setup, and no access information (link, DOI, specific citation to a dataset resource) is provided for this 'toy domain' dataset. |
| Dataset Splits | No | The paper describes experiments in a grid-world reinforcement learning setup, but it does not specify any training, validation, or test dataset splits in terms of percentages or sample counts. |
| Hardware Specification | No | The paper describes the experimental setup in Section 8 but does not provide specific details about the hardware used, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper discusses various algorithms and theoretical concepts but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or other libraries with their versions) that would be needed to reproduce the experiments. |
| Experiment Setup | Yes | We conducted our simulations on a simple N × N deterministic grid-world problem with γ = 0.97, as was done in (Efroni et al. 2018a). The action set is {up, down, right, left, stay}. In each experiment, a reward r_g = 1 was placed in a random state, while in all other states the reward was drawn uniformly from [−0.1, 0.1]. In the considered problem there is no terminal state. Also, the entries of the initial value function are drawn from N(0, 1). We counted the total number of queries to the simulator until convergence, which we defined as ||v* − v_k||_∞ ≤ 10^−7. |
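
To make the quoted experiment setup concrete, below is a minimal, self-contained sketch of the described grid-world experiment. It is an assumed reconstruction, not the authors' code: the names `GridWorld`, `solve_optimal_values`, and `run_until_convergence` are illustrative, plain value iteration stands in for the NC-hm-PI / hm-PI variants studied in the paper, and the grid size N = 10 is an arbitrary choice.

```python
# Hedged sketch of the quoted setup: deterministic N x N grid world, gamma = 0.97,
# reward 1 at a random goal state, other rewards uniform in [-0.1, 0.1], initial
# values ~ N(0, 1), and simulator queries counted until ||v* - v_k||_inf <= 1e-7.
# Not the authors' implementation; names and grid layout are assumptions.
import numpy as np

GAMMA = 0.97           # discount factor quoted in the setup
CONV_TOL = 1e-7        # convergence criterion: ||v* - v_k||_inf <= 1e-7
ACTIONS = ("up", "down", "right", "left", "stay")


class GridWorld:
    """Deterministic N x N grid world with no terminal state (assumed layout)."""

    def __init__(self, n, rng):
        self.n = n
        # Reward r_g = 1 placed in a random state; all other rewards drawn
        # uniformly from [-0.1, 0.1], as in the quoted experiment setup.
        self.rewards = rng.uniform(-0.1, 0.1, size=n * n)
        self.rewards[rng.integers(n * n)] = 1.0
        self.queries = 0  # total number of simulator queries issued

    def next_state(self, state, action):
        row, col = divmod(state, self.n)
        if action == "up":
            row = max(row - 1, 0)
        elif action == "down":
            row = min(row + 1, self.n - 1)
        elif action == "left":
            col = max(col - 1, 0)
        elif action == "right":
            col = min(col + 1, self.n - 1)
        return row * self.n + col

    def step(self, state, action):
        """One simulator query: deterministic transition and its reward."""
        self.queries += 1
        s_next = self.next_state(state, action)
        return s_next, self.rewards[s_next]


def solve_optimal_values(env, tol=1e-12):
    """Reference v* via Bellman backups on the known model (not counted as queries)."""
    v = np.zeros(env.n * env.n)
    while True:
        v_new = np.empty_like(v)
        for s in range(env.n * env.n):
            successors = [env.next_state(s, a) for a in ACTIONS]
            v_new[s] = max(env.rewards[sn] + GAMMA * v[sn] for sn in successors)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new


def run_until_convergence(env, v_star, rng):
    """Plain value iteration through the simulator; returns the query count
    once ||v* - v_k||_inf <= CONV_TOL (the paper's stopping rule)."""
    # Entries of the initial value function are drawn from N(0, 1).
    v = rng.standard_normal(env.n * env.n)
    while np.max(np.abs(v_star - v)) > CONV_TOL:
        v_new = np.empty_like(v)
        for s in range(env.n * env.n):
            backups = [env.step(s, a) for a in ACTIONS]   # 5 simulator queries per state
            v_new[s] = max(r + GAMMA * v[sn] for sn, r in backups)
        v = v_new
    return env.queries


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    env = GridWorld(n=10, rng=rng)      # N = 10 is an arbitrary illustrative size
    v_star = solve_optimal_values(env)
    print("simulator queries until convergence:", run_until_convergence(env, v_star, rng))
```

The point of the sketch is the performance measure used in Section 8 of the paper: the total number of simulator queries issued before the value estimate is within 10^−7 of v* in the sup-norm.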