Model-based Reinforcement Learning for Parameterized Action Spaces
Authors: Renhao Zhang, Haotian Fu, Yilin Miao, George Konidaris
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results on 8 different PAMDP benchmarks show that DLPA achieves better or comparable asymptotic performance with significantly better sample efficiency than all the state-of-the-art PAMDP algorithms. |
| Researcher Affiliation | Academia | Department of Computer Science, Brown University. Correspondence to: Renhao Zhang <rzhan160@cs.brown.edu>, Haotian Fu <hfu7@cs.brown.edu>. |
| Pseudocode | Yes | Algorithm 1 DLPA (an illustrative sketch of a planning loop in this style appears below the table) |
| Open Source Code | Yes | Code available at https://github.com/Valarzz/DLPA. |
| Open Datasets | Yes | We evaluated the performance of DLPA on eight standard PAMDP benchmarks, including Platform and Goal (Masson et al., 2016), Catch Point (Fan et al., 2019), Hard Goal, and four versions of Hard Move. Note that these 8 benchmarks are exactly the same environments tested in Li et al. (2022). |
| Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits. It describes sampling trajectories from a replay buffer for training in a reinforcement learning setup. |
| Hardware Specification | No | The paper states, "This work was conducted using computational resources and services at the Center for Computation and Visualization, Brown University," but does not specify any particular hardware like CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions general software components like MLPs and neural networks, but does not provide specific version numbers for any libraries, frameworks, or programming languages used (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | Table 4. DLPA hyperparameters. We list the most important hyperparameters during both training and evaluation. If there's only one value in the list, it means all environments use the same value; otherwise, it's in the order of Platform, Goal, Hard Goal, Catch Point, Hard Move (4), Hard Move (6), Hard Move (8), and Hard Move (10). (An illustrative encoding of this convention appears below.) |
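The Pseudocode row confirms the paper provides Algorithm 1 (DLPA), but the table above does not reproduce it. As context, here is a minimal sketch of a model-based control loop over parameterized actions, the PAMDP setting where each action is a discrete type paired with a continuous parameter. It uses a simple random-shooting planner for brevity; `dynamics_model`, `reward_model`, and all constants are illustrative placeholders, not the authors' actual interfaces or settings.

```python
import numpy as np

# PAMDP action: a discrete action type k plus a continuous parameter x_k.
NUM_DISCRETE = 3   # number of discrete action types (illustrative)
PARAM_DIM = 1      # dimension of each continuous parameter (illustrative)

def sample_action_sequences(n, horizon, rng):
    """Sample n candidate sequences of parameterized actions."""
    discrete = rng.integers(0, NUM_DISCRETE, size=(n, horizon))
    params = rng.uniform(-1.0, 1.0, size=(n, horizon, PARAM_DIM))
    return discrete, params

def evaluate_sequences(state, discrete, params, dynamics_model, reward_model):
    """Roll each candidate sequence through the learned model, summing rewards."""
    n, horizon = discrete.shape
    states = np.repeat(state[None, :], n, axis=0)
    returns = np.zeros(n)
    for t in range(horizon):
        action = (discrete[:, t], params[:, t])
        returns += reward_model(states, action)
        states = dynamics_model(states, action)
    return returns

def plan(state, dynamics_model, reward_model, n_candidates=512, horizon=12,
         rng=None):
    """One planning step: return the first action of the best candidate."""
    rng = rng or np.random.default_rng(0)
    discrete, params = sample_action_sequences(n_candidates, horizon, rng)
    returns = evaluate_sequences(state, discrete, params,
                                 dynamics_model, reward_model)
    best = int(np.argmax(returns))
    return discrete[best, 0], params[best, 0]

# Toy usage with dummy models, purely to show the call pattern:
dummy_dyn = lambda s, a: s + 0.1 * a[1].mean(axis=1, keepdims=True)
dummy_rew = lambda s, a: -np.abs(s).sum(axis=1)
k, x = plan(np.ones(4), dummy_dyn, dummy_rew)
```

The authors' actual Algorithm 1 is more involved; this sketch only conveys the general shape of sampling-based planning with a learned model over (discrete, continuous) action pairs.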
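The Experiment Setup row quotes Table 4's convention: a single value means all eight environments share it, while a list is ordered Platform, Goal, Hard Goal, Catch Point, Hard Move (4), Hard Move (6), Hard Move (8), Hard Move (10). Below is a hypothetical encoding of that convention; the hyperparameter names and values are made up, not the paper's reported settings.

```python
# Environment order as stated in the paper's Table 4 caption.
ENVS = ["Platform", "Goal", "Hard Goal", "Catch Point",
        "Hard Move (4)", "Hard Move (6)", "Hard Move (8)", "Hard Move (10)"]

# Hypothetical values, purely to illustrate the single-value-or-list convention.
HPARAMS = {
    "learning_rate": 3e-4,                               # one value: shared
    "planning_horizon": [8, 8, 10, 8, 12, 12, 12, 12],   # list: per ENVS order
}

def hparam(name, env):
    """Resolve a hyperparameter for one environment under Table 4's convention."""
    value = HPARAMS[name]
    return value[ENVS.index(env)] if isinstance(value, list) else value

print(hparam("learning_rate", "Goal"))              # 0.0003 (shared)
print(hparam("planning_horizon", "Hard Move (6)"))  # 12 (per-environment)
```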