Model-based Reinforcement Learning for Parameterized Action Spaces
Authors: Renhao Zhang, Haotian Fu, Yilin Miao, George Konidaris
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results on 8 different PAMDP benchmarks show that DLPA achieves better or comparable asymptotic performance with significantly better sample efficiency than all the state-of-the-art PAMDP algorithms. |
| Researcher Affiliation | Academia | Department of Computer Science, Brown University. Correspondence to: Renhao Zhang <rzhan160@cs.brown.edu>, Haotian Fu <hfu7@cs.brown.edu>. |
| Pseudocode | Yes | Algorithm 1 DLPA (an illustrative sketch of a planning loop in this style appears below the table) |
| Open Source Code | Yes | Code available at https://github.com/Valarzz/DLPA. |
| Open Datasets | Yes | We evaluated the performance of DLPA on eight standard PAMDP benchmarks, including Platform and Goal (Masson et al., 2016), Catch Point (Fan et al., 2019), Hard Goal, and four versions of Hard Move. Note that these 8 benchmarks are exactly the same environments tested in Li et al. (2022). |
| Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits. It describes sampling trajectories from a replay buffer for training in a reinforcement learning setup. |
| Hardware Specification | No | The paper states, "This work was conducted using computational resources and services at the Center for Computation and Visualization, Brown University," but does not specify any particular hardware like CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions general software components like MLPs and neural networks, but does not provide specific version numbers for any libraries, frameworks, or programming languages used (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | Table 4. DLPA hyperparameters. We list the most important hyperparameters during both training and evaluation. If there's only one value in the list, it means all environments use the same value; otherwise, it's in the order of Platform, Goal, Hard Goal, Catch Point, Hard Move (4), Hard Move (6), Hard Move (8), and Hard Move (10). (An illustrative encoding of this convention appears below.) |
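The Pseudocode row confirms the paper provides Algorithm 1 (DLPA), but the table above does not reproduce it. As context, here is a minimal sketch of a model-based control loop over parameterized actions, the PAMDP setting where each action is a discrete type paired with a continuous parameter. It uses a simple random-shooting planner for brevity; `dynamics_model`, `reward_model`, and all constants are illustrative placeholders, not the authors' actual interfaces or settings.

```python
import numpy as np

# PAMDP action: a discrete action type k plus a continuous parameter x_k.
NUM_DISCRETE = 3   # number of discrete action types (illustrative)
PARAM_DIM = 1      # dimension of each continuous parameter (illustrative)

def sample_action_sequences(n, horizon, rng):
    """Sample n candidate sequences of parameterized actions."""
    discrete = rng.integers(0, NUM_DISCRETE, size=(n, horizon))
    params = rng.uniform(-1.0, 1.0, size=(n, horizon, PARAM_DIM))
    return discrete, params

def evaluate_sequences(state, discrete, params, dynamics_model, reward_model):
    """Roll each candidate sequence through the learned model, summing rewards."""
    n, horizon = discrete.shape
    states = np.repeat(state[None, :], n, axis=0)
    returns = np.zeros(n)
    for t in range(horizon):
        action = (discrete[:, t], params[:, t])
        returns += reward_model(states, action)
        states = dynamics_model(states, action)
    return returns

def plan(state, dynamics_model, reward_model, n_candidates=512, horizon=12,
         rng=None):
    """One planning step: return the first action of the best candidate."""
    rng = rng or np.random.default_rng(0)
    discrete, params = sample_action_sequences(n_candidates, horizon, rng)
    returns = evaluate_sequences(state, discrete, params,
                                 dynamics_model, reward_model)
    best = int(np.argmax(returns))
    return discrete[best, 0], params[best, 0]

# Toy usage with dummy models, purely to show the call pattern:
dummy_dyn = lambda s, a: s + 0.1 * a[1].mean(axis=1, keepdims=True)
dummy_rew = lambda s, a: -np.abs(s).sum(axis=1)
k, x = plan(np.ones(4), dummy_dyn, dummy_rew)
```

The authors' actual Algorithm 1 is more involved; this sketch only conveys the general shape of sampling-based planning with a learned model over (discrete, continuous) action pairs.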
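The Experiment Setup row quotes Table 4's convention: a single value means all eight environments share it, while a list is ordered Platform, Goal, Hard Goal, Catch Point, Hard Move (4), Hard Move (6), Hard Move (8), Hard Move (10). Below is a hypothetical encoding of that convention; the hyperparameter names and values are made up, not the paper's reported settings.

```python
# Environment order as stated in the paper's Table 4 caption.
ENVS = ["Platform", "Goal", "Hard Goal", "Catch Point",
        "Hard Move (4)", "Hard Move (6)", "Hard Move (8)", "Hard Move (10)"]

# Hypothetical values, purely to illustrate the single-value-or-list convention.
HPARAMS = {
    "learning_rate": 3e-4,                               # one value: shared
    "planning_horizon": [8, 8, 10, 8, 12, 12, 12, 12],   # list: per ENVS order
}

def hparam(name, env):
    """Resolve a hyperparameter for one environment under Table 4's convention."""
    value = HPARAMS[name]
    return value[ENVS.index(env)] if isinstance(value, list) else value

print(hparam("learning_rate", "Goal"))              # 0.0003 (shared)
print(hparam("planning_horizon", "Hard Move (6)"))  # 12 (per-environment)
```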