To the Max: Reinventing Reward in Reinforcement Learning

Authors: Grigorii Veviurko, Wendelin Boehmer, Mathijs De Weerdt

ICML 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium Robotics and demonstrate its benefits over standard RL." |
| Researcher Affiliation | Academia | "1Delft University of Technology. Correspondence to: Grigorii Veviurko <g.veviurko@tudelft.nl>." |
| Pseudocode | Yes | "Algorithm 1 Max-reward TD3. Algorithm 2 Max-reward PPO." |
| Open Source Code | Yes | "The code is available at https://github.com/veviurko/To-the-Max." |
| Open Datasets | Yes | "In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium Robotics..." "de Lazcano, R., Andreas, K., Tai, J. J., Lee, S. R., and Terry, J. Gymnasium robotics, 2023. URL http://github.com/Farama-Foundation/Gymnasium-Robotics." |
| Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., percentages or sample counts) for training, validation, or testing. It discusses "episodes" and "environmental timesteps" in the context of learning, and "success ratio" as a metric. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU or CPU models, memory, or cloud computing resources). |
| Software Dependencies | No | The paper mentions using TD3 and PPO and refers to Gymnasium-Robotics, but it does not provide version numbers for any software components or libraries (e.g., Python, PyTorch, TensorFlow, Gymnasium). |
| Experiment Setup | Yes | "Hyperparameters of all runs are reported in Tables 1-2. Table 1. Hyperparameters for the experiments with Maze environment. Table 2. Hyperparameters for the experiments with Fetch environment." |