To the Max: Reinventing Reward in Reinforcement Learning

Authors: Grigorii Veviurko, Wendelin Boehmer, Mathijs De Weerdt

ICML 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium Robotics and demonstrate its benefits over standard RL." |
| Researcher Affiliation | Academia | "1Delft University of Technology. Correspondence to: Grigorii Veviurko <g.veviurko@tudelft.nl>." |
| Pseudocode | Yes | "Algorithm 1 Max-reward TD3. Algorithm 2 Max-reward PPO." |
| Open Source Code | Yes | "The code is available at https://github.com/veviurko/To-the-Max." |
| Open Datasets | Yes | "In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium Robotics..." "de Lazcano, R., Andreas, K., Tai, J. J., Lee, S. R., and Terry, J. Gymnasium robotics, 2023. URL http://github.com/Farama-Foundation/Gymnasium-Robotics." |
| Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., percentages or sample counts) for training, validation, or testing. It discusses "episodes" and "environmental timesteps" in the context of learning, and "success ratio" as a metric. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU or CPU models, memory, or cloud computing resources). |
| Software Dependencies | No | The paper mentions using TD3 and PPO and refers to Gymnasium-Robotics, but it does not provide version numbers for any software components or libraries (e.g., Python, PyTorch, TensorFlow, Gymnasium). |
| Experiment Setup | Yes | "Hyperparameters of all runs are reported in Tables 1-2. Table 1. Hyperparameters for the experiments with Maze environment. Table 2. Hyperparameters for the experiments with Fetch environment." |