To the Max: Reinventing Reward in Reinforcement Learning
Authors: Grigorii Veviurko, Wendelin Boehmer, Mathijs De Weerdt
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium Robotics and demonstrate its benefits over standard RL. |
| Researcher Affiliation | Academia | Delft University of Technology. Correspondence to: Grigorii Veviurko <g.veviurko@tudelft.nl>. |
| Pseudocode | Yes | Algorithm 1 Max-reward TD3. Algorithm 2 Max-reward PPO. |
| Open Source Code | Yes | The code is available at https://github.com/veviurko/To-the-Max. |
| Open Datasets | Yes | In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium Robotics... de Lazcano, R., Andreas, K., Tai, J. J., Lee, S. R., and Terry, J. Gymnasium Robotics, 2023. URL http://github.com/Farama-Foundation/Gymnasium-Robotics. |
| Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., percentages, sample counts) for training, validation, or testing. It discusses 'episodes' and 'environmental timesteps' in the context of learning, and 'success ratio' as a metric. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., specific GPU or CPU models, memory, or cloud computing resources). |
| Software Dependencies | No | The paper mentions using TD3 and PPO and refers to 'Gymnasium-Robotics', but it does not provide specific version numbers for any software components or libraries (e.g., Python, PyTorch, TensorFlow, Gymnasium). |
| Experiment Setup | Yes | Hyperparameters of all runs are reported in Tables 1-2. Table 1. Hyperparameters for the experiments with Maze environment. Table 2. Hyperparameters for the experiments with Fetch environment. |
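For readers cross-checking the quoted experiment setup against the released code, the sketch below shows how a Gymnasium-Robotics goal-reaching environment can be rolled out and how a standard discounted return contrasts with a max-reward-style return. This is a minimal sketch under stated assumptions, not the authors' implementation: the environment id `PointMaze_UMaze-v3`, the `gym.register_envs` registration step, the random policy, and the undiscounted `max` aggregation are all illustrative choices not taken from the paper.

```python
import gymnasium as gym
import gymnasium_robotics  # assumed dependency; the paper does not state versions

# Recent Gymnasium releases require explicit registration of the robotics suite
# (older releases register the environments on import); assumed setup detail.
gym.register_envs(gymnasium_robotics)

# The paper reports "Maze" and "Fetch" experiments; the exact environment id used
# here is an assumption made for illustration.
env = gym.make("PointMaze_UMaze-v3")

obs, info = env.reset(seed=0)
rewards = []
for _ in range(200):
    action = env.action_space.sample()  # random policy, for illustration only
    obs, reward, terminated, truncated, info = env.step(action)
    rewards.append(reward)
    if terminated or truncated:
        break
env.close()

gamma = 0.99
# Standard RL objective: discounted cumulative return over the trajectory.
cumulative_return = sum(gamma**t * r for t, r in enumerate(rewards))
# Max-reward-style objective (illustrative): the best single reward observed,
# rather than the sum; the paper's exact formulation may differ.
max_reward_return = max(rewards)
print(f"cumulative return: {cumulative_return:.3f}, max reward: {max_reward_return:.3f}")
```

The rollout loop above is also the kind of place where a hardware specification and pinned library versions would normally be recorded, which is why those two rows of the table are marked "No".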