Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Adaptive Rational Activations to Boost Deep Reinforcement Learning
Authors: Quentin Delfosse, Patrick Schramowski, Martin Mundt, Alejandro Molina, Kristian Kersting
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate that equipping popular algorithms with (joint) rational activations leads to consistent improvements on different games from the Atari Learning Environment benchmark, notably making DQN competitive to DDQN and Rainbow." […] "We empirically demonstrate that rational activations bring significant improvements to DQN and Rainbow algorithms on Atari games and that our joint variant further increases performance." |
| Researcher Affiliation | Academia | 1 Computer Science Dept., TU Darmstadt; 2 German Center for Artificial Intelligence; 3 Hessian Center for Artificial Intelligence; 4 Centre for Cognitive Science, Darmstadt |
| Pseudocode | No | The paper describes methodologies and provides mathematical derivations, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Rational library: github.com/k4ntz/activation-functions; Experiments: github.com/ml-research/rational_rl. |
| Open Datasets | Yes | "15 different games of the Atari 2600 domain (Brockman et al., 2017)." […] "For the classification experiments, we run on CIFAR10 and CIFAR100 (Krizhevsky et al., MIT License)" |
| Dataset Splits | No | The paper evaluates agents at fixed training-step intervals, but does not explicitly describe a validation dataset split (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | Yes | "230.000 GPU hours, carried out on a DGX-2 Machine with Nvidia Tesla V100 with 32GB." […] "A single run took more than 40 days on an NVIDIA Tesla V100 GPU." |
| Software Dependencies | No | The paper mentions Mushroom RL library, Arcade Learning Environment, CUDA optimized implementation, and Adam optimiser, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The target network is updated every 10K steps, with a replay buffer memory of initial size 50K, and maximum size 500K, except for Pong, for which all these values are divided by 10. The discount factor γ is set to 0.99 and the learning rate is 0.00025. We do not select the best policy among seeds between epochs. We use the simple ϵ-greedy exploration policy, with the ϵ decreasing linearly from 1 to 0.1 over 1M steps, and an ϵ of 0.05 is used for testing. |
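The exploration schedule quoted in the Experiment Setup row can be sketched as a small function. This is an illustrative reconstruction of the described linear ϵ decay (function name and structure are assumptions, not the authors' code; the numeric defaults come from the quoted setup):

```python
def epsilon_at(step, start=1.0, end=0.1, decay_steps=1_000_000,
               test_eps=0.05, testing=False):
    """Exploration rate for the quoted epsilon-greedy schedule:
    linear decay from `start` to `end` over `decay_steps` environment
    steps, then held at `end`; a fixed `test_eps` is used at test time."""
    if testing:
        return test_eps
    frac = min(step / decay_steps, 1.0)  # fraction of the decay completed
    return start + frac * (end - start)
```

For example, `epsilon_at(500_000)` gives 0.55, halfway through the quoted 1M-step decay.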