Adaptive Rational Activations to Boost Deep Reinforcement Learning

Authors: Quentin Delfosse, Patrick Schramowski, Martin Mundt, Alejandro Molina, Kristian Kersting

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate that equipping popular algorithms with (joint) rational activations leads to consistent improvements on different games from the Atari Learning Environment benchmark, notably making DQN competitive to DDQN and Rainbow." and "We empirically demonstrate that rational activations bring significant improvements to DQN and Rainbow algorithms on Atari games and that our joint variant further increases performance." (A sketch of such a rational activation follows the table.)
Researcher Affiliation | Academia | 1. Computer Science Dept., TU Darmstadt; 2. German Center for Artificial Intelligence; 3. Hessian Center for Artificial Intelligence; 4. Centre for Cognitive Science, Darmstadt
Pseudocode | No | The paper describes methodologies and provides mathematical derivations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Rational library: github.com/k4ntz/activation-functions; Experiments: github.com/ml-research/rational_rl.
Open Datasets | Yes | "15 different games of the Atari 2600 domain (Brockman et al., 2017)." and "For the classification experiments, we run on CIFAR10 and CIFAR100 (Krizhevsky et al., MIT License)."
Dataset Splits | No | The paper evaluates agents at fixed training-step checkpoints, but does not explicitly describe a validation dataset split (e.g., percentages or counts for training, validation, and test sets).
Hardware Specification | Yes | "230.000 GPU hours, carried out on a DGX-2 Machine with Nvidia Tesla V100 with 32GB." and "A single run took more than 40 days on an NVIDIA Tesla V100 GPU."
Software Dependencies | No | The paper mentions the Mushroom RL library, the Arcade Learning Environment, a CUDA-optimized implementation, and the Adam optimiser, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | "The target network is updated every 10K steps, with a replay buffer memory of initial size 50K, and maximum size 500K, except for Pong, for which all these values are divided by 10. The discount factor γ is set to 0.99 and the learning rate is 0.00025. We do not select the best policy among seeds between epochs. We use the simple ϵ-greedy exploration policy, with the ϵ decreasing linearly from 1 to 0.1 over 1M steps, and an ϵ of 0.05 is used for testing." (These values are summarised in the configuration sketch below.)
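
The rational activations referenced in the Research Type row are learnable ratios of polynomials, R(x) = P(x)/Q(x), trained jointly with the network weights. Below is a minimal PyTorch sketch of a "safe" Padé-style parameterisation (denominator kept strictly positive); the degrees, initialisation, and class name are illustrative assumptions, not the authors' exact implementation, which is provided as a CUDA-optimised library in the linked repositories.

```python
import torch
import torch.nn as nn


class RationalActivation(nn.Module):
    """Sketch of a learnable rational activation R(x) = P(x) / Q(x)."""

    def __init__(self, m: int = 5, n: int = 4):
        super().__init__()
        # Polynomial coefficients are nn.Parameters, so they are updated by
        # backpropagation together with the rest of the network.
        self.a = nn.Parameter(torch.randn(m + 1) * 0.1)  # numerator, degree m
        self.b = nn.Parameter(torch.randn(n) * 0.1)      # denominator, degree n

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # P(x) = a_0 + a_1 x + ... + a_m x^m
        powers_p = torch.stack([x ** j for j in range(self.a.numel())], dim=-1)
        numerator = (powers_p * self.a).sum(dim=-1)
        # Q(x) = 1 + |b_1 x + ... + b_n x^n|  (always >= 1, so no poles)
        powers_q = torch.stack([x ** (k + 1) for k in range(self.b.numel())], dim=-1)
        denominator = 1.0 + (powers_q * self.b).sum(dim=-1).abs()
        return numerator / denominator
```

A "joint" variant, as described in the paper's abstract, would share a single such module (one set of coefficients) across several layers instead of giving each layer its own instance.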
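
The Experiment Setup row lists the reported DQN hyperparameters; the snippet below collects them in one place and sketches the linear ϵ-greedy schedule. The dictionary keys and the helper function are illustrative names, not identifiers from the authors' code.

```python
# Values quoted from the reported experiment setup (all replay/target values
# are divided by 10 for Pong, per the paper).
DQN_CONFIG = {
    "target_update_frequency": 10_000,   # steps between target-network syncs
    "replay_initial_size": 50_000,
    "replay_max_size": 500_000,
    "gamma": 0.99,                       # discount factor
    "learning_rate": 0.00025,
    "epsilon_test": 0.05,                # exploration rate used at test time
}


def linear_epsilon(step: int, start: float = 1.0, end: float = 0.1,
                   decay_steps: int = 1_000_000) -> float:
    """Training epsilon, decayed linearly from 1 to 0.1 over the first 1M steps."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)
```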