Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Neuroplastic Expansion in Deep Reinforcement Learning
Authors: Jiashun Liu, Johan S Obando Ceron, Aaron Courville, Ling Pan
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that NE effectively mitigates plasticity loss and outperforms state-of-the-art methods across various tasks in MuJoCo and DeepMind Control Suite environments. |
| Researcher Affiliation | Academia | Jiashun Liu, HKUST; Johan Obando-Ceron, Mila Québec AI Institute, Université de Montréal; Aaron Courville, Mila Québec AI Institute, Université de Montréal; Ling Pan, HKUST. Corresponding author, email: EMAIL |
| Pseudocode | Yes | D PSEUDOCODE CODE D.1 PSEUDO-CODE FOR NE Algorithm 1 Neuroplastic Expansion TD3 (...) D.2 PSEUDO-CODE FOR TRUNCATE PROCESS Algorithm 2 Truncate Process |
| Open Source Code | Yes | We make our code publicly available. |
| Open Datasets | Yes | Extensive experiments demonstrate that NE effectively mitigates plasticity loss and outperforms state-of-the-art methods across various tasks in MuJoCo and DeepMind Control Suite environments. (...) We conduct a series of experiments based on the standard continuous control tasks from OpenAI Gym (Brockman, 2016) simulated by MuJoCo (Todorov et al., 2012) with long-term training setting, i.e. 3M steps 6M. |
| Dataset Splits | No | The paper describes training in various environments for a certain number of steps (e.g., "3M steps 6M") and samples from a replay buffer, but it does not provide explicit training/test/validation dataset splits in the conventional sense for supervised learning. |
| Hardware Specification | Yes | Our codes are implemented with Python 3.8 and Torch 1.12.1. All experiments were run on NVIDIA GeForce GTX 3090 GPUs. |
| Software Dependencies | Yes | Our codes are implemented with Python 3.8 and Torch 1.12.1. |
| Experiment Setup | Yes | The hyper-parameters for TD3 are presented in Table 2. (...) For Humanoid and Ant tasks, we set grow interval T = 25000, grow number k = 0.01 rest capacity, Prune upper bond ω = 0.4, ending step is the max training step, the threshold of ER is 0.35 and the decay weight α = 0.02 (which is used in all the tasks). For other OpenAI Mujoco tasks, we set grow interval T = 20000, grow number k = 0.15 rest capacity, Prune upper bond ω = 0.2, ending step is the max training step, the threshold of ER is 0.25. |
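
The NE hyper-parameters quoted in the Experiment Setup row can be gathered into a small configuration object, which may help when comparing the two task groups. The following is only a sketch: the class name `NEConfig`, its field names, the helper `config_for_task`, and the example task IDs are hypothetical and do not come from the authors' code. It also assumes that "k = 0.01 rest capacity" means a fraction of the remaining capacity, which the quoted text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NEConfig:
    """Hypothetical container for the NE settings quoted in the table."""
    grow_interval: int          # T, grow interval in training steps
    grow_fraction: float        # k, assumed to be a fraction of the rest capacity
    prune_upper_bound: float    # omega, the prune upper bound
    er_threshold: float         # "threshold of ER" as quoted
    decay_weight: float = 0.02  # alpha, quoted as used in all the tasks
    # The quoted "ending step is the max training step" depends on the run
    # length, so it is deliberately not stored here.


# Values quoted for Humanoid and Ant tasks.
HUMANOID_ANT = NEConfig(grow_interval=25_000, grow_fraction=0.01,
                        prune_upper_bound=0.4, er_threshold=0.35)

# Values quoted for the other MuJoCo tasks.
OTHER_MUJOCO = NEConfig(grow_interval=20_000, grow_fraction=0.15,
                        prune_upper_bound=0.2, er_threshold=0.25)


def config_for_task(task_name: str) -> NEConfig:
    """Pick a config by task-name prefix (the naming scheme is an assumption)."""
    name = task_name.lower()
    if name.startswith(("humanoid", "ant")):
        return HUMANOID_ANT
    return OTHER_MUJOCO


if __name__ == "__main__":
    for task in ("Humanoid-v4", "Ant-v4", "Walker2d-v4"):
        print(task, config_for_task(task))
```

Keeping the shared decay weight as a default field reflects the statement that α = 0.02 is used in all tasks, while the remaining fields differ per group.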