Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Position: Lifetime tuning is incompatible with continual reinforcement learning

Authors: Golnaz Mesbahi, Parham Mohammad Panahi, Olya Mastikhina, Steven Tang, Martha White, Adam White

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide empirical evidence to support our position by testing DQN and SAC across several continuing and non-stationary environments, with two main findings: (1) lifetime tuning does not allow us to identify algorithms that work well for continual learning; all algorithms equally succeed; (2) recently developed continual RL algorithms outperform standard non-continual algorithms when tuning is limited to a fraction of the agent's lifetime.
Researcher Affiliation | Academia | 1 Department of Computing Science, University of Alberta, Edmonton, Canada; 2 Alberta Machine Intelligence Institute (Amii); 3 Canada CIFAR AI Chair.
Pseudocode | No | The paper describes various algorithms and methodologies (DQN, SAC, W0 regularization, PT-DQN) but does not present any of them in a structured pseudocode or algorithm block.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | We contrast using k-percent tuning and lifetime tuning to compare continual-learning mitigation strategies for DQN in Jelly Bean World, a testbed for never-ending, continual learning (Platanios et al., 2020).
Dataset Splits | No | The paper discusses tuning phases (e.g., "k-percent of its lifetime") and runs, but does not provide specific training/test/validation dataset splits with percentages, sample counts, or predefined split references.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies or library names with version numbers needed to replicate the experiments.
Experiment Setup | Yes | We consider a large set of hyperparameters for DQN, sweeping exploration (epsilon), batch size, buffer size, minimum buffer size, and the learning rate and β2 of the Adam optimizer. The ranges and chosen hyperparameters are listed in Tables 1 and 2, respectively. ... From Table 1 — Learning rate: 10^i for i ∈ [−1, …, −5], 0.08; Batch size: 32, 256; Buffer size: 1,000; 10,000; 100,000; Min buffer size: 0; 1,000; Exploration ε: 0.01, 0.1; Adam optimizer β2: 0.9, 0.999.
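The Table 1 ranges above define a hyperparameter grid for the DQN sweep. The sketch below is an illustrative reconstruction of that grid, not the authors' tuning code; the grouping of 0.08 with the learning-rate values follows the extracted table and is an assumption.

```python
from itertools import product

# Hyperparameter ranges as listed in Table 1 of the paper.
# Note: 0.08 appears alongside the powers of ten in the extracted table;
# it is treated here as an additional learning-rate candidate (assumption).
grid = {
    "learning_rate": [10**-i for i in range(1, 6)] + [0.08],
    "batch_size": [32, 256],
    "buffer_size": [1_000, 10_000, 100_000],
    "min_buffer_size": [0, 1_000],
    "exploration_epsilon": [0.01, 0.1],
    "adam_beta2": [0.9, 0.999],
}

def all_configs(grid):
    """Yield every configuration in the full cross-product sweep."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(all_configs(grid))
print(len(configs))  # 6 * 2 * 3 * 2 * 2 * 2 = 288 configurations
```

Enumerating the cross product makes the cost of such a sweep concrete: even these modest per-parameter ranges yield 288 configurations, which is why the paper's restriction of tuning to a fraction of the agent's lifetime matters.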
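Several rows above refer to the paper's k-percent tuning protocol, in which hyperparameter tuning is restricted to the first k percent of the agent's lifetime. A minimal sketch of that budget split, as an illustrative reconstruction (the function name and step counts are hypothetical, not from the paper):

```python
def k_percent_budget(lifetime_steps: int, k: float) -> tuple[int, int]:
    """Split an agent's lifetime into a tuning budget (the first k percent
    of steps) and the remaining deployment steps, mirroring the k-percent
    tuning protocol. Illustrative only; not the authors' code."""
    tune = int(lifetime_steps * k / 100)
    return tune, lifetime_steps - tune

# Example: a 1M-step lifetime with a 10% tuning budget (hypothetical numbers).
tune, deploy = k_percent_budget(1_000_000, 10)
print(tune, deploy)  # 100000 900000
```

Lifetime tuning corresponds to k = 100, i.e. hyperparameters are selected using the agent's entire lifetime, which the paper argues is unrealistic for continual learning.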