Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning
Authors: Aneesh Muppidi, Zhiyu Zhang, Heng Yang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on Procgen, Atari, and Gym Control environments show that TRAC works surprisingly well, mitigating loss of plasticity and rapidly adapting to challenging distribution shifts, despite the underlying optimization problem being nonconvex and nonstationary. |
| Researcher Affiliation | Academia | Aneesh Muppidi, Harvard College (aneeshmuppidi@college.harvard.edu); Zhiyu Zhang, Harvard University (zhiyuz@seas.harvard.edu); Heng Yang, Harvard University (hankyang@seas.harvard.edu) |
| Pseudocode | Yes | Algorithm 1: TRAC: Parameter-free Adaptation for Continual Environments. Algorithm 2: 1D Discounted Tuner of TRAC. |
| Open Source Code | Yes | The project website and code are available here. |
| Open Datasets | Yes | Procgen: We first evaluate on OpenAI Procgen, a suite of 16 procedurally generated game environments (Cobbe et al., 2020). Atari: The Arcade Learning Environment (ALE) Atari 2600 benchmark is a collection of classic arcade games designed to assess reinforcement learning agents' performance across a range of diverse gaming scenarios (Bellemare et al., 2013). Gym Control: We use the CartPole-v1 and Acrobot-v1 environments from the Gym Classic Control suite, along with LunarLander-v2 from Box2D Control. |
| Dataset Splits | No | The paper does not specify validation dataset splits. It mentions that the classical cross-validation approach would violate the one-shot nature of lifelong RL. |
| Hardware Specification | Yes | For the Procgen and Atari experiments, each was allocated a single A100 GPU, typically running for 3-4 days to complete. The Gym Control experiments were conducted using dual-core CPUs, generally concluding within a few hours. In both scenarios, an allocation of 8GB of RAM was sufficient to meet the computational demands. |
| Software Dependencies | No | The paper mentions using PyTorch and ADAM, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Table 4: PPO Parameters for Atari, Procgen, and Gym Control Experiments. For the Procgen and Atari experiments, the base ADAM optimizer was configured the same as the baseline, with a learning rate of 0.001; for the Gym Control experiments, a learning rate of 0.01 was used. Other than the learning rate, we use the default ADAM parameters, including weight decay and betas, following the specifications in the PyTorch documentation. The setup for TRAC included β values for adaptive gradient adjustments: 0.9, 0.99, 0.999, 0.9999, 0.99999, and 0.999999. Both S_t and ε were initially set to 1 × 10^-8. A hedged configuration sketch of this setup is shown below the table. |
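
The experiment-setup row above maps directly onto a PyTorch optimizer configuration. The sketch below illustrates one plausible way to instantiate it under the reported hyperparameters; the policy network shape and the `TRAC` wrapper name and signature are illustrative assumptions, not the authors' released API.

```python
import torch
import torch.nn as nn

# Hypothetical small policy network standing in for the PPO actor used in the paper.
policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))

# Base ADAM optimizer as reported: lr = 0.001 for Procgen/Atari (0.01 for Gym Control),
# all other hyperparameters left at PyTorch defaults (betas=(0.9, 0.999), weight_decay=0).
base_optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# TRAC tuner settings reported in the paper: discount factors (betas) for the
# 1D discounted tuner, with S_t and epsilon both initialized to 1e-8.
trac_config = {
    "betas": [0.9, 0.99, 0.999, 0.9999, 0.99999, 0.999999],
    "s_init": 1e-8,   # initial S_t (assumed keyword name)
    "eps": 1e-8,      # epsilon (assumed keyword name)
}

# NOTE: the wrapper class below is an assumption for illustration; consult the
# authors' released code for the actual constructor and argument names.
# optimizer = TRAC(base_optimizer, **trac_config)
```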