Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning
Authors: Aneesh Muppidi, Zhiyu Zhang, Heng Yang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on Procgen, Atari, and Gym Control environments show that TRAC works surprisingly well, mitigating loss of plasticity and rapidly adapting to challenging distribution shifts, despite the underlying optimization problem being nonconvex and nonstationary. |
| Researcher Affiliation | Academia | Aneesh Muppidi, Harvard College (aneeshmuppidi@college.harvard.edu); Zhiyu Zhang, Harvard University (zhiyuz@seas.harvard.edu); Heng Yang, Harvard University (hankyang@seas.harvard.edu) |
| Pseudocode | Yes | Algorithm 1: TRAC: Parameter-free Adaptation for Continual Environments. Algorithm 2: 1D Discounted Tuner of TRAC. |
| Open Source Code | Yes | The project website and code are available here. |
| Open Datasets | Yes | Procgen: We first evaluate on OpenAI Procgen, a suite of 16 procedurally generated game environments (Cobbe et al., 2020). Atari: The Arcade Learning Environment (ALE) Atari 2600 benchmark is a collection of classic arcade games designed to assess reinforcement learning agents' performance across a range of diverse gaming scenarios (Bellemare et al., 2013). Gym Control: We use the CartPole-v1 and Acrobot-v1 environments from the Gym Classic Control suite, along with LunarLander-v2 from Box2D Control. |
| Dataset Splits | No | The paper does not specify validation dataset splits. It mentions that the classical cross-validation approach would violate the one-shot nature of lifelong RL. |
| Hardware Specification | Yes | For the Procgen and Atari experiments, each was allocated a single A100 GPU, typically running for 3-4 days to complete. The Gym Control experiments were conducted using dual-core CPUs, generally concluding within a few hours. In both scenarios, an allocation of 8GB of RAM was sufficient to meet the computational demands. |
| Software Dependencies | No | The paper mentions using PyTorch and ADAM, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Table 4: PPO Parameters for Atari, Procgen, and Gym Control Experiments. For the Procgen and Atari experiments, the base ADAM optimizer was configured the same as the baseline, with a learning rate of 0.001; for the Gym Control experiments, a learning rate of 0.01 was used. Other than the learning rate, we use the default ADAM parameters, including weight decay and betas, following the specifications in the PyTorch documentation. The setup for TRAC included β values for adaptive gradient adjustments: 0.9, 0.99, 0.999, 0.9999, 0.99999, and 0.999999. Both S_t and ε were initially set to 1 × 10^-8. A hedged configuration sketch of this setup is shown below the table. |
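
The experiment-setup row above maps directly onto a PyTorch optimizer configuration. The sketch below illustrates one plausible way to instantiate it under the reported hyperparameters; the policy network shape and the `TRAC` wrapper name and signature are illustrative assumptions, not the authors' released API.

```python
import torch
import torch.nn as nn

# Hypothetical small policy network standing in for the PPO actor used in the paper.
policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))

# Base ADAM optimizer as reported: lr = 0.001 for Procgen/Atari (0.01 for Gym Control),
# all other hyperparameters left at PyTorch defaults (betas=(0.9, 0.999), weight_decay=0).
base_optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# TRAC tuner settings reported in the paper: discount factors (betas) for the
# 1D discounted tuner, with S_t and epsilon both initialized to 1e-8.
trac_config = {
    "betas": [0.9, 0.99, 0.999, 0.9999, 0.99999, 0.999999],
    "s_init": 1e-8,   # initial S_t (assumed keyword name)
    "eps": 1e-8,      # epsilon (assumed keyword name)
}

# NOTE: the wrapper class below is an assumption for illustration; consult the
# authors' released code for the actual constructor and argument names.
# optimizer = TRAC(base_optimizer, **trac_config)
```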