Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress
Authors: Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron C. Courville, Marc Bellemare
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate reincarnating RL's gains over tabula rasa RL on Atari 2600 games, a challenging locomotion task, and the real-world problem of navigating stratospheric balloons. |
| Researcher Affiliation | Collaboration | 1 Google Research, Brain Team; 2 MILA |
| Pseudocode | No | The paper does not contain a pseudocode block or algorithm section. |
| Open Source Code | Yes | Open-sourced code and trained agents at agarwl.github.io/reincarnating_rl. |
| Open Datasets | Yes | We conduct experiments on ALE with sticky actions [57]. To reduce the computational cost of our experiments, we use a subset of 10 commonly-used Atari 2600 games: Asterix, Breakout, Space Invaders, Seaquest, Q*Bert, Beam Rider, Enduro, Ms Pacman, Bowling and River Raid. (A hedged environment sketch follows the table.) |
| Dataset Splits | No | For the results in Section 4, we use 3 seeds per game on 10 games. |
| Hardware Specification | Yes | We obtain the teacher policy πT by running DQN [60] with Adam optimizer for 400 million environment frames, requiring 7 days of training per run with Dopamine [15] on P100 GPUs. |
| Software Dependencies | No | We use actor-critic agents in Acme [37]. |
| Experiment Setup | Yes | For the experiments in Section 4, we use a learning rate of 1e-4, the Adam optimizer, a batch size of 32, a discount factor of 0.99, a target update period of 2000, a replay buffer size of 1M, and an epsilon decay schedule of 250k frames. (See the config sketch after this table.) |
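
The dataset row quotes the paper's evaluation protocol: ALE with sticky actions on 10 games. Below is a minimal sketch of constructing one of those games with sticky actions using Gymnasium's ALE bindings; this is an illustration of the protocol, not the authors' actual setup (the paper's agents are built on Dopamine and Acme, which configure sticky actions internally), and the environment ID and seed are assumptions.

```python
# Minimal sketch: an ALE game with sticky actions, as in the quoted protocol.
# Requires `gymnasium` with the Atari extras (ale-py) installed.
import gymnasium as gym

# The v5 Atari environments already default to sticky actions with
# probability 0.25; we pass it explicitly here for clarity.
env = gym.make("ALE/Asterix-v5", repeat_action_probability=0.25)

obs, info = env.reset(seed=0)
for _ in range(10):
    # Random actions, purely to show the interaction loop.
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```

The experiment-setup row lists the Section 4 hyperparameters as prose. The sketch below collects them into a single configuration mapping so they can be checked at a glance; the dictionary keys are illustrative names chosen here, not identifiers from the authors' released code, while the values are taken directly from the quoted text.

```python
# Hedged sketch of the Section 4 hyperparameters quoted above.
# Key names are illustrative; values follow the paper's description.
SECTION_4_CONFIG = {
    "optimizer": "adam",
    "learning_rate": 1e-4,
    "batch_size": 32,
    "discount_factor": 0.99,            # gamma
    "target_update_period": 2_000,      # steps between target-network syncs
    "replay_buffer_capacity": 1_000_000,
    "epsilon_decay_period_frames": 250_000,  # epsilon annealing schedule
}
```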