Temporal-Difference Learning Using Distributed Error Signals

Authors: Jonas Guan, Shon Verch, Claas Voelcker, Ethan Jackson, Nicolas Papernot, William Cunningham

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We design a new deep Q-learning algorithm, ARTIFICIAL DOPAMINE, to computationally demonstrate that synchronously distributed, per-layer TD errors may be sufficient to learn surprisingly complex RL tasks. We empirically evaluate our algorithm on MinAtar, the DeepMind Control Suite, and classic control tasks, and show it often achieves comparable performance to deep RL algorithms that use backpropagation.
Researcher Affiliation | Academia | Jonas Guan (1,2), Shon Eduard Verch (1), Claas Voelcker (1,2), Ethan C. Jackson (1), Nicolas Papernot (1,2), William A. Cunningham (1,2,3); (1) University of Toronto, (2) Vector Institute, (3) Schwartz Reisman Institute for Technology and Society
Pseudocode | Yes | Algorithm 1: AD Q-Learning (an illustrative JAX sketch of the per-layer TD update follows the table)
Open Source Code | Yes | Our code is available at https://github.com/social-ai-uoft/ad-paper.
Open Datasets | Yes | MinAtar [70] is a simplified implementation of 5 Atari 2600 games: Seaquest, Breakout, Asterix, Freeway, and Space Invaders. The DeepMind Control (DMC) Suite [64] is a set of low-level robotics control environments, with continuous state spaces and tasks of varying difficulty. For our experiments, we used a discretized action space, following Seyde et al. [61] [...] In addition, we provide results on the classic control tasks Cart Pole, Mountain Car, Lunar Lander, and Acrobot, which we include in Appendix C. For a more elaborate discussion on these environments and our task choice, see Appendix I. (An environment-construction sketch follows the table.)
Dataset Splits | No | The paper does not specify traditional train/validation/test dataset splits as commonly found in supervised learning, since the data for RL is generated through interaction with environments. It discusses a training process and evaluation on test environments, but no distinct validation set split is described.
Hardware Specification | Yes | On an Nvidia RTX 2080 GPU, a full training run of AD takes approximately 5 hours on the MinAtar environments, and 3 hours on the DMC environments. On an Nvidia A100 GPU, the run takes approximately 3.5 hours on MinAtar, and 2.5 hours on DMC.
Software Dependencies | Yes | We implement our algorithm in Jax [12]. ... Asset (Version, License, URL): Jax [12] (0.4.11, Apache 2.0, https://github.com/google/jax/); MinAtar [70] (1, GPL-3.0, https://github.com/kenjyoung/MinAtar); DMC [64] (1.0.16, Apache 2.0, https://github.com/google-deepmind/dm_control/); Gymnasium [65] (0.28.1, MIT, https://github.com/Farama-Foundation/Gymnasium/)
Experiment Setup | Yes | Network architecture and hyperparameters. On MinAtar, we use a 3-hidden-layer network with forward activation connections. The cell output sizes are 400, 200, and 200. ... On DMC, we use a smaller network with cell output sizes 128, 96, and 96, and discretize the action space following Seyde et al. [61]. For more details, see Appendix G. For each benchmark, we use the same network and hyperparameters across all tasks to test the robustness of our architecture and learning algorithm. ... We show the hyperparameters of the AD network and DQN used in our MinAtar experiments. ... We show the hyperparameters of the AD network in our DMC experiments. (A network sketch follows the table.)
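
The Experiment Setup row fixes the architecture: three hidden cells with output sizes 400, 200, and 200 on MinAtar, with forward activation connections. Below is a minimal JAX sketch of such a network, assuming each cell owns its own small Q head and that jax.lax.stop_gradient on each cell's input is what keeps learning local. The cell sizes come from the quoted setup; the exact wiring (for example, whether per-cell Q values are also passed forward to the next cell) is an assumption for illustration, not the authors' released implementation.

```python
import jax
import jax.numpy as jnp

def init_cell(key, in_dim, hidden_dim, num_actions):
    """One AD-style cell: a hidden layer plus its own per-action Q head."""
    k_h, k_q = jax.random.split(key)
    scale = 1.0 / jnp.sqrt(in_dim)
    return {
        "w_h": jax.random.uniform(k_h, (in_dim, hidden_dim), minval=-scale, maxval=scale),
        "b_h": jnp.zeros(hidden_dim),
        "w_q": jax.random.uniform(k_q, (hidden_dim, num_actions), minval=-scale, maxval=scale),
        "b_q": jnp.zeros(num_actions),
    }

def init_network(key, obs_dim, num_actions, cell_sizes=(400, 200, 200)):
    """Default cell sizes follow the quoted MinAtar configuration."""
    params, in_dim = [], obs_dim
    for size in cell_sizes:
        key, sub = jax.random.split(key)
        params.append(init_cell(sub, in_dim, size, num_actions))
        in_dim = size
    return params

def apply_network(params, obs):
    """Activations flow forward, but gradients never cross cell boundaries."""
    h, per_cell_q = obs, []
    for cell in params:
        h = jax.lax.stop_gradient(h)                      # keep each cell's learning local
        h = jax.nn.relu(h @ cell["w_h"] + cell["b_h"])    # forward activation connection
        per_cell_q.append(h @ cell["w_q"] + cell["b_q"])  # this cell's own Q estimate
    return per_cell_q
```

For the DMC experiments, the same constructor would be called with cell_sizes=(128, 96, 96), matching the quoted setup.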
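For the Pseudocode row (Algorithm 1, AD Q-Learning), the sketch below illustrates the central idea of synchronously distributed, per-layer TD errors: each cell's Q head is regressed onto its own bootstrapped target, and because apply_network (from the previous sketch) detaches each cell's input, no error signal is backpropagated across layers. The squared-TD loss, the absence of a target network, and the plain SGD step are simplifications for illustration, not a reproduction of the paper's Algorithm 1.

```python
import jax
import jax.numpy as jnp

def per_cell_td_loss(params, batch, gamma=0.99):
    """Sum of per-cell TD losses; each cell sees only its own TD error."""
    obs, actions, rewards, next_obs, dones = batch
    qs = apply_network(params, obs)            # list of [batch, num_actions], one per cell
    next_qs = apply_network(params, next_obs)  # same structure for the next state
    loss = 0.0
    for q, next_q in zip(qs, next_qs):
        # Each cell bootstraps from its own next-state estimate (no target
        # network here, purely for brevity).
        boot = jnp.max(jax.lax.stop_gradient(next_q), axis=-1)
        target = rewards + gamma * (1.0 - dones) * boot
        q_sa = jnp.take_along_axis(q, actions[:, None], axis=-1).squeeze(-1)
        loss = loss + jnp.mean((q_sa - target) ** 2)      # this cell's local TD error
    return loss

# A plain SGD step. Because apply_network stops gradients at each cell's input,
# the gradient of each cell's parameters depends only on that cell's own TD error.
@jax.jit
def train_step(params, batch, lr=3e-4):
    grads = jax.grad(per_cell_td_loss)(params, batch)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```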
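The Open Datasets and Software Dependencies rows pin the benchmark libraries (MinAtar 1.x, dm_control 1.0.16, Gymnasium 0.28.1). The following sketch shows one way to construct the three environment families; the bang-bang discretization of the DMC action space is an illustrative reading of "following Seyde et al. [61]", not necessarily the authors' exact mapping.

```python
import itertools
import numpy as np
import gymnasium as gym
from minatar import Environment as MinAtarEnv
from dm_control import suite

# MinAtar: simplified implementations of 5 Atari 2600 games.
breakout = MinAtarEnv("breakout")
n_actions = breakout.num_actions()

# DeepMind Control Suite: low-level continuous-control tasks.
walker = suite.load(domain_name="walker", task_name="walk")
spec = walker.action_spec()

# Bang-bang discretization in the spirit of Seyde et al. [61]: every action
# dimension takes either its minimum or maximum value, giving 2**dim discrete
# actions (an assumption; the paper defers details to its appendix).
discrete_actions = np.array(list(itertools.product(*zip(spec.minimum, spec.maximum))))

# Classic control tasks via Gymnasium.
cartpole = gym.make("CartPole-v1")
obs, info = cartpole.reset(seed=0)
```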