IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks

Authors: Michael Luo, Jiahao Yao, Richard Liaw, Eric Liang, Ion Stoica

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In discrete action-space environments, we show that IMPACT attains higher reward and, simultaneously, achieves up to 30% decrease in training wall-time than that of IMPALA. For continuous control environments, IMPACT trains faster than existing scalable agents while preserving the sample efficiency of synchronous PPO."
Researcher Affiliation | Academia | Michael Luo (UC Berkeley, michael.luo@berkeley.edu); Jiahao Yao (UC Berkeley, jiahaoyao@berkeley.edu); Richard Liaw (UC Berkeley); Eric Liang (UC Berkeley); Ion Stoica (UC Berkeley)
Pseudocode | Yes | Algorithm 1: IMPACT
Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "We tested the agent on three continuous environments (Figure 5): Half Cheetah, Hopper, and Humanoid on 16 CPUs and 1 GPU. For the discrete environments (Figure 6), Pong, Space Invaders, and Breakout were chosen as common benchmarks used in popular distributed RL libraries (Caspi et al., 2017; Liang et al., 2018). Additional experiments for discrete environments are in the Appendix. These experiments were ran on 32 CPUs and 1 GPU."
Dataset Splits | No | The paper mentions "evaluation rollouts" but does not explicitly describe how the data was split into training, validation, and test portions (e.g., percentages, sample counts, or citations to predefined splits).
Hardware Specification | No | The paper states only the counts of CPUs and GPUs ("on 16 CPUs and 1 GPU" for the continuous environments, "32 CPUs and 1 GPU" for the discrete ones), not specific models (e.g., NVIDIA A100, Intel Xeon), which a detailed hardware specification requires.
Software Dependencies | No | The paper mentions environments such as OpenAI Gym, MuJoCo, and the Atari environments, and implicitly uses deep learning frameworks, but it does not name any software component with a version number (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | "The hyper-parameters for continuous and discrete environments are listed in the Appendix B table 1 and 2 respectively." Table 1: Hyperparameters for Discrete Environments; Table 2: Hyperparameters for Continuous Control Environments.
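The "Pseudocode" row above refers to Algorithm 1 (IMPACT), and the title names the paper's core mechanism: clipped target networks. As a rough illustration only, here is a minimal Python sketch of a PPO-style clipped surrogate in which the importance ratio is taken against a slowly updated target policy. The function names, the Polyak-style update, and the `eps`/`tau` defaults are assumptions for illustration, not the paper's implementation (whose code, per the assessment, is not released).

```python
# Illustrative sketch (NOT the paper's code): a PPO-style clipped
# surrogate whose importance ratio is computed against a slowly
# updated target policy, the idea behind "clipped target networks".
import math


def clipped_target_surrogate(logp_current, logp_target, advantage, eps=0.2):
    """Clipped surrogate objective for one (state, action) sample.

    logp_current: log pi_theta(a|s) under the learner's current policy
    logp_target:  log pi_target(a|s) under the slowly updated target network
    advantage:    advantage estimate for the sample
    eps:          clipping radius (PPO-style; 0.2 is an assumed default)
    """
    ratio = math.exp(logp_current - logp_target)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    # Pessimistic (minimum) objective, as in PPO's clipped surrogate.
    return min(ratio * advantage, clipped * advantage)


def soft_update(target_params, current_params, tau=0.01):
    """Polyak averaging of target-network parameters. This update rule
    is an assumption; a target network could equally be refreshed by a
    periodic hard copy of the learner's weights."""
    return [(1.0 - tau) * t + tau * c
            for t, c in zip(target_params, current_params)]
```

When the current and target policies agree, the ratio is 1 and the surrogate reduces to the raw advantage; when they diverge, clipping bounds how far a single update can push the policy away from the target network.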