Reinforcement Learning with Automated Auxiliary Loss Search

Authors: Tairan He, Yuge Zhang, Kan Ren, Minghuan Liu, Che Wang, Weinan Zhang, Yuqing Yang, Dongsheng Li

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that the discovered auxiliary loss (namely, A2-winner) significantly improves the performance on both high-dimensional (image) and low-dimensional (vector) unseen tasks with much higher efficiency, showing promising generalization ability to different settings and even different benchmark domains. We conduct a statistical analysis to reveal the relations between patterns of auxiliary losses and RL performance.
Researcher Affiliation | Collaboration | Tairan He (1), Yuge Zhang (2), Kan Ren (2), Minghuan Liu (1), Che Wang (3), Weinan Zhang (1), Yuqing Yang (2), Dongsheng Li (2); (1) Shanghai Jiao Tong University, (2) Microsoft Research Asia, (3) New York University
Pseudocode | Yes | The step-by-step evolution algorithm is provided in Algorithm 1 in the appendix.
Open Source Code | Yes | The codes and supplementary materials are available at https://seqml.github.io/a2ls.
Open Datasets | Yes | training environments of continuous control from the DeepMind Control Suite (DMC) [43]
Dataset Splits | No | To ensure that we find a consistently useful auxiliary loss, we conduct a cross-validation. We first choose the top 5 candidates of stage-5 of the evolution on Cheetah-Run (detailed top candidates during the whole evolution procedure are provided in Appendix F). For each of the five candidates, we repeat the RL training on all three training environments, shown in Figure 5. (A hedged Python sketch of this cross-validation protocol follows the table.)
Hardware Specification | Yes | For each environment, we set the total budget to 16k GPU hours (on NVIDIA P100) and terminate the search when the resource is exhausted. (A simplified sketch of a budget-terminated search loop also follows the table.)
Software Dependencies | No | The paper mentions using "CURL [23] (see Appendix C.4.1 for details) to train the RL agents" and "Efficient Rainbow [44] as the base RL algorithm," but it does not specify software dependencies with version numbers (e.g., "Python 3.8", "PyTorch 1.9").
Experiment Setup | Yes | We use the same network architecture and hyperparameters config as CURL [23] (see Appendix C.4.1 for details) to train the RL agents.
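
The cross-validation described under Dataset Splits amounts to re-training each of the shortlisted auxiliary losses on every training environment and keeping the one that helps consistently. Below is a minimal Python sketch of that protocol under stated assumptions: the train_and_eval callback, the environment list, and the seed count are illustrative placeholders, not part of the released A2LS code.

```python
from statistics import mean

def cross_validate(top_candidates, train_envs, train_and_eval, n_seeds=3):
    """Re-train each candidate auxiliary loss on every training environment
    and return the candidate with the best average score.

    train_and_eval(env, loss, seed) is a user-supplied function that runs one
    RL training with the given auxiliary loss and returns an episodic return.
    """
    best_idx, best_score = None, float("-inf")
    for idx, loss in enumerate(top_candidates):
        # Average over seeds within an environment, then across environments.
        per_env = [
            mean(train_and_eval(env, loss, seed=s) for s in range(n_seeds))
            for env in train_envs
        ]
        avg = mean(per_env)
        if avg > best_score:
            best_idx, best_score = idx, avg
    return top_candidates[best_idx]
```

In the paper, the five candidates come from stage 5 of the evolution on Cheetah-Run and are re-run on all three training environments (Figure 5); a selection of this kind is presumably how the final A2-winner loss is chosen.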
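The search itself is the evolutionary procedure of Algorithm 1 (see the Pseudocode row), run until the 16k GPU-hour budget from the Hardware Specification row is exhausted. The sketch below illustrates only the budget-termination logic with a deliberately simplified greedy loop; evaluate, mutate, the parent-selection rule, and the per-evaluation cost are hypothetical stand-ins for the authors' actual operators.

```python
def budget_limited_evolution(init_population, evaluate, mutate,
                             budget_gpu_hours=16_000.0, cost_per_eval=20.0):
    """Evolve auxiliary-loss candidates until the GPU-hour budget runs out.

    evaluate(candidate) scores a candidate with one RL training run (assumed
    to cost cost_per_eval GPU hours); mutate(candidate) proposes a new one.
    """
    scored = [(evaluate(c), c) for c in init_population]
    spent = cost_per_eval * len(scored)
    while spent + cost_per_eval <= budget_gpu_hours:
        _, parent = max(scored, key=lambda sc: sc[0])  # greedy parent choice
        child = mutate(parent)
        scored.append((evaluate(child), child))
        spent += cost_per_eval
    return max(scored, key=lambda sc: sc[0])[1]  # best candidate found so far
```

Terminating on a fixed compute budget rather than a fixed number of generations matches the quoted statement that the search stops when the resource is exhausted.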