Reinforcement Learning with Automated Auxiliary Loss Search
Authors: Tairan He, Yuge Zhang, Kan Ren, Minghuan Liu, Che Wang, Weinan Zhang, Yuqing Yang, Dongsheng Li
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that the discovered auxiliary loss (namely, A2-winner) significantly improves performance on both high-dimensional (image) and low-dimensional (vector) unseen tasks with much higher efficiency, showing promising generalization to different settings and even different benchmark domains. We conduct a statistical analysis to reveal the relations between patterns of auxiliary losses and RL performance. |
| Researcher Affiliation | Collaboration | Tairan He (Shanghai Jiao Tong University), Yuge Zhang (Microsoft Research Asia), Kan Ren (Microsoft Research Asia), Minghuan Liu (Shanghai Jiao Tong University), Che Wang (New York University), Weinan Zhang (Shanghai Jiao Tong University), Yuqing Yang (Microsoft Research Asia), Dongsheng Li (Microsoft Research Asia) |
| Pseudocode | Yes | The step-by-step evolution algorithm is provided in Algorithm 1 in the appendix. (A generic sketch of such a budgeted evolutionary search appears after this table.) |
| Open Source Code | Yes | The codes and supplementary materials are available at https://seqml.github.io/a2ls. |
| Open Datasets | Yes | training environments of continuous control from the DeepMind Control Suite (DMC) [43] |
| Dataset Splits | No | To ensure that we find a consistently useful auxiliary loss, we conduct a cross-validation. We first choose the top 5 candidates of stage-5 of the evolution on Cheetah-Run (detailed top candidates during the whole evolution procedure are provided in Appendix F). For each of the five candidates, we repeat the RL training on all three training environments, shown in Figure 5. (A sketch of this cross-validation loop appears after the table.) |
| Hardware Specification | Yes | For each environment, we set the total budget to 16k GPU hours (on NVIDIA P100) and terminate the search when the resource is exhausted. |
| Software Dependencies | No | The paper mentions using "CURL [23] (see Appendix C.4.1 for details) to train the RL agents" and "Efficient Rainbow [44] as the base RL algorithm," but it does not specify software dependencies with version numbers (e.g., "Python 3.8", "PyTorch 1.9"). |
| Experiment Setup | Yes | We use the same network architecture and hyperparameters config as CURL [23] (see Appendix C.4.1 for details) to train the RL agents. |
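For a concrete picture of the search loop referenced in the Pseudocode and Hardware Specification rows, the following is a minimal sketch of a budgeted evolutionary search over auxiliary-loss candidates. It is not the paper's Algorithm 1: the population size, the selection rule, and the `evaluate_candidate`, `mutate`, `crossover`, and `gpu_hours_used` helpers are all hypothetical stand-ins; only the 16k GPU-hour budget is taken from the quoted text.

```python
import random

# Hypothetical sketch of a budgeted evolutionary search over auxiliary-loss
# candidates. Only the 16k GPU-hour budget comes from the paper; every helper
# and constant below is an assumed placeholder, not the paper's Algorithm 1.

POPULATION_SIZE = 20          # assumed population size
GPU_HOUR_BUDGET = 16_000      # per-environment budget quoted above


def evolutionary_search(init_population, evaluate_candidate, mutate, crossover,
                        gpu_hours_used):
    """Evolve loss candidates until the GPU-hour budget is exhausted."""
    population = list(init_population)
    best = max(population, key=evaluate_candidate)
    while gpu_hours_used() < GPU_HOUR_BUDGET:
        # Rank candidates by the RL return their auxiliary loss achieves.
        scored = sorted(population, key=evaluate_candidate, reverse=True)
        elites = scored[: max(2, POPULATION_SIZE // 4)]  # keep the top quarter
        # Refill the population with mutated crossover offspring of elites.
        children = []
        while len(elites) + len(children) < POPULATION_SIZE:
            parent_a, parent_b = random.sample(elites, 2)
            children.append(mutate(crossover(parent_a, parent_b)))
        population = elites + children
        best = max([best] + elites, key=evaluate_candidate)
    # Best candidate found within budget (the paper's winner is "A2-winner").
    return best
```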
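Similarly, the cross-validation quoted in the Dataset Splits row can be pictured as re-training each top candidate on every training environment and ranking by mean return. Again a hedged sketch: `train_rl`, the seed list, and all environment names except Cheetah-Run (the only one named in the quote) are placeholders.

```python
# Hypothetical sketch of the cross-validation quoted in the Dataset Splits row.
# train_rl and the environment list are placeholders; only "cheetah-run" is
# named in the quoted text.

TRAIN_ENVS = ["cheetah-run", "placeholder-env-2", "placeholder-env-3"]


def cross_validate(top_candidates, train_rl, seeds=(0, 1, 2)):
    """Rank candidates by mean RL return over all training envs and seeds."""
    def mean_return(candidate):
        returns = [train_rl(candidate, env, seed)
                   for env in TRAIN_ENVS
                   for seed in seeds]
        return sum(returns) / len(returns)

    # The most consistently useful auxiliary loss across environments wins.
    return sorted(top_candidates, key=mean_return, reverse=True)
```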