Reinforcement Learning with Automated Auxiliary Loss Search
Authors: Tairan He, Yuge Zhang, Kan Ren, Minghuan Liu, Che Wang, Weinan Zhang, Yuqing Yang, Dongsheng Li
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that the discovered auxiliary loss (namely, A2-winner) significantly improves performance on both high-dimensional (image) and low-dimensional (vector) unseen tasks with much higher efficiency, showing promising generalization to different settings and even different benchmark domains. We conduct a statistical analysis to reveal the relations between patterns of auxiliary losses and RL performance. |
| Researcher Affiliation | Collaboration | Tairan He (Shanghai Jiao Tong University), Yuge Zhang (Microsoft Research Asia), Kan Ren (Microsoft Research Asia), Minghuan Liu (Shanghai Jiao Tong University), Che Wang (New York University), Weinan Zhang (Shanghai Jiao Tong University), Yuqing Yang (Microsoft Research Asia), Dongsheng Li (Microsoft Research Asia) |
| Pseudocode | Yes | The step-by-step evolution algorithm is provided in Algorithm 1 in the appendix. (A generic sketch of such a budgeted evolutionary search appears after this table.) |
| Open Source Code | Yes | The codes and supplementary materials are available at https://seqml.github.io/a2ls. |
| Open Datasets | Yes | training environments of continuous control from the DeepMind Control Suite (DMC) [43] |
| Dataset Splits | No | To ensure that we find a consistently useful auxiliary loss, we conduct a cross-validation. We first choose the top 5 candidates of stage-5 of the evolution on Cheetah-Run (detailed top candidates during the whole evolution procedure are provided in Appendix F). For each of the five candidates, we repeat the RL training on all three training environments, shown in Figure 5. (A sketch of this cross-validation loop appears after the table.) |
| Hardware Specification | Yes | For each environment, we set the total budget to 16k GPU hours (on NVIDIA P100) and terminate the search when the resource is exhausted. |
| Software Dependencies | No | The paper mentions using "CURL [23] (see Appendix C.4.1 for details) to train the RL agents" and "Efficient Rainbow [44] as the base RL algorithm," but it does not specify software dependencies with version numbers (e.g., "Python 3.8", "PyTorch 1.9"). |
| Experiment Setup | Yes | We use the same network architecture and hyperparameters config as CURL [23] (see Appendix C.4.1 for details) to train the RL agents. |
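For a concrete picture of the search loop referenced in the Pseudocode and Hardware Specification rows, the following is a minimal sketch of a budgeted evolutionary search over auxiliary-loss candidates. It is not the paper's Algorithm 1: the population size, the selection rule, and the `evaluate_candidate`, `mutate`, `crossover`, and `gpu_hours_used` helpers are all hypothetical stand-ins; only the 16k GPU-hour budget is taken from the quoted text.

```python
import random

# Hypothetical sketch of a budgeted evolutionary search over auxiliary-loss
# candidates. Only the 16k GPU-hour budget comes from the paper; every helper
# and constant below is an assumed placeholder, not the paper's Algorithm 1.

POPULATION_SIZE = 20          # assumed population size
GPU_HOUR_BUDGET = 16_000      # per-environment budget quoted above


def evolutionary_search(init_population, evaluate_candidate, mutate, crossover,
                        gpu_hours_used):
    """Evolve loss candidates until the GPU-hour budget is exhausted."""
    population = list(init_population)
    best = max(population, key=evaluate_candidate)
    while gpu_hours_used() < GPU_HOUR_BUDGET:
        # Rank candidates by the RL return their auxiliary loss achieves.
        scored = sorted(population, key=evaluate_candidate, reverse=True)
        elites = scored[: max(2, POPULATION_SIZE // 4)]  # keep the top quarter
        # Refill the population with mutated crossover offspring of elites.
        children = []
        while len(elites) + len(children) < POPULATION_SIZE:
            parent_a, parent_b = random.sample(elites, 2)
            children.append(mutate(crossover(parent_a, parent_b)))
        population = elites + children
        best = max([best] + elites, key=evaluate_candidate)
    # Best candidate found within budget (the paper's winner is "A2-winner").
    return best
```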
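Similarly, the cross-validation quoted in the Dataset Splits row can be pictured as re-training each top candidate on every training environment and ranking by mean return. Again a hedged sketch: `train_rl`, the seed list, and all environment names except Cheetah-Run (the only one named in the quote) are placeholders.

```python
# Hypothetical sketch of the cross-validation quoted in the Dataset Splits row.
# train_rl and the environment list are placeholders; only "cheetah-run" is
# named in the quoted text.

TRAIN_ENVS = ["cheetah-run", "placeholder-env-2", "placeholder-env-3"]


def cross_validate(top_candidates, train_rl, seeds=(0, 1, 2)):
    """Rank candidates by mean RL return over all training envs and seeds."""
    def mean_return(candidate):
        returns = [train_rl(candidate, env, seed)
                   for env in TRAIN_ENVS
                   for seed in seeds]
        return sum(returns) / len(returns)

    # The most consistently useful auxiliary loss across environments wins.
    return sorted(top_candidates, key=mean_return, reverse=True)
```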