Adaptive Auxiliary Task Weighting for Reinforcement Learning
Authors: Xingyu Lin, Harjatin Baweja, George Kantor, David Held
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show in various environments that our algorithm can effectively combine a variety of different auxiliary tasks and achieves significant speedup compared to previous heuristic approaches of adapting auxiliary task weights. We first answer some of the questions on a simple optimization problem. Then, we empirically evaluate different approaches on three Atari games and three goal-oriented reinforcement learning environments with visual observations, where the issue of sample complexity is exacerbated due to the high dimensional input. |
| Researcher Affiliation | Academia | Xingyu Lin, Harjatin Singh Baweja, George Kantor, David Held; Robotics Institute, Carnegie Mellon University; {xlin3, harjatis, kantor, dheld}@andrew.cmu.edu |
| Pseudocode | Yes | Algorithm 1 Learning with OL-AUX |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the methodology described. |
| Open Datasets | Yes | Evaluated on three Atari games [36]: Breakout, Pong, and Seaquest. Also evaluated on three visual robotic manipulation tasks simulated in MuJoCo [38]: Visual Fetch Reach (OpenAI Gym [39]), Visual Hand Reach (OpenAI Gym [39]), and Visual Finger Turn (DeepMind Control Suite [5]). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits needed to reproduce the experiment. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Adam as our optimizer' but does not list specific software dependencies with version numbers (e.g., library or framework versions like PyTorch 1.x or TensorFlow 2.x). |
| Experiment Setup | Yes | Algorithm 1 (Learning with OL-AUX). Input: main task loss L_main; K auxiliary task losses L_aux,1, ..., L_aux,K; horizon N; step sizes α, β. Initialize θ_0, w = 1, t = 0. For i = 0 to TrainingEpoch − 1: collect new data using θ_t; for j = 0 to UpdateIteration − 1: t ← i · UpdateIteration + j; sample a mini-batch from the dataset. For OL-AUX-1, we scale the learning rate β down by a factor of 5. |
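
The quoted setup is the skeleton of Algorithm 1 (OL-AUX): train on a weighted sum of the main and auxiliary losses, and periodically adjust each weight w_k using the accumulated similarity between the main-task gradient and that auxiliary task's gradient over the last N updates. Below is a minimal, hypothetical PyTorch sketch of that loop; the toy network, stand-in losses, and hyperparameter values are illustrative assumptions (the parameter step size α is folded into β here), not the authors' released code.

```python
# Hypothetical sketch of the OL-AUX update loop; toy model and losses are assumptions.
import torch

torch.manual_seed(0)

theta = torch.nn.Linear(8, 4)                          # toy stand-in for the policy/value network
optimizer = torch.optim.Adam(theta.parameters(), lr=1e-3)

K = 2                                                  # number of auxiliary tasks
w = torch.ones(K)                                      # auxiliary task weights, initialized to 1
beta = 1e-2                                            # weight-update step size (α folded in)
N = 4                                                  # horizon: update w every N gradient steps
grad_dots = torch.zeros(K)                             # accumulated grad(L_main) . grad(L_aux,k)

def flat_grad(loss):
    """Flattened gradient of `loss` w.r.t. the network parameters."""
    grads = torch.autograd.grad(loss, list(theta.parameters()), retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

for step in range(20):
    x = torch.randn(16, 8)                             # stand-in for a sampled mini-batch
    out = theta(x)

    # Stand-in losses; in the paper these are the RL loss and the auxiliary losses.
    L_main = out.pow(2).mean()
    L_aux = [out.sin().mean().abs(), (out - 1.0).pow(2).mean()]

    # Accumulate the gradient inner products used for the online weight update.
    g_main = flat_grad(L_main)
    for k in range(K):
        grad_dots[k] += torch.dot(g_main, flat_grad(L_aux[k])).item()

    # One gradient step on the weighted combined loss.
    total = L_main + sum(w[k].item() * L_aux[k] for k in range(K))
    optimizer.zero_grad()
    total.backward()
    optimizer.step()

    # Every N steps, move each w_k along the accumulated gradient similarity.
    if (step + 1) % N == 0:
        w = w + beta * grad_dots
        grad_dots = torch.zeros(K)

print("adapted auxiliary weights:", w.tolist())
```

The OL-AUX-N variants in the paper differ only in the horizon N, with the β rescaling noted above for the N = 1 case.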