Adaptive Auxiliary Task Weighting for Reinforcement Learning

Authors: Xingyu Lin, Harjatin Baweja, George Kantor, David Held

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show in various environments that our algorithm can effectively combine a variety of different auxiliary tasks and achieves significant speedup compared to previous heuristic approaches of adapting auxiliary task weights. We first answer some of the questions on a simple optimization problem. Then, we empirically evaluate different approaches on three Atari games and three goal-oriented reinforcement learning environments with visual observations, where the issue of sample complexity is exacerbated due to the high dimensional input.
Researcher Affiliation | Academia | Xingyu Lin, Harjatin Singh Baweja, George Kantor, David Held. Robotics Institute, Carnegie Mellon University. {xlin3, harjatis, kantor, dheld}@andrew.cmu.edu
Pseudocode | Yes | Algorithm 1: Learning with OL-AUX
Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the methodology described.
Open Datasets | Yes | Evaluated on three Atari games [36]: Breakout, Pong, and Seaquest; and on three visual robotic manipulation tasks simulated in MuJoCo [38]: Visual Fetch Reach (OpenAI Gym [39]), Visual Hand Reach (OpenAI Gym [39]), and Visual Finger Turn (DeepMind Control Suite [5]).
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits needed to reproduce the experiment.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using 'Adam as our optimizer' but does not list specific software dependencies with version numbers (e.g., library or framework versions like PyTorch 1.x or TensorFlow 2.x).
Experiment Setup | Yes | Input: main task loss L_main; K auxiliary task losses L_aux,1, ..., L_aux,K; horizon N; step sizes α, β. Initialize θ_0, w = 1, t = 0. For i = 0 to Training Epoch − 1 do: collect new data using θ_t; for j = 0 to Update Iteration − 1 do: t ← i · Update Iteration + j; sample a mini-batch from the dataset. For OL-AUX-1, we scale the learning rate β down by a factor of 5.
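
The reconstructed pseudocode above only records the algorithm's inputs and loop structure. The sketch below is a hypothetical, simplified rendering of such a loop in Python/PyTorch: each auxiliary weight is nudged in the direction that aligns its auxiliary gradient with the main-task gradient, accumulated over the N-step horizon (a one-step proxy for the paper's lookahead rule). The toy network, loss functions, batch sampler, and hyperparameter values are placeholders and not the authors' implementation.

# Minimal OL-AUX-style sketch (assumptions noted above; not the paper's code).
import torch
import torch.nn as nn

def flat_grad(loss, params):
    # Flatten d(loss)/d(params) into one vector; retain the graph so the
    # weighted total loss can still be backpropagated afterwards.
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

def train_ol_aux(policy, sample_batch, main_loss_fn, aux_loss_fns,
                 epochs=3, updates_per_epoch=10, horizon=5,
                 alpha=1e-3, beta=1e-2):
    """Jointly adapts policy parameters (step size alpha) and auxiliary
    task weights w (step size beta), in the spirit of Algorithm 1."""
    params = [p for p in policy.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=alpha)      # the paper reports using Adam
    w = torch.ones(len(aux_loss_fns))             # w initialized to 1
    accum = torch.zeros(len(aux_loss_fns))        # horizon accumulator for the w update

    for i in range(epochs):                       # "collect new data using theta_t"
        for j in range(updates_per_epoch):
            batch = sample_batch()                # sample a mini-batch
            main_loss = main_loss_fn(policy, batch)
            aux_losses = [f(policy, batch) for f in aux_loss_fns]

            # Accumulate how well each auxiliary gradient aligns with the
            # main-task gradient (simplified stand-in for the N-step rule).
            g_main = flat_grad(main_loss, params)
            for k, l_aux in enumerate(aux_losses):
                accum[k] += alpha * torch.dot(flat_grad(l_aux, params), g_main)

            # Parameter update on the weighted sum of losses.
            total = main_loss + sum(wk * lk for wk, lk in zip(w, aux_losses))
            opt.zero_grad()
            total.backward()
            opt.step()

            # Every `horizon` updates, take one gradient step on the weights.
            t = i * updates_per_epoch + j
            if (t + 1) % horizon == 0:
                w = w + beta * accum
                accum = torch.zeros_like(accum)
    return w

if __name__ == "__main__":
    torch.manual_seed(0)
    net = nn.Linear(8, 2)                         # toy stand-in for a policy network
    def sample_batch():
        x = torch.randn(32, 8)
        return x, torch.randn(32, 2)
    def main_loss_fn(policy, batch):
        x, y = batch
        return ((policy(x) - y) ** 2).mean()
    def aux_loss_fn(policy, batch):               # placeholder auxiliary task
        x, _ = batch
        return ((policy(x).sum(dim=1) - x.norm(dim=1)) ** 2).mean()
    final_w = train_ol_aux(net, sample_batch, main_loss_fn, [aux_loss_fn])
    print("learned auxiliary weight:", final_w)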