Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Adaptive Auxiliary Task Weighting for Reinforcement Learning
Authors: Xingyu Lin, Harjatin Baweja, George Kantor, David Held
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show in various environments that our algorithm can effectively combine a variety of different auxiliary tasks and achieves significant speedup compared to previous heuristic approaches of adapting auxiliary task weights. We first answer some of the questions on a simple optimization problem. Then, we empirically evaluate different approaches on three Atari games and three goal-oriented reinforcement learning environments with visual observations, where the issue of sample complexity is exacerbated due to the high dimensional input. |
| Researcher Affiliation | Academia | Xingyu Lin, Harjatin Singh Baweja, George Kantor, David Held; Robotics Institute, Carnegie Mellon University; EMAIL |
| Pseudocode | Yes | Algorithm 1 Learning with OL-AUX |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the methodology described. |
| Open Datasets | Yes | evaluated on three Atari games [36]: Breakout, Pong and Seaquest. evaluated on three visual robotic manipulation tasks simulated in MuJoCo [38]: Visual Fetch Reach (OpenAI Gym [39]). Visual Hand Reach (OpenAI Gym [39]). Visual Finger Turn (DeepMind Control Suite [5]). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits needed to reproduce the experiment. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Adam as our optimizer' but does not list specific software dependencies with version numbers (e.g., library or framework versions like PyTorch 1.x or TensorFlow 2.x). |
| Experiment Setup | Yes | Input: main task loss L_main; K auxiliary task losses L_aux,1, ..., L_aux,K; horizon N; step sizes α, β. Initialize θ_0, w = 1, t = 0. for i = 0 to Training Epoch − 1 do: collect new data using θ_t; for j = 0 to Update Iteration − 1 do: t ← i × Update Iteration + j; sample a mini-batch from dataset. For OL-AUX-1, we scale the learning rate β down by a factor of 5. |
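
The loop structure quoted in the Experiment Setup row can be sketched as below. This is a minimal illustration, not the authors' implementation: it reads the flattened loop bounds as `Training Epoch - 1` and `Update Iteration - 1` and the counter as `t = i * Update Iteration + j`, which is an interpretation of the extracted text. The function and parameter names (`training_schedule`, `weight_step_size`, `variant`) are hypothetical, and the gradient updates themselves are omitted because the excerpt does not include them.

```python
from typing import List


def training_schedule(training_epochs: int, update_iterations: int) -> List[int]:
    """Return the global update counter t visited by the nested loops."""
    steps = []
    for i in range(training_epochs):        # i = 0 .. Training Epoch - 1
        # New data would be collected here with the current parameters.
        for j in range(update_iterations):  # j = 0 .. Update Iteration - 1
            t = i * update_iterations + j   # assumed reading of the garbled line
            # A mini-batch would be sampled and the updates applied here.
            steps.append(t)
    return steps


def weight_step_size(beta: float, variant: str) -> float:
    """Scale beta down by a factor of 5 for OL-AUX-1, as the row states."""
    return beta / 5.0 if variant == "OL-AUX-1" else beta


if __name__ == "__main__":
    print(training_schedule(2, 3))
    print(weight_step_size(1.0, "OL-AUX-1"))
```

Under this reading, `t` counts gradient updates consecutively across epochs, so data collection happens once every `update_iterations` steps.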