Adaptive Auxiliary Task Weighting for Reinforcement Learning

Authors: Xingyu Lin, Harjatin Baweja, George Kantor, David Held

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show in various environments that our algorithm can effectively combine a variety of different auxiliary tasks and achieves significant speedup compared to previous heuristic approaches of adapting auxiliary task weights. We first answer some of the questions on a simple optimization problem. Then, we empirically evaluate different approaches on three Atari games and three goal-oriented reinforcement learning environments with visual observations, where the issue of sample complexity is exacerbated due to the high dimensional input.
Researcher Affiliation | Academia | Xingyu Lin, Harjatin Singh Baweja, George Kantor, David Held. Robotics Institute, Carnegie Mellon University. {xlin3, harjatis, kantor, dheld}@andrew.cmu.edu
Pseudocode | Yes | Algorithm 1: Learning with OL-AUX
Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the methodology described.
Open Datasets | Yes | Evaluated on three Atari games [36]: Breakout, Pong, and Seaquest; and on three visual robotic manipulation tasks simulated in MuJoCo [38]: Visual Fetch Reach (OpenAI Gym [39]), Visual Hand Reach (OpenAI Gym [39]), and Visual Finger Turn (DeepMind Control Suite [5]).
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits needed to reproduce the experiment.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using 'Adam as our optimizer' but does not list specific software dependencies with version numbers (e.g., library or framework versions like PyTorch 1.x or TensorFlow 2.x).
Experiment Setup | Yes | Input: main task loss L_main; K auxiliary task losses L_aux,1, ..., L_aux,K; horizon N; step sizes α, β. Initialize θ_0, w = 1, t = 0. For i = 0 to Training Epoch − 1 do: collect new data using θ_t; for j = 0 to Update Iteration − 1 do: t ← i · Update Iteration + j; sample a mini-batch from the dataset. For OL-AUX-1, we scale the learning rate β down by a factor of 5.
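
The reconstructed pseudocode above only records the algorithm's inputs and loop structure. The sketch below is a hypothetical, simplified rendering of such a loop in Python/PyTorch: each auxiliary weight is nudged in the direction that aligns its auxiliary gradient with the main-task gradient, accumulated over the N-step horizon (a one-step proxy for the paper's lookahead rule). The toy network, loss functions, batch sampler, and hyperparameter values are placeholders and not the authors' implementation.

# Minimal OL-AUX-style sketch (assumptions noted above; not the paper's code).
import torch
import torch.nn as nn

def flat_grad(loss, params):
    # Flatten d(loss)/d(params) into one vector; retain the graph so the
    # weighted total loss can still be backpropagated afterwards.
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

def train_ol_aux(policy, sample_batch, main_loss_fn, aux_loss_fns,
                 epochs=3, updates_per_epoch=10, horizon=5,
                 alpha=1e-3, beta=1e-2):
    """Jointly adapts policy parameters (step size alpha) and auxiliary
    task weights w (step size beta), in the spirit of Algorithm 1."""
    params = [p for p in policy.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=alpha)      # the paper reports using Adam
    w = torch.ones(len(aux_loss_fns))             # w initialized to 1
    accum = torch.zeros(len(aux_loss_fns))        # horizon accumulator for the w update

    for i in range(epochs):                       # "collect new data using theta_t"
        for j in range(updates_per_epoch):
            batch = sample_batch()                # sample a mini-batch
            main_loss = main_loss_fn(policy, batch)
            aux_losses = [f(policy, batch) for f in aux_loss_fns]

            # Accumulate how well each auxiliary gradient aligns with the
            # main-task gradient (simplified stand-in for the N-step rule).
            g_main = flat_grad(main_loss, params)
            for k, l_aux in enumerate(aux_losses):
                accum[k] += alpha * torch.dot(flat_grad(l_aux, params), g_main)

            # Parameter update on the weighted sum of losses.
            total = main_loss + sum(wk * lk for wk, lk in zip(w, aux_losses))
            opt.zero_grad()
            total.backward()
            opt.step()

            # Every `horizon` updates, take one gradient step on the weights.
            t = i * updates_per_epoch + j
            if (t + 1) % horizon == 0:
                w = w + beta * accum
                accum = torch.zeros_like(accum)
    return w

if __name__ == "__main__":
    torch.manual_seed(0)
    net = nn.Linear(8, 2)                         # toy stand-in for a policy network
    def sample_batch():
        x = torch.randn(32, 8)
        return x, torch.randn(32, 2)
    def main_loss_fn(policy, batch):
        x, y = batch
        return ((policy(x) - y) ** 2).mean()
    def aux_loss_fn(policy, batch):               # placeholder auxiliary task
        x, _ = batch
        return ((policy(x).sum(dim=1) - x.norm(dim=1)) ** 2).mean()
    final_w = train_ol_aux(net, sample_batch, main_loss_fn, [aux_loss_fn])
    print("learned auxiliary weight:", final_w)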