Conflict-Averse Gradient Descent for Multi-task Learning

Authors: Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, Qiang Liu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On a series of challenging multi-task supervised learning and reinforcement learning tasks, CAGrad achieves improved performance over prior state-of-the-art multi-objective gradient manipulation methods. Code is available at https://github.com/Cranial-XIX/CAGrad.
Researcher Affiliation | Collaboration | Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, Qiang Liu; The University of Texas at Austin, Sony AI, Bytedance Research; {bliu,xcliu,pstone,lqiang}@cs.utexas.edu, xjjin0731@gmail.com
Pseudocode | Yes | Algorithm 1 Conflict-averse Gradient Descent (CAGrad) for Multi-task Learning (a sketch of this update is given after the table).
Open Source Code | Yes | Code is available at https://github.com/Cranial-XIX/CAGrad.
Open Datasets | Yes | To answer questions (1) and (2), we create a toy optimization example to evaluate the convergence of CAGrad compared to MGDA and PCGrad. On the same toy example, we ablate over the constant c and show that CAGrad recovers GD and MGDA with proper c values. Next, to test CAGrad on more complicated neural models, we perform the same set of experiments on the Multi-Fashion+MNIST benchmark [19] with a shrinked LeNet architecture [18] (in which each layer has a reduced number of neurons compared to the original LeNet). Please refer to Appendix B for more details.
Dataset Splits | Yes | 10% of the training images are held out as the validation set.
Hardware Specification | Yes | All experiments are run on a single NVIDIA V100 GPU.
Software Dependencies | No | The paper mentions software such as the Adam optimizer and Soft Actor-Critic (SAC) but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | We consider a shrinked LeNet as our model, and train it with the Adam [16] optimizer with a 0.001 learning rate for 50 epochs using a batch size of 256 (a training-setup sketch follows the table).
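
For reference, the CAGrad update in Algorithm 1 averages the per-task gradients, solves a small convex problem over the task-weight simplex, and adds the resulting correction to the average gradient. The snippet below is a minimal NumPy/SciPy sketch written from the paper's description rather than the released code; the function name `cagrad_direction`, the `(K, d)` layout of `grads`, and the default `c = 0.5` are illustrative choices, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch of the CAGrad update direction, assuming grads is a (K, d)
# array of flattened per-task gradients and c is the paper's constraint constant.
import numpy as np
from scipy.optimize import minimize

def cagrad_direction(grads: np.ndarray, c: float = 0.5) -> np.ndarray:
    """Return a conflict-averse update direction for K stacked task gradients."""
    K = grads.shape[0]
    g0 = grads.mean(axis=0)                      # average gradient
    sqrt_phi = c * np.linalg.norm(g0)            # radius of the constraint ball

    def dual_objective(w):
        gw = w @ grads                           # convex combination of task gradients
        return gw @ g0 + sqrt_phi * np.linalg.norm(gw)

    w0 = np.ones(K) / K                          # start from uniform task weights
    res = minimize(
        dual_objective, w0, method="SLSQP",
        bounds=[(0.0, 1.0)] * K,
        constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
    )
    gw = res.x @ grads
    gw_norm = np.linalg.norm(gw) + 1e-8
    return g0 + (sqrt_phi / gw_norm) * gw        # apply d = g0 + sqrt(phi)/||gw|| * gw
```

With c = 0 the returned direction is just the average gradient (plain gradient descent), and as c grows it approaches the MGDA min-norm direction, which is the ablation over c mentioned in the Open Datasets row.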
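
The Experiment Setup row fixes the optimizer, learning rate, epoch count, and batch size but not the exact reduced architecture. The PyTorch sketch below shows one way those reported hyperparameters fit together; `TwoHeadLeNet`, its channel widths, and the assumed 36x36 input shape are stand-ins for the paper's shrinked LeNet and the Multi-Fashion+MNIST data loader, not the released implementation.

```python
# Sketch of the reported setup: Adam, lr 0.001, batch size 256, 50 epochs.
# TwoHeadLeNet and the random 36x36 inputs are illustrative stand-ins only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadLeNet(nn.Module):
    """LeNet-style trunk with reduced widths and one classifier head per task."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 5, 5), nn.ReLU(), nn.MaxPool2d(2),   # 36x36 -> 16x16
            nn.Conv2d(5, 10, 5), nn.ReLU(), nn.MaxPool2d(2),  # 16x16 -> 6x6
            nn.Flatten(), nn.Linear(10 * 6 * 6, 50), nn.ReLU(),
        )
        self.heads = nn.ModuleList([nn.Linear(50, 10) for _ in range(2)])

    def forward(self, x):
        h = self.trunk(x)
        return [head(h) for head in self.heads]

model = TwoHeadLeNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)     # 0.001 learning rate

# One illustrative optimization step; the paper trains for 50 epochs on the real
# benchmark with batches of 256 and combines the task gradients via CAGrad.
images = torch.randn(256, 1, 36, 36)
targets = [torch.randint(0, 10, (256,)) for _ in range(2)]
losses = [F.cross_entropy(out, y) for out, y in zip(model(images), targets)]
optimizer.zero_grad()
sum(losses).backward()   # plain sum shown here; CAGrad replaces this combination step
optimizer.step()
```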