Recon: Reducing Conflicting Gradients From the Root For Multi-Task Learning
Authors: Guangyuan SHI, Qimai Li, Wenlong Zhang, Jiaxin Chen, Xiao-Ming Wu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that such a simple approach can greatly reduce the occurrence of conflicting gradients in the remaining shared layers and achieve better performance, with only a slight increase in model parameters in many cases. In this section, we conduct extensive experiments to evaluate our approach Recon for multi-task learning and demonstrate its effectiveness, efficiency and generality. |
| Researcher Affiliation | Academia | Guangyuan Shi, Qimai Li, Wenlong Zhang, Jiaxin Chen, Xiao-Ming Wu Department of Computing, The Hong Kong Polytechnic University, Hong Kong S.A.R., China {guang-yuan.shi, qee-mai.li, wenlong.zhang}@connect.polyu.hk, jiax.chen@connect.polyu.hk, xiao-ming.wu@polyu.edu.hk |
| Pseudocode | Yes | Algorithm 1: Recon: Removing Layer-wise Conflicting Gradients (see the conflict-scoring sketch after this table) |
| Open Source Code | Yes | The source code is available at https://github.com/moukamisama/Recon. |
| Open Datasets | Yes | We evaluate Recon on 4 multi-task datasets, namely Multi-Fashion+MNIST (Lin et al., 2019), Cityscapes (Cordts et al., 2016), NYUv2 (Couprie et al., 2013), PASCAL-Context (Mottaghi et al., 2014), and CelebA (Liu et al., 2015). |
| Dataset Splits | No | The paper mentions various datasets and training parameters, but it does not provide specific training/test/validation dataset splits (e.g., percentages or sample counts) for any of the datasets used, nor does it cite a source that defines these splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models (e.g., NVIDIA A100), CPU models, or memory amounts used for running the experiments. It only mentions the neural network architectures used. |
| Software Dependencies | No | The paper mentions various models and optimizers (e.g., ResNet18, SGD, Adam), but it does not provide specific version numbers for any software dependencies, programming languages, or libraries used to implement or run the experiments. |
| Experiment Setup | Yes | We train the model for 120 epochs with the batch size of 256. We adopt SGD with an initial learning rate of 0.1 and decay the learning rate by 0.1 at the 60th and 90th epoch. For CAGrad, we set α = 0.2. For RotoGrad, we set Rk = 100 which is equal to the dimension of shared features and set the learning rate of rotation parameters as the learning rate of the neural networks. (A minimal training-schedule sketch follows the table.) |
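
To make the pseudocode entry above concrete, here is a minimal sketch of the layer-wise conflict scoring on which Algorithm 1 (Recon: Removing Layer-wise Conflicting Gradients) is built: per-task gradients are compared layer by layer, and shared layers whose gradients conflict most often across task pairs are the candidates Recon turns into task-specific layers. The sketch assumes a PyTorch model and a list of per-task losses; the function name `layerwise_conflict_scores` and the threshold `theta` are illustrative choices, not identifiers from the authors' released code.

```python
import torch
from collections import defaultdict
from itertools import combinations


def layerwise_conflict_scores(model, task_losses, theta=0.0):
    """Count, per shared parameter tensor (layer), how many task pairs
    have conflicting gradients (cosine similarity below `theta`)."""
    names, params = zip(
        *[(n, p) for n, p in model.named_parameters() if p.requires_grad]
    )

    # Per-task gradients w.r.t. every shared parameter.
    per_task_grads = []
    for loss in task_losses:
        grads = torch.autograd.grad(
            loss, params, retain_graph=True, allow_unused=True
        )
        per_task_grads.append(
            [g.flatten() if g is not None else None for g in grads]
        )

    scores = defaultdict(int)
    for i, j in combinations(range(len(task_losses)), 2):
        for name, gi, gj in zip(names, per_task_grads[i], per_task_grads[j]):
            if gi is None or gj is None:
                continue
            cos = torch.nn.functional.cosine_similarity(gi, gj, dim=0)
            if cos < theta:  # the two tasks pull this layer in conflicting directions
                scores[name] += 1
    return scores
```

In the paper's procedure, such conflict counts are accumulated over training iterations, and the shared layers with the highest counts are duplicated into task-specific layers before the network is retrained.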
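
The experiment-setup entry likewise maps onto a standard training schedule. Below is a hedged sketch of how the reported hyperparameters (120 epochs, batch size 256, SGD with an initial learning rate of 0.1 decayed by 0.1 at epochs 60 and 90) would typically be wired together in PyTorch; `model`, `train_set`, and `compute_multitask_loss` are placeholders rather than names from the authors' implementation.

```python
import torch
from torch.utils.data import DataLoader

# Placeholders: substitute the actual multi-task model, dataset, and loss.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 90], gamma=0.1)
loader = DataLoader(train_set, batch_size=256, shuffle=True)

for epoch in range(120):
    for batch in loader:
        optimizer.zero_grad()
        loss = compute_multitask_loss(model, batch)  # e.g., a (weighted) sum of task losses
        loss.backward()
        optimizer.step()
    scheduler.step()  # lr: 0.1 -> 0.01 at epoch 60 -> 0.001 at epoch 90
```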