Recon: Reducing Conflicting Gradients From the Root For Multi-Task Learning
Authors: Guangyuan SHI, Qimai Li, Wenlong Zhang, Jiaxin Chen, Xiao-Ming Wu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that such a simple approach can greatly reduce the occurrence of conflicting gradients in the remaining shared layers and achieve better performance, with only a slight increase in model parameters in many cases. In this section, we conduct extensive experiments to evaluate our approach Recon for multi-task learning and demonstrate its effectiveness, efficiency and generality. |
| Researcher Affiliation | Academia | Guangyuan Shi, Qimai Li, Wenlong Zhang, Jiaxin Chen, Xiao-Ming Wu Department of Computing, The Hong Kong Polytechnic University, Hong Kong S.A.R., China {guang-yuan.shi, qee-mai.li, wenlong.zhang}@connect.polyu.hk, jiax.chen@connect.polyu.hk, xiao-ming.wu@polyu.edu.hk |
| Pseudocode | Yes | Algorithm 1: Recon: Removing Layer-wise Conflicting Gradients (see the conflict-scoring sketch after this table) |
| Open Source Code | Yes | The source code is available at https://github.com/moukamisama/Recon. |
| Open Datasets | Yes | We evaluate Recon on 4 multi-task datasets, namely Multi-Fashion+MNIST (Lin et al., 2019), Cityscapes (Cordts et al., 2016), NYUv2 (Couprie et al., 2013), PASCAL-Context (Mottaghi et al., 2014), and CelebA (Liu et al., 2015). |
| Dataset Splits | No | The paper mentions various datasets and training parameters, but it does not provide specific training/test/validation dataset splits (e.g., percentages or sample counts) for any of the datasets used, nor does it cite a source that defines these splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models (e.g., NVIDIA A100), CPU models, or memory amounts used for running the experiments. It only mentions the neural network architectures used. |
| Software Dependencies | No | The paper mentions various models and optimizers (e.g., ResNet18, SGD, Adam), but it does not provide specific version numbers for any software dependencies, programming languages, or libraries used to implement or run the experiments. |
| Experiment Setup | Yes | We train the model for 120 epochs with the batch size of 256. We adopt SGD with an initial learning rate of 0.1 and decay the learning rate by 0.1 at the 60th and 90th epoch. For CAGrad, we set α = 0.2. For RotoGrad, we set Rk = 100 which is equal to the dimension of shared features and set the learning rate of rotation parameters as the learning rate of the neural networks. (A minimal training-schedule sketch follows the table.) |
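
To make the pseudocode entry above concrete, here is a minimal sketch of the layer-wise conflict scoring on which Algorithm 1 (Recon: Removing Layer-wise Conflicting Gradients) is built: per-task gradients are compared layer by layer, and shared layers whose gradients conflict most often across task pairs are the candidates Recon turns into task-specific layers. The sketch assumes a PyTorch model and a list of per-task losses; the function name `layerwise_conflict_scores` and the threshold `theta` are illustrative choices, not identifiers from the authors' released code.

```python
import torch
from collections import defaultdict
from itertools import combinations


def layerwise_conflict_scores(model, task_losses, theta=0.0):
    """Count, per shared parameter tensor (layer), how many task pairs
    have conflicting gradients (cosine similarity below `theta`)."""
    names, params = zip(
        *[(n, p) for n, p in model.named_parameters() if p.requires_grad]
    )

    # Per-task gradients w.r.t. every shared parameter.
    per_task_grads = []
    for loss in task_losses:
        grads = torch.autograd.grad(
            loss, params, retain_graph=True, allow_unused=True
        )
        per_task_grads.append(
            [g.flatten() if g is not None else None for g in grads]
        )

    scores = defaultdict(int)
    for i, j in combinations(range(len(task_losses)), 2):
        for name, gi, gj in zip(names, per_task_grads[i], per_task_grads[j]):
            if gi is None or gj is None:
                continue
            cos = torch.nn.functional.cosine_similarity(gi, gj, dim=0)
            if cos < theta:  # the two tasks pull this layer in conflicting directions
                scores[name] += 1
    return scores
```

In the paper's procedure, such conflict counts are accumulated over training iterations, and the shared layers with the highest counts are duplicated into task-specific layers before the network is retrained.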
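
The experiment-setup entry likewise maps onto a standard training schedule. Below is a hedged sketch of how the reported hyperparameters (120 epochs, batch size 256, SGD with an initial learning rate of 0.1 decayed by 0.1 at epochs 60 and 90) would typically be wired together in PyTorch; `model`, `train_set`, and `compute_multitask_loss` are placeholders rather than names from the authors' implementation.

```python
import torch
from torch.utils.data import DataLoader

# Placeholders: substitute the actual multi-task model, dataset, and loss.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 90], gamma=0.1)
loader = DataLoader(train_set, batch_size=256, shuffle=True)

for epoch in range(120):
    for batch in loader:
        optimizer.zero_grad()
        loss = compute_multitask_loss(model, batch)  # e.g., a (weighted) sum of task losses
        loss.backward()
        optimizer.step()
    scheduler.step()  # lr: 0.1 -> 0.01 at epoch 60 -> 0.001 at epoch 90
```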