ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning

Authors: Junguang Jiang, Baixu Chen, Junwei Pan, Ximei Wang, Dapeng Liu, Jie Jiang, Mingsheng Long

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On a series of auxiliary-task learning benchmarks, ForkMerge outperforms existing methods and effectively mitigates negative transfer. "(3) We conduct extensive experiments and validate that ForkMerge outperforms previous methods on a series of ATL benchmarks."
Researcher Affiliation | Collaboration | Junguang Jiang, Baixu Chen, Junwei Pan, Ximei Wang, Dapeng Liu, Jie Jiang, Mingsheng Long. School of Software, BNRist, Tsinghua University, China; Tencent Inc., China. {jjg20,cbx22}@mails.tsinghua.edu.cn, {jonaspan,messixmwang,rocliu,zeus}@tencent.com, mingsheng@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1: ForkMerge Training Pipeline. Algorithm 2: ForkMerge Training Pipeline with Multiple Branches. Algorithm 3: Greedy Search of Λ.
Open Source Code | Yes | "The codebase for both our method and the compared methods will be available at https://github.com/thuml/ForkMerge."
Open Datasets | Yes | We conduct our analysis on a multi-domain image recognition dataset, DomainNet [61], with ResNet-18 [23] pre-trained on ImageNet. Specifically, we use the tasks Painting and Quickdraw in DomainNet as target tasks to showcase weak and strong negative transfer respectively, and mix all other tasks in DomainNet as auxiliary tasks. We elaborate on the DomainNet dataset in Appendix C.3 and provide the detailed experiment design in Appendix B. We evaluate on the widely-used multi-task scene understanding dataset NYUv2 [68], on the AliExpress dataset [36], and on two SSL datasets, CIFAR-10 [31] and SVHN [56].
Dataset Splits | Yes | Following [55], we use 636, 159, and 654 images for training, validation, and test. As the original DomainNet does not provide a separate validation set, we randomly split 50% of the data from the test set as the validation set. We then randomly sample labeled images from the training set. Table 10 summarizes the statistics of CIFAR-10 and SVHN.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies | No | The paper mentions software such as LibMTL [38], MTAN [44], TLlib [26], MTReclib [85], the Adam optimizer [29], and ResNet-18/50/101. However, it does not specify version numbers for these software components or libraries, which are crucial for reproducibility.
Experiment Setup | Yes | Each method is trained for 200 epochs with the Adam optimizer [29] and a batch size of 8. The initial learning rate is 10⁻⁴ and is halved to 5×10⁻⁵ after 100 epochs. In ForkMerge, the parameters are merged every 10 epochs. Each method is trained for 50K iterations; in ForkMerge, the parameters are merged every 12.5K iterations. Each method is trained for 50 epochs using the Adam optimizer, with a batch size of 2048, learning rate of 10⁻³, and weight decay of 10⁻⁶. We adopt the Adam [29] optimizer with an initial learning rate of 0.005. We train each method for 200K iterations and decay the learning rate by a factor of 0.2 at 160K iterations.
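The core of the pseudocode cited above (Algorithms 1 and 3) is a merge step that linearly interpolates the parameters of two branches, with the interpolation weight chosen by greedy search on validation performance. The sketch below is a minimal illustration of that idea, not the authors' implementation; the function names, the plain-dict parameter representation, and the `validate` callback are assumptions.

```python
def merge_params(theta_target, theta_joint, lam):
    """Linearly interpolate two parameter dicts: lam * joint + (1 - lam) * target."""
    return {name: lam * theta_joint[name] + (1.0 - lam) * theta_target[name]
            for name in theta_target}

def greedy_search_lambda(theta_target, theta_joint, validate, candidates):
    """Pick the interpolation weight maximizing validation performance,
    in the spirit of Algorithm 3's greedy search over Lambda."""
    best_lam, best_score = None, float("-inf")
    for lam in candidates:
        score = validate(merge_params(theta_target, theta_joint, lam))
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam, best_score
```

In a real training run, `theta_target` and `theta_joint` would be the state dicts of the target-task-only branch and the jointly trained branch, and `validate` would evaluate the merged model on the held-out validation split.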
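The dataset-splits row notes that, since DomainNet ships no separate validation set, the authors randomly carve 50% of the test set into a validation set. A minimal sketch of such a split (function name, index-list interface, and fixed seed are assumptions for illustration):

```python
import random

def split_validation(test_indices, val_fraction=0.5, seed=0):
    """Randomly split the test indices into (validation, test) subsets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = test_indices[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * val_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Fixing the seed matters here: without it, each run would evaluate model selection on a different validation subset.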
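The first schedule quoted in the experiment-setup row (200 epochs, initial learning rate 10⁻⁴ halved after epoch 100, parameters merged every 10 epochs) can be sketched as a training skeleton. The per-epoch training and the merge itself are placeholders, not the authors' code:

```python
def lr_at_epoch(epoch, base_lr=1e-4, halve_at=100):
    """Initial lr of 1e-4, halved to 5e-5 after the 100th epoch."""
    return base_lr if epoch < halve_at else base_lr / 2

def training_loop(num_epochs=200, merge_every=10):
    """Skeleton of the quoted schedule: train each epoch, merge branch
    parameters every `merge_every` epochs; returns the (epoch, lr)
    pairs at which a merge occurs."""
    merge_events = []
    for epoch in range(num_epochs):
        lr = lr_at_epoch(epoch)
        # ... one epoch of training per branch at learning rate `lr` ...
        if (epoch + 1) % merge_every == 0:
            # ... merge branch parameters here ...
            merge_events.append((epoch + 1, lr))
    return merge_events
```

With these settings the loop performs 20 merges, the last ten at the halved learning rate of 5×10⁻⁵.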