Efficient and Effective Multi-task Grouping via Meta Learning on Task Combinations

Authors: Xiaozhuang Song, Shun Zheng, Wei Cao, James J.Q. Yu, Jiang Bian

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on diversified multi-task scenarios demonstrate the efficiency and effectiveness of our method.
Researcher Affiliation | Collaboration | Xiaozhuang Song (Southern University of Science and Technology, shawnsxz97@gmail.com); Shun Zheng (Microsoft Research, shun.zheng@microsoft.com); Wei Cao (Microsoft Research, wei.cao@microsoft.com); James J.Q. Yu (Southern University of Science and Technology, yujq3@sustech.edu.cn); Jiang Bian (Microsoft Research, jiang.bian@microsoft.com)
Pseudocode | Yes | Algorithm 1: Active Learning for MTG-Net (a hedged code sketch follows the table).
Open Source Code | Yes | Data and code are available at https://github.com/ShawnKS/MTG-Net.
Open Datasets | Yes | Taskonomy [49] is a computer vision dataset... ETTm1 [46] is an electric load dataset... MIMIC-III [17] is a healthcare database...
Dataset Splits | Yes | we retain the same split of train, validation, and test sets for each MTL procedure and fix the optimization algorithm as well as other hyperparameters.
Hardware Specification | No | All these MTL procedures cost thousands of GPU hours in total, and we will release the collected meta datasets for future research. No specific GPU model, CPU, or cloud provider details are provided.
Software Dependencies | No | The paper mentions using neural networks and various learning paradigms but does not specify any software libraries or their version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We use the same hyper-parameters for all MTL datasets. To be specific, we set the dimension of task embeddings as D = 64 and stack 2 self-attention encoding layers [45]. As for Algorithm 1, we set α as 25 to prioritize the selection of task combinations with large gains. Besides, we leverage a dynamic strategy to schedule η. At an early stage, when |C_train| <= N + 1, we set η as 1 to update MTG-Net frequently and pursue more effective selections. When |C_train| > N + 1, we set η as N to reduce the number of updates to MTG-Net and further improve efficiency. K is the hyper-parameter deciding the total number of meta-training samples, which is specified along with each figure.
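
To make the algorithmic details in the table concrete, below is a minimal, hypothetical PyTorch sketch of Algorithm 1 assembled solely from what is quoted above: task embeddings with D = 64, 2 self-attention encoding layers, α = 25 to sharpen the preference for high-gain task combinations, the dynamic η schedule, and a budget of K meta-training samples. The class and function names (MTGNet, evaluate_mtl_gains, active_learning), the softmax selection rule, the MSE loss, and the optimizer are all assumptions of this sketch rather than the authors' implementation; the actual code is at https://github.com/ShawnKS/MTG-Net.

```python
# Hypothetical sketch of Algorithm 1 (active learning for MTG-Net),
# reconstructed only from the hyperparameters quoted in the table above.
# Names and the selection/loss rules are placeholders, not the authors' code.
import itertools
import random

import torch
import torch.nn as nn


class MTGNet(nn.Module):
    """Encodes a task combination with D=64 embeddings and 2 self-attention layers."""

    def __init__(self, num_tasks: int, dim: int = 64, num_layers: int = 2):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(dim, 1)  # one predicted gain per task in the combination

    def forward(self, task_ids: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.task_emb(task_ids))  # (1, |c|, dim)
        return self.head(h).squeeze(-1)            # (1, |c|) predicted per-task gains


def evaluate_mtl_gains(combo) -> torch.Tensor:
    """Stand-in for a full MTL training run returning observed per-task gains.

    In the paper this is the expensive step (thousands of GPU hours in total).
    """
    return torch.randn(len(combo))


def active_learning(num_tasks: int, K: int, alpha: float = 25.0):
    net = MTGNet(num_tasks)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    # Candidate pool: all 2^N - 1 non-empty task combinations (fine for small N).
    pool = [c for r in range(1, num_tasks + 1)
            for c in itertools.combinations(range(num_tasks), r)]
    random.shuffle(pool)
    c_train = []  # collected meta-training samples: (combination, observed gains)

    while len(c_train) < K and pool:
        # Selection: alpha = 25 sharpens a softmax over predicted gains so that
        # high-gain combinations are chosen far more often (assumed form of the rule).
        with torch.no_grad():
            pred = torch.tensor([net(torch.tensor([c])).mean().item() for c in pool])
        probs = torch.softmax(alpha * pred, dim=0)
        combo = pool.pop(torch.multinomial(probs, 1).item())
        c_train.append((combo, evaluate_mtl_gains(combo)))

        # Dynamic eta schedule from the paper: update after every new sample while
        # |C_train| <= N + 1, then only after every N new samples.
        eta = 1 if len(c_train) <= num_tasks + 1 else num_tasks
        if len(c_train) % eta == 0:
            for combo_i, gains_i in c_train:  # one refit pass over collected samples
                opt.zero_grad()
                pred_i = net(torch.tensor([combo_i])).squeeze(0)
                loss = nn.functional.mse_loss(pred_i, gains_i)
                loss.backward()
                opt.step()
    return net, c_train
```

For instance, active_learning(num_tasks=8, K=50) would collect 50 meta-training samples while refitting the surrogate on the schedule above; in the paper, each evaluate_mtl_gains call corresponds to a full MTL training run, which is why collecting the meta datasets cost thousands of GPU hours.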