Efficient and Effective Multi-task Grouping via Meta Learning on Task Combinations
Authors: Xiaozhuang Song, Shun Zheng, Wei Cao, James J.Q. Yu, Jiang Bian
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on diversified multi-task scenarios demonstrate the efficiency and effectiveness of our method. |
| Researcher Affiliation | Collaboration | Xiaozhuang Song (Southern University of Science and Technology, shawnsxz97@gmail.com); Shun Zheng (Microsoft Research, shun.zheng@microsoft.com); Wei Cao (Microsoft Research, wei.cao@microsoft.com); James J.Q. Yu (Southern University of Science and Technology, yujq3@sustech.edu.cn); Jiang Bian (Microsoft Research, jiang.bian@microsoft.com) |
| Pseudocode | Yes | Algorithm 1: Active Learning for MTG-Net |
| Open Source Code | Yes | Data and code are available at https://github.com/ShawnKS/MTG-Net. |
| Open Datasets | Yes | Taskonomy [49] is a computer vision dataset... ETTm1 [46] is an electric load dataset... MIMIC-III [17] is a healthcare database... |
| Dataset Splits | Yes | we retain the same split of train, validation, and test sets for each MTL procedure and fix the optimization algorithm as well as other hyperparameters. |
| Hardware Specification | No | All these MTL procedures cost thousands of GPU hours in total, and we will release the collected meta datasets for future research. No specific GPU model, CPU, or cloud provider details are provided. |
| Software Dependencies | No | The paper mentions using neural networks and various learning paradigms but does not specify any software libraries or their version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We use the same hyper-parameters for all MTL datasets. Specifically, we set the dimension of task embeddings to D = 64 and stack 2 self-attention encoding layers [45]. For Algorithm 1, we set α to 25 to prioritize the selection of task combinations with large gains. Besides, we leverage a dynamic strategy to schedule η: at an early stage, when |C_train| ≤ N + 1, we set η to 1 to frequently update MTG-Net and pursue more effective selections; when |C_train| > N + 1, we set η to N to reduce the number of MTG-Net updates and further improve efficiency. K is the hyper-parameter that decides the total number of meta-training samples, and it is specified along with each figure. |
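
The dynamic η schedule quoted in the Experiment Setup row is simple enough to capture in a few lines. Below is a minimal sketch of that schedule and the surrounding active-learning loop of Algorithm 1, assuming hypothetical names (`select_combinations`, `run_mtl`, `train_mtgnet`) that stand in for the released MTG-Net code rather than reproducing its actual API; the constants mirror the values reported in the paper, except N and K, which vary per experiment.

```python
# Minimal sketch of the reported hyper-parameters and the dynamic eta
# schedule for Algorithm 1. All function names below are hypothetical
# placeholders, not the authors' released API.

N = 5        # number of tasks in the MTL scenario (assumed for illustration)
D = 64       # task-embedding dimension (2 self-attention layers are stacked)
ALPHA = 25   # prioritizes selecting task combinations with large gains
K = 100      # total meta-training budget; the paper sets this per figure


def eta(num_collected: int, n_tasks: int = N) -> int:
    """Dynamic schedule from the paper: update MTG-Net after every new
    sample while |C_train| <= N + 1, then only every N samples."""
    return 1 if num_collected <= n_tasks + 1 else n_tasks


# Hypothetical shape of the active-learning loop (Algorithm 1):
#
#   c_train = seed_combinations()
#   while len(c_train) < K:
#       batch = select_combinations(model, size=eta(len(c_train)),
#                                   alpha=ALPHA)        # gain-weighted picks
#       c_train += [(c, run_mtl(c)) for c in batch]     # full MTL evaluations
#       model = train_mtgnet(c_train, embed_dim=D, n_layers=2)
```

The design intent, per the quoted setup: the η = 1 early phase trades compute for selection quality while the meta-training set is small, and switching to η = N afterwards amortizes MTG-Net retraining across N expensive MTL evaluations.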