DMTG: One-Shot Differentiable Multi-Task Grouping

Authors: Yuan Gao, Shuguo Jiang, Moran Li, Jin-Gang Yu, Gui-Song Xia

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on CelebA and Taskonomy datasets with detailed ablations show the promising performance and efficiency of our method.
Researcher Affiliation | Collaboration | 1) School of CS, Wuhan University; 2) School of EI, Wuhan University; 3) Tencent Youtu Lab; 4) School of Automation Science and Engineering, South China University of Technology.
Pseudocode | No | The paper describes its method in prose and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/ethanygao/DMTG.
Open Datasets | Yes | We perform experiments on the Taskonomy dataset (Zamir et al., 2018) following (Standley et al., 2020; Fifty et al., 2021; Song et al., 2022), and the CelebA dataset (Liu et al., 2015) following (Fifty et al., 2021). CelebA (Liu et al., 2015) is a large-scale face dataset that contains more than 200,000 images from roughly 10,000 identities, each annotated with 40 face attributes that represent the tasks to predict. Taskonomy (Zamir et al., 2018) is one of the largest datasets with multi-task labels, covering 26 vision tasks from 2D to 3D.
Dataset Splits | Yes | We use the official tiny train, validation, and test split of Taskonomy.
Hardware Specification | Yes | We report the GFLOPs, and the training time (hours) is obtained on a single NVIDIA 4090 GPU.
Software Dependencies | No | The paper mentions the use of an 'Adam optimizer' but does not provide specific version numbers for any software dependencies such as programming languages, libraries, or frameworks.
Experiment Setup | Yes | We use the Adam optimizer for all of our experiments, where the initial learning rates are 0.0008 and 0.0001 for the CelebA and Taskonomy experiments, respectively. We use plateau learning rate decay which reduces by 0.5 when the validation loss no longer improves. We train all the experiments for 100 epochs, where our networks are initialized by the pre-trained naive MTL weights on the corresponding experiments. We initialize the Gumbel Softmax temperature τ of Eq. (5) as 2.5 and 4 for the CelebA and Taskonomy experiments, respectively.
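
To make the quoted setup concrete, the following is a minimal PyTorch sketch of the reported training configuration (Adam, plateau learning-rate decay by a factor of 0.5, 100 epochs, and the Gumbel-Softmax temperature initialization from Eq. (5)). The grouping logits, branch count, and backbone stand-in (`grouping_logits`, `num_branches`, the `Linear` module) are illustrative assumptions for this sketch, not taken from the authors' released code.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: CelebA has 40 attribute tasks; the branch count here is assumed.
num_tasks, num_branches = 40, 5
backbone = torch.nn.Linear(512, num_tasks)  # stand-in for the pre-trained MTL network
grouping_logits = torch.nn.Parameter(torch.zeros(num_tasks, num_branches))

# Adam with the reported initial learning rate (0.0008 for CelebA, 0.0001 for Taskonomy).
optimizer = torch.optim.Adam(
    list(backbone.parameters()) + [grouping_logits], lr=0.0008)

# Plateau decay: halve the learning rate when the validation loss stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5)

tau = 2.5  # reported initial Gumbel-Softmax temperature (2.5 for CelebA, 4 for Taskonomy)
for epoch in range(100):  # 100 training epochs, as reported
    # Differentiable (soft) task-to-branch assignment via Gumbel-Softmax.
    assignment = F.gumbel_softmax(grouping_logits, tau=tau, hard=False, dim=-1)
    # ... forward pass, task losses weighted by `assignment`, backward, optimizer.step() ...
    val_loss = 0.0  # placeholder for the real validation loss
    scheduler.step(val_loss)
```

The `hard=False` relaxation keeps the task-to-branch assignment differentiable during training; a hard (one-hot) grouping can be read off the logits once training converges.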