Variational Multi-Task Learning with Gumbel-Softmax Priors
Authors: Jiayi Shen, Xiantong Zhen, Marcel Worring, Ling Shao
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that the proposed VMTL is able to effectively tackle a variety of challenging multi-task learning settings with limited training data for both classification and regression. Our method consistently surpasses previous methods, including strong Bayesian approaches, and achieves state-of-the-art performance on five benchmark datasets. We conduct extensive experiments to evaluate the proposed VMTL on five benchmark datasets for both classification and regression tasks. |
| Researcher Affiliation | Collaboration | (1) AIM Lab, University of Amsterdam, Netherlands; (2) Inception Institute of Artificial Intelligence, Abu Dhabi, UAE |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code will be available at https://github.com/autumn9999/VMTL.git. |
| Open Datasets | Yes | Office-Home [47] contains images from four domains/tasks: Artistic (A), Clipart (C), Product (P) and Real-world (R). Office-Caltech [16] contains the ten categories shared between Office-31 [39] and Caltech-256 [18]. ImageCLEF [33], the benchmark for the ImageCLEF domain adaptation challenge, contains 12 common categories shared by four public datasets/tasks: Caltech-256 (C), ImageNet ILSVRC 2012 (I), Pascal VOC 2012 (P), and Bing (B). DomainNet [36] is a large-scale dataset with approximately 0.6 million images distributed among 345 categories. Rotated MNIST [29] is adopted for angle regression tasks. |
| Dataset Splits | Yes | randomly selecting 5%, 10%, and 20% of samples from each task in the dataset as the training set, using the remaining samples as the test set [33]. For the large-scale DomainNet dataset, we set the splits to 1%, 2% and 4%... For the regression dataset, Rotated MNIST, we set the splits to 0.1% and 0.2%... (a split sketch follows the table) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions models and optimizers (VGGnet, MLPs, Adam optimizer) but does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We adopt the Adam optimizer [27] with a learning rate of 1e-4 for training. All the results are obtained based on a 95% confidence interval from five runs. The temperature of the Gumbel-Softmax priors (6) and (9) is annealed using the same schedule applied in [22]: we start with a high temperature and gradually anneal it to a small but non-zero value. For the KL-divergence in (11), we use the annealing scheme from [6]. L and M are set to 10, which yields good performance while being computationally efficient. (a training-loop sketch follows the table) |
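
The splitting protocol quoted in the "Dataset Splits" row is simple to reproduce in principle. Below is a minimal per-task split sketch, assuming uniform random selection with a fixed seed; the paper does not state its seeding or rounding conventions, and the function name `split_per_task` is our own, not the authors' code.

```python
import numpy as np

def split_per_task(num_samples, train_fraction, seed=0):
    """Hold out `train_fraction` of one task's samples for training; the rest form the test set."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(num_samples)
    n_train = int(round(train_fraction * num_samples))
    return indices[:n_train], indices[n_train:]

# Example: the 5% / 10% / 20% splits used for Office-Home, Office-Caltech and ImageCLEF,
# applied to a hypothetical task with 2,000 images.
for frac in (0.05, 0.10, 0.20):
    train_idx, test_idx = split_per_task(2000, frac, seed=42)
    print(f"{frac:.0%}: {len(train_idx)} train / {len(test_idx)} test")
```

The "Experiment Setup" row describes Adam with a learning rate of 1e-4, a Gumbel-Softmax temperature annealed from a high value to a small non-zero value [22], and an annealed KL weight [6]. The sketch below wires these pieces into a training-loop skeleton under stated assumptions: the model, the schedule constants (`TAU_START`, `TAU_MIN`, `ANNEAL_RATE`, `KL_WARMUP_STEPS`), and the loss are placeholders, and the full VMTL ELBO with L = M = 10 Monte Carlo samples is not reproduced here.

```python
import math
import torch
import torch.nn.functional as F

# Stand-ins for the VMTL networks and data; none of this is the authors' released code.
model = torch.nn.Linear(512, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # Adam with lr 1e-4, as reported

TAU_START, TAU_MIN, ANNEAL_RATE = 1.0, 0.5, 3e-5   # assumed constants for the [22]-style temperature schedule
KL_WARMUP_STEPS = 5_000                            # assumed length of the [6]-style KL annealing

for step in range(1_000):
    # Gumbel-Softmax temperature: start high, anneal gradually to a small but non-zero value.
    tau = max(TAU_MIN, TAU_START * math.exp(-ANNEAL_RATE * step))
    # KL-divergence weight annealed from 0 to 1.
    kl_weight = min(1.0, step / KL_WARMUP_STEPS)

    x, y = torch.randn(32, 512), torch.randint(0, 10, (32,))   # dummy batch
    logits = model(x)
    pi = F.gumbel_softmax(logits, tau=tau, hard=False)          # relaxed categorical sample from the prior
    kl = (F.softmax(logits, dim=-1)
          * (F.log_softmax(logits, dim=-1) - math.log(1.0 / 10))).sum(-1).mean()  # stand-in KL to a uniform prior
    loss = F.nll_loss(torch.log(pi + 1e-8), y) + kl_weight * kl

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```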
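Running the loop above with a larger warm-up (or a slower `ANNEAL_RATE`) keeps early training close to a uniform relaxed categorical, which is the usual motivation for annealing both the temperature and the KL weight; the specific values reported by the paper are only those quoted in the table.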