Variational Multi-Task Learning with Gumbel-Softmax Priors

Authors: Jiayi Shen, Xiantong Zhen, Marcel Worring, Ling Shao

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that the proposed VMTL is able to effectively tackle a variety of challenging multi-task learning settings with limited training data for both classification and regression. Our method consistently surpasses previous methods, including strong Bayesian approaches, and achieves state-of-the-art performance on five benchmark datasets. We conduct extensive experiments to evaluate the proposed VMTL on five benchmark datasets for both classification and regression tasks.
Researcher Affiliation | Collaboration | 1: AIM Lab, University of Amsterdam, Netherlands; 2: Inception Institute of Artificial Intelligence, Abu Dhabi, UAE
Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The code will be available at https://github.com/autumn9999/VMTL.git.
Open Datasets | Yes | Office-Home [47] contains images from four domains/tasks: Artistic (A), Clipart (C), Product (P) and Real-world (R). Office-Caltech [16] contains the ten categories shared between Office-31 [39] and Caltech-256 [18]. ImageCLEF [33], the benchmark for the ImageCLEF domain adaptation challenge, contains 12 common categories shared by four public datasets/tasks: Caltech-256 (C), ImageNet ILSVRC 2012 (I), Pascal VOC 2012 (P), and Bing (B). DomainNet [36] is a large-scale dataset with approximately 0.6 million images distributed among 345 categories. Rotated MNIST [29] is adopted for angle regression tasks.
Dataset Splits | Yes | randomly selecting 5%, 10%, and 20% of samples from each task in the dataset as the training set, using the remaining samples as the test set [33]. For the large-scale DomainNet dataset, we set the splits to 1%, 2% and 4%... For the regression dataset, Rotated MNIST, we set the splits to 0.1% and 0.2%...
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions models and optimizers (VGGnet, MLPs, Adam optimizer) but does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We adopt the Adam optimizer [27] with a learning rate of 1e-4 for training. All the results are obtained based on a 95% confidence interval from five runs. The temperature of the Gumbel-Softmax priors (6) and (9) is annealed using the same schedule applied in [22]: we start with a high temperature and gradually anneal it to a small but non-zero value. For the KL-divergence in (11), we use the annealing scheme from [6]. L and M are set to 10, which yields good performance while being computationally efficient.
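
The Dataset Splits row above describes a per-task random split: a fixed fraction of each task's samples (5%, 10%, or 20%; smaller fractions for DomainNet and Rotated MNIST) is drawn for training and the remainder is used for testing. Below is a minimal sketch of that protocol, assuming a simple index permutation per task; the function name `split_task` and the seed handling are illustrative and not taken from the released code.

```python
import numpy as np

def split_task(num_samples, train_fraction, seed=0):
    """Return train/test index arrays for one task by random permutation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_samples)
    n_train = int(round(train_fraction * num_samples))
    return perm[:n_train], perm[n_train:]

# Example: a 10% training split for a task with 4,000 images.
train_idx, test_idx = split_task(num_samples=4000, train_fraction=0.10, seed=0)
assert len(train_idx) == 400 and len(test_idx) == 3600
```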
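
The Experiment Setup row refers to two annealing schedules without spelling them out. The sketch below shows one plausible reading, assuming the exponential temperature decay used for Gumbel-Softmax training in [22] (start high, clamp at a small non-zero value, update every fixed number of steps) and a monotonically increasing KL warm-up weight in the spirit of [6]; all constants (`tau0`, `tau_min`, `rate`, `every`, `warmup_steps`) are assumptions, not values from the paper.

```python
import math

def gumbel_softmax_temperature(step, tau0=1.0, tau_min=0.5, rate=1e-4, every=1000):
    """Exponentially anneal the temperature, updated every `every` steps:
    tau = max(tau_min, tau0 * exp(-rate * step))."""
    effective_step = (step // every) * every
    return max(tau_min, tau0 * math.exp(-rate * effective_step))

def kl_weight(step, warmup_steps=10000):
    """Linearly increase the weight on the KL term from 0 to 1 during warm-up."""
    return min(1.0, step / warmup_steps)

for step in (0, 5000, 20000):
    print(step, gumbel_softmax_temperature(step), kl_weight(step))
```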
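
The same row reports results as a 95% confidence interval over five runs. The paper does not give the formula; a common choice, shown here purely as an assumption, is a two-sided Student-t interval over the five per-run scores (the scores below are randomly generated placeholders, not results from the paper).

```python
import numpy as np
from scipy import stats

# Placeholder scores for five runs; values are synthetic.
runs = np.random.default_rng(0).normal(loc=70.0, scale=0.5, size=5)
mean = runs.mean()
# Half-width of a two-sided 95% Student-t interval with n-1 degrees of freedom.
half_width = stats.t.ppf(0.975, df=len(runs) - 1) * stats.sem(runs)
print(f"{mean:.2f} +/- {half_width:.2f}")
```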