Learning to Multitask

Authors: Yu Zhang, Ying Wei, Qiang Yang

NeurIPS 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on benchmark datasets show the effectiveness of the proposed L2MT framework. |
| Researcher Affiliation | Collaboration | Yu Zhang (HKUST), Ying Wei (Tencent AI Lab), Qiang Yang (HKUST); yu.zhang.ust@gmail.com, judywei@tencent.com, qyang@cse.ust.hk |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that open-source code for the methodology is available. |
| Open Datasets | Yes | Four datasets are used in the experiments: MIT-Indoor-Scene and Caltech256 for image classification, and 20newsgroup and RCV1 for text classification. |
| Dataset Splits | Yes | "Each multitask problem S_i consists of m_i learning tasks, each of which is associated with a training dataset, a validation dataset, and a test dataset... we vary the size of training data from 30% to 50% at an interval of 10%, with the validation proportion fixed to 30% in the test process." |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific CPU or GPU models. |
| Software Dependencies | No | The paper mentions using "the Adam optimizer in the tensorflow package" but does not specify version numbers for TensorFlow or other software dependencies. |
| Experiment Setup | Yes | "Each entry in {L_i} is initialized to be normally distributed with zero mean and variance of 1/100, and the biases {β_i} are initialized to be zero. α in the estimation function is initialized to [1, 1, 1, 0.1]^T and γ in the link function is initialized to [1, 0]^T. The learning rate linearly decays from 0.01 with respect to the number of epochs... when λ is in [0.01, 0.5] and k in [5, 10], the performance is not so sensitive that the choices are easier, and hence in experiments we always set λ and k to 0.1 and 6... Based on such observation, d̂ is set to be 50." |
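The Experiment Setup row pins down the initialization and fixed hyperparameters reported in the paper. Since the authors' code is not released, the following is a minimal NumPy sketch of that setup; all function and variable names (`init_params`, `lr_schedule`, `layer_dim`, etc.) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

D_HAT = 50        # reduced dimension chosen in the paper
LAM, K = 0.1, 6   # lambda and k fixed across experiments

def init_params(num_layers, layer_dim):
    """Initialize parameters as described in the paper's setup."""
    # Each entry of {L_i} ~ N(0, 1/100), i.e. standard deviation 0.1.
    Ls = [rng.normal(0.0, np.sqrt(1.0 / 100), size=(layer_dim, layer_dim))
          for _ in range(num_layers)]
    # Biases {beta_i} start at zero.
    betas = [np.zeros(layer_dim) for _ in range(num_layers)]
    # alpha (estimation function) and gamma (link function) initial values.
    alpha = np.array([1.0, 1.0, 1.0, 0.1])
    gamma = np.array([1.0, 0.0])
    return Ls, betas, alpha, gamma

def lr_schedule(epoch, num_epochs, lr0=0.01):
    """Learning rate decays linearly from 0.01 over the epochs."""
    return lr0 * (1.0 - epoch / num_epochs)
```

In the paper, optimization is then carried out with the Adam optimizer in TensorFlow; the linear schedule above only reproduces the stated decay of the learning rate, not the full training loop.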