Learning to Multitask
Authors: Yu Zhang, Ying Wei, Qiang Yang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on benchmark datasets show the effectiveness of the proposed L2MT framework. |
| Researcher Affiliation | Collaboration | Yu Zhang (HKUST), Ying Wei (Tencent AI Lab), Qiang Yang (HKUST); yu.zhang.ust@gmail.com, judywei@tencent.com, qyang@cse.ust.hk |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that open-source code for the methodology is available. |
| Open Datasets | Yes | Four datasets are used in the experiments, including the MIT-Indoor-Scene, Caltech256, 20newsgroup, and RCV1 datasets. The MIT-Indoor-Scene and Caltech256 datasets are for image classification, while the 20newsgroup and RCV1 datasets are for text classification. |
| Dataset Splits | Yes | Each multitask problem $S_i$ consists of $m_i$ learning tasks, each of which is associated with a training dataset, a validation dataset, and a test dataset... we vary the size of training data from 30% to 50% at an interval of 10% with the validation proportion fixed to 30% in the test process |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific CPU or GPU models. |
| Software Dependencies | No | The paper mentions using 'the Adam optimizer in the tensorflow package' but does not specify any version numbers for TensorFlow or other software dependencies. |
| Experiment Setup | Yes | Each entry in $\{L_i\}$ is initialized to be normally distributed with zero mean and variance of 1/100, and the biases $\{\beta_i\}$ are initialized to be zero. $\alpha$ in the estimation function is initialized to $[1, 1, 1, 0.1]^T$ and $\gamma$ in the link function is initialized to $[1, 0]^T$. The learning rate linearly decays from 0.01 with respect to the number of epochs... when $\lambda$ is in [0.01, 0.5] and $k$ in [5, 10], the performance is not so sensitive that the choices are easier and hence in experiments we always set $\lambda$ and $k$ to 0.1 and 6... Based on such observation, $\hat{d}$ is set to be 50. |
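The Experiment Setup row quotes concrete initial values and hyperparameters. The Python sketch below gathers them in one place under stated assumptions: the quoted text does not give the shapes of each $L_i$ and $\beta_i$, so the layer sizes passed to the hypothetical `init_layers` are illustrative only, and because the quote says only that the learning rate "linearly decays from 0.01", `linear_decay_lr` assumes a decay to zero at a hypothetical `total_epochs`. None of these names come from the authors' code, which the report notes is not released.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

# Values quoted in the Experiment Setup row.
BASE_LR = 0.01                     # initial learning rate
ALPHA_INIT = [1.0, 1.0, 1.0, 0.1]  # alpha in the estimation function
GAMMA_INIT = [1.0, 0.0]            # gamma in the link function
LAMBDA = 0.1                       # lambda, picked from the insensitive range [0.01, 0.5]
K = 6                              # k, picked from the insensitive range [5, 10]
D_HAT = 50                         # \hat{d}


def init_weight(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Draw each entry from N(0, 1/100); a variance of 1/100 is a std of 0.1."""
    std = (1.0 / 100.0) ** 0.5
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def init_layers(dims: List[int], seed: int = 0) -> List[Tuple[Matrix, List[float]]]:
    """Build (L_i, beta_i) pairs for consecutive sizes in the hypothetical `dims`.

    Weights follow N(0, 1/100) and biases start at zero, as quoted.
    """
    rng = random.Random(seed)
    return [
        (init_weight(d_in, d_out, rng), [0.0] * d_out)
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]


def linear_decay_lr(epoch: int, total_epochs: int, base_lr: float = BASE_LR) -> float:
    """ASSUMPTION: decay linearly from base_lr at epoch 0 to 0 at total_epochs."""
    if total_epochs <= 0:
        raise ValueError("total_epochs must be positive")
    # Clamp so epochs before 0 or after total_epochs stay within [0, base_lr].
    progress = min(max(epoch, 0), total_epochs) / total_epochs
    return base_lr * (1.0 - progress)


if __name__ == "__main__":
    layers = init_layers([100, D_HAT, D_HAT])  # hypothetical layer sizes
    print([(len(L), len(L[0]), len(b)) for L, b in layers])
    print([linear_decay_lr(e, 10) for e in (0, 5, 10)])
```

Under this assumed schedule the rate is halved at the midpoint of training, so `linear_decay_lr(50, 100)` gives 0.005. If the paper's schedule decays to a nonzero floor instead, only `linear_decay_lr` needs to change.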
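The Dataset Splits row gives only proportions. The following Python sketch shows one way to realize them for a single multitask problem $S_i$ with $m_i$ tasks. It assumes each task's samples are shuffled randomly and that the test set takes whatever remains after the training and validation portions (40%, 30%, or 20% for the three quoted training sizes). The quote states the 30% validation proportion for the test process; the sketch simply applies the same proportions to whatever problem it is given. The function names and the per-task seeding scheme are hypothetical, not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

TRAIN_FRACTIONS = [0.3, 0.4, 0.5]  # 30% to 50% at an interval of 10%, as quoted
VAL_FRACTION = 0.3                 # validation proportion fixed to 30%, as quoted


def split_task(samples: Sequence[T], train_frac: float,
               val_frac: float = VAL_FRACTION,
               seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle one task's samples and cut them into train/validation/test.

    ASSUMPTION: the split is random and the test set is whatever remains.
    """
    if train_frac + val_frac > 1.0:
        raise ValueError("train_frac + val_frac must not exceed 1")
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = round(train_frac * len(items))
    n_val = round(val_frac * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


def split_problem(tasks: Sequence[Sequence[T]], train_frac: float,
                  seed: int = 0) -> List[Tuple[List[T], List[T], List[T]]]:
    """Split each of the m_i tasks in one problem S_i independently (hypothetical seeding)."""
    return [split_task(task, train_frac, seed=seed + j) for j, task in enumerate(tasks)]
```

Looping over `TRAIN_FRACTIONS` and calling `split_problem` once per fraction reproduces the three quoted settings. Rounding means that very small tasks may shift a sample between the three sets.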