SNR: Sub-Network Routing for Flexible Parameter Sharing in Multi-Task Learning

Authors: Jiaqi Ma, Zhe Zhao, Jilin Chen, Ang Li, Lichan Hong, Ed H. Chi

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method on a large public video dataset, YouTube-8M (Abu-El-Haija et al. 2016). Our experiment indicates that both SNR-Trans and SNR-Aver significantly outperform several baseline multi-task models."
Researcher Affiliation | Collaboration | Jiaqi Ma (1), Zhe Zhao (2), Jilin Chen (2), Ang Li (3), Lichan Hong (2), Ed H. Chi (2); affiliations: (1) School of Information, University of Michigan, Ann Arbor; (2) Google AI; (3) DeepMind
Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "We use YouTube-8M (Abu-El-Haija et al. 2016) as our benchmark dataset to evaluate the effectiveness of the proposed methods."
Dataset Splits | Yes | "We use the training set provided in the original dataset as our training set, and split the original validation set into our own validation set and test set, because this dataset comes from a Kaggle competition and the original test set labels are hidden to the public." (See the split sketch below the table.)
Hardware Specification | No | The paper does not describe the specific hardware (e.g., GPU/CPU models, memory) used for its experiments; it mentions "computation cost" and "computation efficiency" only in general terms.
Software Dependencies | No | The paper does not provide version numbers for any software dependencies. It mentions using "Adam (Kingma and Welling 2013)" but without a version number.
Experiment Setup | Yes | "All the models are trained using Adam (Kingma and Welling 2013) with learning rate as a tunable hyperparameter. The batch size is fixed as 128. Early stopping is used on the validation set. Model size related hyper-parameters are tuned with grid search... The L0 regularization parameter λ will have an impact on the serving model size, so we grid-search it from {0.001, 0.0001, 0.00001}. The learning rates of all models are random-searched within [0.00001, 0.1] in log-scale. The hyper-parameters for the hard concrete distribution used in L-Act and L-Param models are random-searched from the following ranges: β ∈ [0.5, 0.9], γ ∈ [-1, -0.1], ζ ∈ [1.1, 2]." (See the search-space and hard concrete sketches below the table.)
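
A minimal sketch of the dataset split quoted in the Dataset Splits row. The paper excerpt does not state the ratio used when the original YouTube-8M validation set is split into new validation and test sets, so the 50/50 split and the function name below are assumptions for illustration.

    import random

    def split_validation(validation_examples, test_fraction=0.5, seed=0):
        """Split the original YouTube-8M validation set into new validation
        and test sets, since the official test labels are hidden (Kaggle).
        test_fraction=0.5 is an assumption; the excerpt gives no ratio."""
        rng = random.Random(seed)
        examples = list(validation_examples)
        rng.shuffle(examples)
        cut = int(len(examples) * (1.0 - test_fraction))
        return examples[:cut], examples[cut:]  # (validation split, test split)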
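
The Experiment Setup row fully specifies the search spaces, so a small sketch can make them concrete: λ is grid-searched, the learning rate is random-searched on a log scale, and the hard concrete parameters are random-searched within the quoted intervals. The function name and dictionary keys are illustrative, not names from the paper.

    import math
    import random

    def sample_hyperparameters(rng=random):
        """Draw one configuration from the search spaces quoted above."""
        # L0 regularization weight lambda: grid-searched values.
        lambda_grid = [0.001, 0.0001, 0.00001]
        # Learning rate: random search within [1e-5, 0.1] in log scale.
        log_lr = rng.uniform(math.log(1e-5), math.log(0.1))
        return {
            "l0_lambda": rng.choice(lambda_grid),
            "learning_rate": math.exp(log_lr),
            "batch_size": 128,                  # fixed in the paper
            # Hard concrete distribution parameters (random-searched ranges).
            "beta": rng.uniform(0.5, 0.9),
            "gamma": rng.uniform(-1.0, -0.1),
            "zeta": rng.uniform(1.1, 2.0),
        }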
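
The β, γ, ζ ranges refer to the hard concrete distribution used for L0 regularization of the routing variables, presumably following Louizos et al. (2018). Below is a minimal NumPy sketch of sampling a hard concrete gate and its expected L0 penalty; log_alpha is the learnable location parameter, and how the gates enter SNR-Trans and SNR-Aver is not shown in the excerpt.

    import numpy as np

    def hard_concrete_gate(log_alpha, beta=0.9, gamma=-0.1, zeta=1.1, rng=np.random):
        """Sample a gate z in [0, 1] from the hard concrete distribution."""
        u = rng.uniform(1e-6, 1.0 - 1e-6, size=np.shape(log_alpha))
        # Binary concrete sample, stretched to (gamma, zeta), then clipped to [0, 1].
        s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1.0 - u) + log_alpha) / beta))
        s_bar = s * (zeta - gamma) + gamma
        return np.clip(s_bar, 0.0, 1.0)

    def expected_l0(log_alpha, beta=0.9, gamma=-0.1, zeta=1.1):
        """Expected L0 cost: probability that the sampled gate is non-zero."""
        return 1.0 / (1.0 + np.exp(-(log_alpha - beta * np.log(-gamma / zeta))))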