Improving Multi-Task Generalization via Regularizing Spurious Correlation

Authors: Ziniu Hu, Zhe Zhao, Xinyang Yi, Tiansheng Yao, Lichan Hong, Yizhou Sun, Ed Chi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that MT-CRL could enhance MTL model's performance by 5.5% on average over Multi-MNIST, MovieLens, Taskonomy, CityScapes, and NYUv2, and show it could indeed alleviate the spurious correlation problem.
Researcher Affiliation | Collaboration | Ziniu Hu1, Zhe Zhao2, Xinyang Yi2, Tiansheng Yao2, Lichan Hong2, Yizhou Sun1, Ed H. Chi2. 1University of California, Los Angeles, {bull,yzsun}@cs.ucla.edu; 2Google Research, Brain Team, {zhezhao,xinyang,tyao,lichan,edchi}@google.com
Pseudocode | Yes | We provide pseudo-code of the MT-CRL framework in Appendix C.
Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] They are all included in the supplemental material.
Open Datasets | Yes | We choose five widely-used real-world MTL benchmark datasets, i.e., Multi-MNIST (Sun, 2019), MovieLens (Harper & Konstan, 2016), Taskonomy (Zamir et al., 2018), NYUv2 (Silberman et al., 2012), and CityScapes (Cordts et al., 2016).
Dataset Splits | Yes | For each dataset, to mimic distribution shifts, we adopt some attribute information given in the dataset, such as the release time of a movie or the district of a building, to split the train/valid/test datasets. (A minimal sketch of such an attribute-based split follows the table.)
Hardware Specification | Yes | 3. If you ran experiments... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] They are specified in Appendix E.
Software Dependencies | No | The paper discusses architectural choices like MMoE and β-VAE, but does not provide specific software dependencies with version numbers for replication.
Experiment Setup | Yes | Hyper-Parameter Selection. For a fair comparison, all methods are based on the same MMoE architecture. Our methods contain many hyper-parameters, including model-specific ones such as the number of modules (K) and regularization-specific ones. To avoid the case where a performance improvement is caused by extensive hyper-parameter tuning, we mainly search for the optimal model hyper-parameters in the Vanilla MTL setting and use them for all baselines. For the regularization-specific parameters, we take Multi-MNIST, the simplest dataset in the testbed, to find an optimal combination, and use it for all other datasets. The detailed selection procedure and results are shown in Appendix H. (A sketch of this two-stage selection also follows the table.)
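
To make the split protocol in the Dataset Splits row concrete, here is a minimal sketch of an attribute-based train/valid/test split (e.g., sorting MovieLens interactions by release year so that validation and test come from a later, shifted region of the attribute). The column name, split fractions, and function name are illustrative assumptions, not the paper's actual code.

```python
import pandas as pd

def attribute_split(df: pd.DataFrame, attr: str,
                    valid_frac: float = 0.1, test_frac: float = 0.2):
    """Split rows by sorting on an attribute (e.g., release year),
    so valid/test come from a shifted region of that attribute."""
    df = df.sort_values(attr).reset_index(drop=True)
    n = len(df)
    n_test = int(n * test_frac)
    n_valid = int(n * valid_frac)
    train = df.iloc[: n - n_valid - n_test]
    valid = df.iloc[n - n_valid - n_test : n - n_test]
    test = df.iloc[n - n_test :]
    return train, valid, test

# Hypothetical usage on a MovieLens-style table with a 'release_year' column:
# train, valid, test = attribute_split(ratings_df, attr="release_year")
```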
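
Similarly, the two-stage hyper-parameter protocol quoted in the Experiment Setup row can be sketched as below. The search grids, dataset keys, and the `train_and_eval` hook (dataset, config -> validation score) are hypothetical placeholders for whatever training loop the authors actually used.

```python
from itertools import product

def grid(search_space):
    """Yield every combination in a dict of candidate lists as a config dict."""
    for values in product(*search_space.values()):
        yield dict(zip(search_space.keys(), values))

def select_hyperparams(train_and_eval, dataset):
    # Stage 1: tune model hyper-parameters (e.g., number of modules K)
    # under Vanilla MTL (no regularization), then reuse for all baselines.
    model_space = {"num_modules": [4, 8, 16], "lr": [1e-3, 1e-4]}  # illustrative
    best_model = max(
        grid(model_space),
        key=lambda cfg: train_and_eval(dataset, {**cfg, "reg_weight": 0.0}),
    )
    # Stage 2: tune the regularization weight on Multi-MNIST only, with the
    # model hyper-parameters frozen; reuse the result on all other datasets.
    best_reg = max(
        [0.01, 0.1, 1.0],  # illustrative candidates
        key=lambda w: train_and_eval("multi_mnist", {**best_model, "reg_weight": w}),
    )
    return {**best_model, "reg_weight": best_reg}
```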