Improving Multi-Task Generalization via Regularizing Spurious Correlation
Authors: Ziniu Hu, Zhe Zhao, Xinyang Yi, Tiansheng Yao, Lichan Hong, Yizhou Sun, Ed Chi
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that MT-CRL could enhance MTL model's performance by 5.5% on average over Multi-MNIST, MovieLens, Taskonomy, CityScapes, and NYUv2, and show that it could indeed alleviate the spurious correlation problem. |
| Researcher Affiliation | Collaboration | Ziniu Hu¹, Zhe Zhao², Xinyang Yi², Tiansheng Yao², Lichan Hong², Yizhou Sun¹, Ed H. Chi²; ¹University of California, Los Angeles, {bull, yzsun}@cs.ucla.edu; ²Google Research, Brain Team, {zhezhao,xinyang,tyao,lichan,edchi}@google.com |
| Pseudocode | Yes | We provide pseudo-code of MT-CRL framework in Appendix C. |
| Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] They are all included in the supplemental material. |
| Open Datasets | Yes | We choose five widely-used real-world MTL benchmark datasets, i.e., Multi-MNIST (Sun, 2019), MovieLens (Harper & Konstan, 2016), Taskonomy (Zamir et al., 2018), NYUv2 (Silberman et al., 2012) and CityScapes (Cordts et al., 2016) |
| Dataset Splits | Yes | For each dataset, to mimic distribution shifts, we adopt some attribute information given in the dataset, such as the release time of a movie or the district of a building, to split the train/valid/test datasets. (See the split sketch after the table.) |
| Hardware Specification | Yes | 3. If you ran experiments... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] They are specified in Appendix E. |
| Software Dependencies | No | The paper discusses architectural choices like MMoE and β-VAE, but does not provide specific software dependencies with version numbers for replication. |
| Experiment Setup | Yes | Hyper-Parameter Selection. For a fair comparison, all methods are based on the same MMoE architecture. Our methods contain many hyper-parameters, including some model-specific ones, such as the number of modules (K), and regularization-specific ones. To avoid the case that the performance improvement is caused by extensive hyper-parameter tuning, we mainly search for the optimal model hyper-parameters on the Vanilla MTL setting, and use them for all baselines. For regularization-specific parameters, we take Multi-MNIST, the simplest dataset in the testbed, to find an optimal combination, and use it for all other datasets. Detailed selection procedure and results are shown in Appendix H. (A minimal MMoE sketch follows the table.) |
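For concreteness, here is a minimal sketch of the attribute-based splitting the paper describes. The column name `release_year`, the cutoff values, and the helper `split_by_attribute` are hypothetical illustrations, not taken from the paper's released code.

```python
import pandas as pd

# Hypothetical helper illustrating the attribute-based split described
# in the paper: valid/test are drawn from later attribute values
# (e.g., newer movies), so evaluation sees a shifted distribution.
def split_by_attribute(df: pd.DataFrame, attr: str, valid_cutoff, test_cutoff):
    train = df[df[attr] < valid_cutoff]
    valid = df[(df[attr] >= valid_cutoff) & (df[attr] < test_cutoff)]
    test = df[df[attr] >= test_cutoff]
    return train, valid, test

# e.g., for MovieLens-style data (cutoff years are illustrative):
# train, valid, test = split_by_attribute(ratings, "release_year", 1995, 1998)
```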
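Since the comparison hinges on all methods sharing the same MMoE backbone, a minimal PyTorch sketch of MMoE (Ma et al., 2018) may help. The layer sizes, class name, and single-linear task heads are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Minimal MMoE sketch: K shared experts, one softmax gate per task.
# Dimensions and head design are illustrative, not the paper's setup.
class MMoE(nn.Module):
    def __init__(self, in_dim: int, expert_dim: int, num_experts: int, num_tasks: int):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, expert_dim), nn.ReLU())
             for _ in range(num_experts)]
        )
        # one gating network per task over the shared experts
        self.gates = nn.ModuleList(
            [nn.Linear(in_dim, num_experts) for _ in range(num_tasks)]
        )
        self.heads = nn.ModuleList(
            [nn.Linear(expert_dim, 1) for _ in range(num_tasks)]
        )

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        # (batch, num_experts, expert_dim)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)
        outputs = []
        for gate, head in zip(self.gates, self.heads):
            # softmax gate weights: (batch, num_experts, 1)
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)
            # task-specific mixture of experts, then task head
            outputs.append(head((w * expert_out).sum(dim=1)))
        return outputs
```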