Improving Multi-Task Generalization via Regularizing Spurious Correlation

Authors: Ziniu Hu, Zhe Zhao, Xinyang Yi, Tiansheng Yao, Lichan Hong, Yizhou Sun, Ed Chi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that MT-CRL could enhance MTL model's performance by 5.5% on average over Multi-MNIST, MovieLens, Taskonomy, CityScapes, and NYUv2, and show it could indeed alleviate the spurious correlation problem.
Researcher Affiliation | Collaboration | Ziniu Hu1, Zhe Zhao2, Xinyang Yi2, Tiansheng Yao2, Lichan Hong2, Yizhou Sun1, Ed H. Chi2. 1University of California, Los Angeles, {bull,yzsun}@cs.ucla.edu; 2Google Research, Brain Team, {zhezhao,xinyang,tyao,lichan,edchi}@google.com
Pseudocode | Yes | We provide pseudo-code of the MT-CRL framework in Appendix C.
Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] They are all included in the supplemental material.
Open Datasets | Yes | We choose five widely-used real-world MTL benchmark datasets, i.e., Multi-MNIST (Sun, 2019), MovieLens (Harper & Konstan, 2016), Taskonomy (Zamir et al., 2018), NYUv2 (Silberman et al., 2012), and CityScapes (Cordts et al., 2016).
Dataset Splits | Yes | For each dataset, to mimic distribution shifts, we adopt some attribute information given in the dataset, such as the release time of a movie or the district of a building, to split the train/valid/test datasets. (A minimal sketch of such an attribute-based split follows the table.)
Hardware Specification | Yes | 3. If you ran experiments... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] They are specified in Appendix E.
Software Dependencies | No | The paper discusses architectural choices like MMoE and β-VAE, but does not provide specific software dependencies with version numbers for replication.
Experiment Setup | Yes | Hyper-Parameter Selection. For a fair comparison, all methods are based on the same MMoE architecture. Our methods contain many hyper-parameters, including model-specific ones such as the number of modules (K) and regularization-specific ones. To avoid the case where a performance improvement is caused by extensive hyper-parameter tuning, we mainly search for the optimal model hyper-parameters in the Vanilla MTL setting and use them for all baselines. For the regularization-specific parameters, we take Multi-MNIST, the simplest dataset in the testbed, to find an optimal combination, and use it for all other datasets. The detailed selection procedure and results are shown in Appendix H. (A sketch of this two-stage selection also follows the table.)
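
To make the split protocol in the Dataset Splits row concrete, here is a minimal sketch of an attribute-based train/valid/test split (e.g., sorting MovieLens interactions by release year so that validation and test come from a later, shifted region of the attribute). The column name, split fractions, and function name are illustrative assumptions, not the paper's actual code.

```python
import pandas as pd

def attribute_split(df: pd.DataFrame, attr: str,
                    valid_frac: float = 0.1, test_frac: float = 0.2):
    """Split rows by sorting on an attribute (e.g., release year),
    so valid/test come from a shifted region of that attribute."""
    df = df.sort_values(attr).reset_index(drop=True)
    n = len(df)
    n_test = int(n * test_frac)
    n_valid = int(n * valid_frac)
    train = df.iloc[: n - n_valid - n_test]
    valid = df.iloc[n - n_valid - n_test : n - n_test]
    test = df.iloc[n - n_test :]
    return train, valid, test

# Hypothetical usage on a MovieLens-style table with a 'release_year' column:
# train, valid, test = attribute_split(ratings_df, attr="release_year")
```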
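
Similarly, the two-stage hyper-parameter protocol quoted in the Experiment Setup row can be sketched as below. The search grids, dataset keys, and the `train_and_eval` hook (dataset, config -> validation score) are hypothetical placeholders for whatever training loop the authors actually used.

```python
from itertools import product

def grid(search_space):
    """Yield every combination in a dict of candidate lists as a config dict."""
    for values in product(*search_space.values()):
        yield dict(zip(search_space.keys(), values))

def select_hyperparams(train_and_eval, dataset):
    # Stage 1: tune model hyper-parameters (e.g., number of modules K)
    # under Vanilla MTL (no regularization), then reuse for all baselines.
    model_space = {"num_modules": [4, 8, 16], "lr": [1e-3, 1e-4]}  # illustrative
    best_model = max(
        grid(model_space),
        key=lambda cfg: train_and_eval(dataset, {**cfg, "reg_weight": 0.0}),
    )
    # Stage 2: tune the regularization weight on Multi-MNIST only, with the
    # model hyper-parameters frozen; reuse the result on all other datasets.
    best_reg = max(
        [0.01, 0.1, 1.0],  # illustrative candidates
        key=lambda w: train_and_eval("multi_mnist", {**best_model, "reg_weight": w}),
    )
    return {**best_model, "reg_weight": best_reg}
```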