Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout

Authors: Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that GradDrop outperforms the state-of-the-art multiloss methods within traditional multitask and transfer learning settings, and we discuss how GradDrop reveals links between optimal multiloss training and gradient stochasticity.
Researcher Affiliation | Industry | Zhao Chen, Waymo LLC, Mountain View, CA 94043, zhaoch@waymo.com; Jiquan Ngiam, Google Research, Mountain View, CA 94043, jngiam@google.com; Yanping Huang, Google Research, Mountain View, CA 94043, huangyp@google.com; Thang Luong, Google Research, Mountain View, CA 94043, thangluong@google.com; Henrik Kretzschmar, Waymo LLC, Mountain View, CA 94043, kretzschmar@waymo.com; Yuning Chai, Waymo LLC, Mountain View, CA 94043, chaiy@waymo.com; Dragomir Anguelov, Waymo LLC, Mountain View, CA 94043, dragomir@waymo.com
Pseudocode | Yes | Algorithm 1: Gradient Sign Dropout Layer (GradDrop Layer); a code sketch of this layer appears after the table.
Open Source Code | No | The paper does not provide an unambiguous statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We also rely exclusively on standard public datasets, and thus move discussion of most dataset properties to the Appendices. [...] We first test GradDrop on the multitask learning dataset CelebA [26] [...] We transfer ImageNet2012 [5] to CIFAR-100 [21] [...] 3D vehicle detection from point clouds on the Waymo Open Dataset [42].
Dataset Splits | No | The paper states that it 'relies exclusively on standard public datasets' and describes training runs, but it does not explicitly provide the train/validation/test splits (e.g., percentages, sample counts, or a splitting methodology) needed for reproduction.
Hardware Specification | Yes | All experiments are run on NVIDIA V100 GPU hardware.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with versions like Python 3.8, CPLEX 12.4) needed to replicate the experiment.
Experiment Setup | Yes | We will provide relevant hyperparameters within the main text, but we relegate a complete listing of hyperparameters to the Appendix. For many of our experiments, we renormalize the final gradients so that ||∇||_2 remains constant throughout the GradDrop process. For our final GradDrop model we use a leak parameter ℓ_i set to 1.0 for the source set. All runs include gradient clipping at norm 1.0. (See the second sketch below for the renormalization and clipping details.)
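For a concrete picture of Algorithm 1 referenced in the Pseudocode row, here is a minimal NumPy sketch of a Gradient Sign Dropout (GradDrop) layer: compute the Gradient Positive Sign Purity from the per-task gradients, then stochastically keep each gradient component whose sign agrees with the sampled sign, optionally leaking a fraction of a task's gradient through unmasked. The function name `graddrop`, the identity choice for the transfer function applied to the purity, and the epsilon guard are assumptions of this sketch, not details taken verbatim from the paper.

```python
import numpy as np

def graddrop(grads, leaks=None, eps=1e-7, rng=None):
    """Minimal sketch of a GradDrop (Gradient Sign Dropout) layer.

    grads: list of per-task gradient arrays w.r.t. the same activations,
           all of identical shape.
    leaks: optional per-task leak parameters in [0, 1]; a leak of 1.0
           passes that task's gradient through unmasked.
    Returns the merged gradient to backpropagate further.
    """
    rng = np.random.default_rng() if rng is None else rng
    grads = [np.asarray(g, dtype=np.float64) for g in grads]
    if leaks is None:
        leaks = [0.0] * len(grads)

    # Gradient Positive Sign Purity: P = 0.5 * (1 + sum_i grad_i / sum_i |grad_i|)
    total = sum(grads)
    total_abs = sum(np.abs(g) for g in grads)
    purity = 0.5 * (1.0 + total / (total_abs + eps))

    # One uniform sample per activation element, shared across tasks.
    u = rng.uniform(size=purity.shape)

    merged = np.zeros_like(total)
    for g, leak in zip(grads, leaks):
        # Keep positive components where the purity exceeds the sample and
        # negative components where it falls below it (identity transfer).
        mask = ((purity > u) & (g > 0)) | ((purity < u) & (g < 0))
        mask = leak + (1.0 - leak) * mask.astype(g.dtype)
        merged += mask * g
    return merged
```

With two tasks whose gradients disagree in sign on a given component, only one task's contribution survives for that component, which is the sign-consistency behavior the paper motivates.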
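The Experiment Setup row mentions two further training details: renormalizing the final gradients so the gradient norm stays constant through the GradDrop process, and clipping gradients at norm 1.0. The sketch below shows one plausible reading of those steps; the helper names `renormalize_like_sum` and `clip_by_norm`, and the choice to match the norm of the plain gradient sum, are assumptions rather than details confirmed by the paper.

```python
import numpy as np

def renormalize_like_sum(merged, grads, eps=1e-12):
    # Rescale the GradDrop output so its L2 norm equals the norm of the
    # unmasked gradient sum (one possible interpretation of keeping the
    # gradient norm constant through the GradDrop process).
    target = np.linalg.norm(sum(np.asarray(g, dtype=np.float64) for g in grads))
    current = np.linalg.norm(merged)
    return merged * (target / (current + eps))

def clip_by_norm(grad, max_norm=1.0, eps=1e-12):
    # Standard global-norm gradient clipping; the paper clips at norm 1.0.
    norm = np.linalg.norm(grad)
    return grad * min(1.0, max_norm / (norm + eps))
```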