Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
Authors: Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that GradDrop outperforms the state-of-the-art multiloss methods within traditional multitask and transfer learning settings, and we discuss how GradDrop reveals links between optimal multiloss training and gradient stochasticity. |
| Researcher Affiliation | Industry | Zhao Chen, Waymo LLC, Mountain View, CA 94043, zhaoch@waymo.com; Jiquan Ngiam, Google Research, Mountain View, CA 94043, jngiam@google.com; Yanping Huang, Google Research, Mountain View, CA 94043, huangyp@google.com; Thang Luong, Google Research, Mountain View, CA 94043, thangluong@google.com; Henrik Kretzschmar, Waymo LLC, Mountain View, CA 94043, kretzschmar@waymo.com; Yuning Chai, Waymo LLC, Mountain View, CA 94043, chaiy@waymo.com; Dragomir Anguelov, Waymo LLC, Mountain View, CA 94043, dragomir@waymo.com |
| Pseudocode | Yes | Algorithm 1: Gradient Sign Dropout Layer (GradDrop Layer). (A hedged NumPy sketch of this algorithm follows the table.) |
| Open Source Code | No | The paper does not provide an unambiguous statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We also rely exclusively on standard public datasets, and thus move discussion of most dataset properties to the Appendices. [...] We first test GradDrop on the multitask learning dataset CelebA [26] [...] We transfer ImageNet2012 [5] to CIFAR-100 [21] [...] 3D vehicle detection from point clouds on the Waymo Open Dataset [42]. |
| Dataset Splits | No | The paper states that it relies 'exclusively on standard public datasets' and reports training runs, but it does not explicitly give the train/validation/test splits (percentages, sample counts, or splitting methodology) needed for reproduction. |
| Hardware Specification | Yes | All experiments are run on NVIDIA V100 GPU hardware. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python or deep learning framework versions) needed to replicate the experiments. |
| Experiment Setup | Yes | We will provide relevant hyperparameters within the main text, but we relegate a complete listing of hyperparameters to the Appendix. For many of our experiments, we renormalize the final gradients so that \|\|∇\|\|₂ remains constant throughout the GradDrop process. For our final GradDrop model we use a leak parameter ℓᵢ set to 1.0 for the source set. All runs include gradient clipping at norm 1.0. (A usage sketch of these settings follows the table.) |
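For concreteness, below is a minimal NumPy sketch of what the quoted Algorithm 1 describes: compute the Gradient Positive Sign Purity P from the per-task gradients, draw one uniform sample per activation element, and keep each task's positive gradient components with probability P (negative components with probability 1 − P). The function name `graddrop`, the `leak` argument's placement, and the epsilon guard are illustrative choices on our part; since the authors' code is not public (see the Open Source Code row), this is an assumption-laden reconstruction, not their implementation.

```python
import numpy as np

def graddrop(task_grads, leak=None, rng=None):
    """Gradient Sign Dropout over a list of per-task gradients.

    task_grads: list of K arrays, each one task's gradient w.r.t. the
        same activation tensor.
    leak: optional list of K leak values l_i in [0, 1]; l_i = 1.0 lets
        task i's full gradient bypass the sign mask.
    """
    rng = np.random.default_rng() if rng is None else rng
    grads = np.stack(task_grads)          # shape (K, ...)
    K = grads.shape[0]
    leak = np.zeros(K) if leak is None else np.asarray(leak, dtype=float)

    # Gradient Positive Sign Purity: P = 0.5 * (1 + sum_i g_i / sum_i |g_i|).
    eps = 1e-12  # guard against division by zero where all gradients vanish
    purity = 0.5 * (1.0 + grads.sum(axis=0) / (np.abs(grads).sum(axis=0) + eps))

    # One uniform draw per activation element, shared by all tasks, so the
    # surviving gradient components agree in sign.
    u = rng.uniform(size=purity.shape)

    new_grad = np.zeros_like(purity)
    for i in range(K):
        # Keep positive components with probability P, negative with 1 - P.
        keep = ((purity > u) & (grads[i] > 0)) | ((purity < u) & (grads[i] < 0))
        # Leak blend l_i + (1 - l_i) * M_i, following the paper's
        # leak-parameter extension of the mask.
        mask = leak[i] + (1.0 - leak[i]) * keep.astype(grads.dtype)
        new_grad += mask * grads[i]
    return new_grad
```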
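And a usage sketch of the settings quoted in the Experiment Setup row: a leak of 1.0 on the source task, renormalization of the combined gradient's L2 norm, and gradient clipping at norm 1.0. The gradients here are random stand-ins, and the renormalization target (matching the norm of the plain summed gradient) is our assumption; the paper only states that \|\|∇\|\|₂ is kept constant through the GradDrop process.

```python
rng = np.random.default_rng(0)
g_source = rng.normal(size=(4, 8))   # stand-in gradient for the source task
g_target = rng.normal(size=(4, 8))   # stand-in gradient for the target task

# Leak of 1.0 on the source task, as in the quoted transfer-learning setup;
# the target task stays fully masked.
g = graddrop([g_source, g_target], leak=[1.0, 0.0], rng=rng)

# One plausible reading of the renormalization: rescale so the combined
# gradient keeps the L2 norm of the plain sum of task gradients.
summed = g_source + g_target
g *= np.linalg.norm(summed) / (np.linalg.norm(g) + 1e-12)

# Global-norm gradient clipping at 1.0, as reported for all runs.
norm = np.linalg.norm(g)
if norm > 1.0:
    g /= norm
```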