RotoGrad: Gradient Homogenization in Multitask Learning
Authors: Adrián Javaloy, Isabel Valera
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we run extensive experiments to empirically demonstrate that RotoGrad leads to stable (convergent) learning, scales up to complex network architectures, and outperforms competing methods in multi-label classification settings in CIFAR10 and CelebA, as well as in computer vision tasks using the NYUv2 dataset. |
| Researcher Affiliation | Academia | Adrián Javaloy Department of Computer Science Saarland University Saarbrücken, Germany ajavaloy@cs.uni-saarland.de |
| Pseudocode | Yes (see the sketch below the table) | Algorithm 1 Training step with RotoGrad. |
| Open Source Code | Yes | A Pytorch implementation can be found in https://github.com/adrianjav/rotograd. |
| Open Datasets | Yes (see the loading sketch below the table) | We test all methods on three different tasks of NYUv2 (Couprie et al., 2013)...We test RotoGrad on a 10-task classification problem on CIFAR10 (Krizhevsky et al., 2009)...we use a multitask version of MNIST (LeCun et al., 2010)...and SVHN (Netzer et al., 2011)...We use CelebA (Liu et al., 2015) as dataset with usual splits. |
| Dataset Splits | Yes | For the single training of a model, we select the parameters of the model by taking those that obtained the best validation error after each training epoch. |
| Hardware Specification | Yes | Computational resources. All experiments were performed on a shared cluster system with two NVIDIA DGX-A100. Therefore, all experiments were run with (up to) 4 cores of AMD EPYC 7742 CPUs and, for those trained on GPU (CIFAR10, CelebA, and NYUv2), a single NVIDIA A100 GPU. All experiments were restricted to 12 GB of RAM. |
| Software Dependencies | No | The paper mentions software and libraries such as PyTorch, RAdam, Adam, and Geotorch, but it does not give version numbers for any of them, which would be needed to reproduce the software environment. |
| Experiment Setup | Yes (see the training-loop sketch below the table) | Model hyperparameters. For both datasets, we train the model for 300 epochs using a batch size of 1024. For the network parameters, we use RAdam (Liu et al., 2019a) with a learning rate of 1e-3. |
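
To complement the Pseudocode and Open Source Code rows, the following is a minimal sketch of what a RotoGrad-style training step could look like in plain PyTorch: a shared backbone, one learnable orthogonal rotation per task placed before each task-specific head, a standard update of the shared and head parameters on the summed losses, and a simplified rotation update that encourages the gradients each task sends back to the shared representation to agree in direction. This is not the authors' Algorithm 1 nor the API of the linked `rotograd` package; the model sizes, module names, and the cosine-alignment objective used for the rotations are assumptions made for illustration, and the paper's gradient-magnitude scaling is omitted. It assumes a recent PyTorch with `torch.optim.RAdam` and `torch.nn.utils.parametrizations.orthogonal`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.parametrizations import orthogonal

LATENT, NUM_TASKS, NUM_CLASSES = 64, 2, 10

# Shared backbone and per-task heads (sizes are illustrative, MNIST-shaped input).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(784, LATENT), nn.ReLU())
heads = nn.ModuleList([nn.Linear(LATENT, NUM_CLASSES) for _ in range(NUM_TASKS)])
# One learnable rotation per task, kept orthogonal by PyTorch's parametrization,
# inserted between the shared representation and each task-specific head.
rotations = nn.ModuleList(
    [orthogonal(nn.Linear(LATENT, LATENT, bias=False)) for _ in range(NUM_TASKS)]
)

opt_net = torch.optim.RAdam(
    list(backbone.parameters()) + list(heads.parameters()), lr=1e-3
)
opt_rot = torch.optim.RAdam(rotations.parameters(), lr=1e-3)


def training_step(x, ys):
    z = backbone(x)                                    # shared representation
    rs = [rot(z) for rot in rotations]                 # per-task rotated copies
    losses = [F.cross_entropy(head(r), y) for head, r, y in zip(heads, rs, ys)]

    # Gradient each task sends back to its rotated representation, detached so
    # the rotation update below treats it as data rather than part of the graph.
    grads = [
        torch.autograd.grad(loss, r, retain_graph=True)[0].detach()
        for loss, r in zip(losses, rs)
    ]

    # 1) Update the shared backbone and the heads on the summed task losses.
    opt_net.zero_grad()
    sum(losses).backward()
    opt_net.step()

    # 2) Update the rotations so that the per-task gradients, once mapped back
    #    to the shared space through each rotation, point in a common direction.
    #    This cosine-alignment loss is a simplified stand-in for the paper's
    #    rotation objective, not the objective itself.
    with torch.no_grad():
        target = torch.stack(
            [g @ rot.weight for g, rot in zip(grads, rotations)]
        ).sum(dim=0)
    opt_rot.zero_grad()  # discard any gradient the task losses wrote into the rotations
    rot_loss = -sum(
        F.cosine_similarity(g @ rot.weight, target, dim=-1).mean()
        for g, rot in zip(grads, rotations)
    )
    rot_loss.backward()
    opt_rot.step()
    return [loss.item() for loss in losses]


# Toy usage: one step of a 2-task problem on random MNIST-shaped inputs.
x = torch.randn(32, 1, 28, 28)
ys = [torch.randint(0, NUM_CLASSES, (32,)) for _ in range(NUM_TASKS)]
print(training_step(x, ys))
```

In this sketch the rotations are updated only through the alignment loss; the gradients that the task losses write into them are zeroed before the rotation step, so the rotation and network parameters are effectively trained on separate objectives.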
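The public benchmarks quoted in the Open Datasets row, except NYUv2, are available through `torchvision`; a short loading sketch is below. The root path and transforms are placeholders, and NYUv2 is not bundled with torchvision, so it must be obtained separately.

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

cifar10_train = datasets.CIFAR10("data/", train=True, download=True, transform=to_tensor)
mnist_train = datasets.MNIST("data/", train=True, download=True, transform=to_tensor)
svhn_train = datasets.SVHN("data/", split="train", download=True, transform=to_tensor)
# CelebA with its standard train/valid/test partition and attribute targets.
celeba_train = datasets.CelebA("data/", split="train", target_type="attr",
                               download=True, transform=to_tensor)
```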
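Finally, a sketch of the training configuration quoted in the Experiment Setup row (300 epochs, batch size 1024, RAdam with learning rate 1e-3) combined with the model-selection rule quoted in the Dataset Splits row: keep the parameters with the best validation error observed after any epoch. The toy model, random data, and the use of cross-entropy as the validation error are stand-ins so the snippet runs end to end; they are not from the paper.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins so the sketch runs end to end; replace with the real model/data.
model = nn.Linear(32, 10)
train_set = TensorDataset(torch.randn(4096, 32), torch.randint(0, 10, (4096,)))
val_set = TensorDataset(torch.randn(1024, 32), torch.randint(0, 10, (1024,)))

train_loader = DataLoader(train_set, batch_size=1024, shuffle=True)
val_loader = DataLoader(val_set, batch_size=1024)

# RAdam with learning rate 1e-3, as quoted in the Experiment Setup row.
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-3)

best_val, best_state = float("inf"), None
for epoch in range(300):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        F.cross_entropy(model(x), y).backward()
        optimizer.step()

    # Model selection: keep the parameters with the best validation error
    # seen after any training epoch (validation loss used as a proxy here).
    model.eval()
    with torch.no_grad():
        val_error = sum(F.cross_entropy(model(x), y, reduction="sum").item()
                        for x, y in val_loader) / len(val_set)
    if val_error < best_val:
        best_val, best_state = val_error, copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)
```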