RotoGrad: Gradient Homogenization in Multitask Learning

Authors: Adrián Javaloy, Isabel Valera

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Finally, we run extensive experiments to empirically demonstrate that RotoGrad leads to stable (convergent) learning, scales up to complex network architectures, and outperforms competing methods in multi-label classification settings in CIFAR10 and CelebA, as well as in computer vision tasks using the NYUv2 dataset." |
| Researcher Affiliation | Academia | "Adrián Javaloy, Department of Computer Science, Saarland University, Saarbrücken, Germany, ajavaloy@cs.uni-saarland.de" |
| Pseudocode | Yes | "Algorithm 1: Training step with RotoGrad." |
| Open Source Code | Yes | "A PyTorch implementation can be found in https://github.com/adrianjav/rotograd." |
| Open Datasets | Yes | "We test all methods on three different tasks of NYUv2 (Couprie et al., 2013)... We test RotoGrad on a 10-task classification problem on CIFAR10 (Krizhevsky et al., 2009)... we use a multitask version of MNIST (LeCun et al., 2010)... and SVHN (Netzer et al., 2011)... We use CelebA (Liu et al., 2015) as dataset with usual splits." |
| Dataset Splits | Yes | "For the single training of a model, we select the parameters of the model by taking those that obtained the best validation error after each training epoch." |
| Hardware Specification | Yes | "Computational resources. All experiments were performed on a shared cluster system with two NVIDIA DGX-A100. Therefore, all experiments were run with (up to) 4 cores of AMD EPYC 7742 CPUs and, for those trained on GPU (CIFAR10, CelebA, and NYUv2), a single NVIDIA A100 GPU. All experiments were restricted to 12 GB of RAM." |
| Software Dependencies | No | The paper mentions software and libraries such as PyTorch, RAdam, Adam, and GeoTorch, but it does not provide version numbers for these components, which reproducible software dependencies require. |
| Experiment Setup | Yes | "Model hyperparameters. For both datasets, we train the model for 300 epochs using a batch size of 1024. For the network parameters, we use RAdam (Liu et al., 2019a) with a learning rate of 1e-3." |
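The paper's Algorithm 1 is not reproduced in this report. As a loose illustration only of the gradient-homogenization idea named in the title (rotating per-task gradients so they no longer conflict), here is a minimal 2-D toy sketch; note that RotoGrad itself learns per-task rotations of the shared representation during training, whereas this toy computes closed-form rotations of two fixed gradient vectors, and all names below are hypothetical:

```python
import math

def rotate(v, theta):
    """Rotate a 2-D vector v by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def angle(v):
    return math.atan2(v[1], v[0])

def cos_sim(a, b):
    dot = a[0] * b[0] + a[1] * b[1]
    return dot / (math.hypot(*a) * math.hypot(*b))

# Two conflicting task gradients on a shared 2-D representation (toy values).
g1, g2 = (1.0, 0.5), (0.2, -1.0)
print(round(cos_sim(g1, g2), 3))  # negative: the tasks pull in conflicting directions

# Target direction: the mean of the task gradients.
mean = ((g1[0] + g2[0]) / 2, (g1[1] + g2[1]) / 2)
target = angle(mean)

# Rotate each task gradient onto the target direction, preserving its magnitude.
h1 = rotate(g1, target - angle(g1))
h2 = rotate(g2, target - angle(g2))

# After rotation both gradients point the same way: the conflict is gone.
print(round(cos_sim(h1, h2), 6))  # → 1.0
```

The magnitudes of `h1` and `h2` are unchanged by the rotations; in the paper, gradient magnitudes are handled separately from directions, which this toy does not model.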