Auxiliary Learning by Implicit Differentiation

Authors: Aviv Navon, Idan Achituve, Haggai Maron, Gal Chechik, Ethan Fetaya

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate AuxiLearn in a series of tasks and domains, including image segmentation and learning with attributes in the low data regime, and find that it consistently outperforms competing methods.
Researcher Affiliation | Collaboration | Aviv Navon, Bar-Ilan University, Israel (aviv.navon@biu.ac.il); Idan Achituve, Bar-Ilan University, Israel (idan.achituve@biu.ac.il); Haggai Maron, NVIDIA, Israel (hmaron@nvidia.com); Gal Chechik, Bar-Ilan University & NVIDIA, Israel (gal.chechik@biu.ac.il); Ethan Fetaya, Bar-Ilan University, Israel (ethan.fetaya@biu.ac.il)
Pseudocode | Yes | We summarize our method in Alg. 1 and 2. Algorithm 1: AuxiLearn; Algorithm 2: Hypergradient. (A hedged code sketch of the hypergradient step appears after this table.)
Open Source Code | Yes | Our code is available at https://github.com/AvivNavon/AuxiLearn.
Open Datasets | Yes | We evaluate AuxiLearn in a series of tasks and domains, including image segmentation and learning with attributes in the low data regime, and find that it consistently outperforms competing methods. (...) Caltech-UCSD Birds 200-2011 dataset (CUB) (Wah et al., 2011). (...) NYUv2 dataset (Silberman et al., 2012). (...) CIFAR10, CIFAR100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), and three fine-grained classification datasets: CUB-200-2011, Oxford-IIIT Pet (Parkhi et al., 2012), and Cars (Krause et al., 2013). (...) Cityscapes (Cordts et al., 2016) is a high-quality urban-scene dataset. (...) ShapeNet part dataset (Yi et al., 2016). (An illustrative loading snippet for the torchvision-hosted datasets appears after this table.)
Dataset Splits | Yes | Let {(x_i^t, y_i^t)}_i be the training set and {(x_i^a, y_i^a)}_i be a distinct independent set which we term auxiliary set. (...) Throughout all experiments, we use an extra data split for the auxiliary set. Hence, we use four data sets: training set, validation set, test set, and auxiliary set. (...) We split the predefined test set to 2897 samples for validation and 2897 for testing. (...) We further split the train set to allocate 79 images, 10% of training examples, to construct a validation set. (A sketch of such a four-way split appears after this table.)
Hardware Specification | Yes | The total training time of all methods was 3 hours on a 16GB Nvidia V100 GPU.
Software Dependencies | No | The paper mentions software components like "ADAM optimizer" and "SGD with momentum," but it does not specify exact version numbers for these or any other libraries/frameworks (e.g., PyTorch, TensorFlow, Python version) that would be needed for replication.
Experiment Setup | Yes | We applied grid search over the learning rates in {1e-3, 1e-4, 1e-5} and the weight decay in {5e-3, 5e-4, 5e-5}. For DWA (Liu et al., 2019b), we searched over the temperature in {0.5, 2, 5} and for GradNorm (Chen et al., 2018), over α in {0.3, 0.8, 1.5}. (...) The auxiliary network was optimized using SGD with 0.9 momentum. We applied grid search over the auxiliary network learning rate in {1e-2, 1e-3} and weight decay in {1e-5, 5e-5}. (A grid-search sketch appears after this table.)
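
The Pseudocode row above names Algorithm 2 (Hypergradient). The sketch below is a minimal, hedged illustration of an implicit-differentiation hypergradient in PyTorch, in the spirit of that algorithm but not the authors' implementation: the inverse Hessian-vector product is approximated with a truncated Neumann series, and the names `main_params`, `aux_params`, `num_neumann_terms`, and `alpha` are illustrative assumptions. The exact procedure is given in the paper and the linked repository.

```python
# Hedged sketch (not the authors' code) of an implicit-differentiation hypergradient:
#   d L_aux / d phi ~= - (dL_aux/dW) (d^2 L_train/dW^2)^{-1} (d^2 L_train/(dW dphi)),
# with the inverse Hessian-vector product approximated by a truncated Neumann series.
import torch


def hypergradient(train_loss, aux_loss, main_params, aux_params,
                  num_neumann_terms=3, alpha=0.1):
    """Approximate the gradient of the auxiliary-set loss w.r.t. aux_params."""
    # v = dL_aux/dW: gradient of the auxiliary-set loss w.r.t. the main parameters.
    v = torch.autograd.grad(aux_loss, main_params, retain_graph=True)

    # dL_train/dW, kept in the graph so Hessian-vector products can be taken.
    d_train_d_w = torch.autograd.grad(train_loss, main_params, create_graph=True)

    # Truncated Neumann series: p ~= (d^2 L_train/dW^2)^{-1} v
    # (up to a scale that can be absorbed into the outer learning rate).
    p = [vi.clone() for vi in v]
    cur = [vi.clone() for vi in v]
    for _ in range(num_neumann_terms):
        hvp = torch.autograd.grad(d_train_d_w, main_params,
                                  grad_outputs=cur, retain_graph=True)
        cur = [ci - alpha * hi for ci, hi in zip(cur, hvp)]
        p = [pi + ci for pi, ci in zip(p, cur)]

    # Mixed second-derivative term contracted with p gives the (negated) hypergradient.
    mixed = torch.autograd.grad(d_train_d_w, aux_params,
                                grad_outputs=p, allow_unused=True)
    return [None if g is None else -g for g in mixed]
```

In an outer loop, these hypergradients would be applied to the auxiliary network's parameters (e.g., with SGD), while the main network is trained as usual on the training loss.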
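
Several of the datasets listed in the Open Datasets row ship with torchvision; the snippet below is an illustrative download sketch (the root path and transform are assumptions), while CUB-200-2011, Cars, NYUv2, Cityscapes, and the ShapeNet part dataset require separate downloads from their project pages.

```python
# Illustrative download of the torchvision-hosted datasets mentioned above.
# The "data" root and the transform are assumptions; OxfordIIITPet requires a
# recent torchvision release.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
cifar10 = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
cifar100 = datasets.CIFAR100("data", train=True, download=True, transform=to_tensor)
svhn = datasets.SVHN("data", split="train", download=True, transform=to_tensor)
pets = datasets.OxfordIIITPet("data", split="trainval", download=True, transform=to_tensor)
```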
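
The Dataset Splits row describes four disjoint sets: training, validation, test, and auxiliary. The sketch below shows one way to produce such a four-way split; the fractions and seed are illustrative assumptions, not the paper's exact sizes.

```python
# Hedged sketch of a four-way split into train / validation / test / auxiliary sets.
import torch
from torch.utils.data import random_split


def four_way_split(dataset, frac_val=0.1, frac_test=0.1, frac_aux=0.1, seed=0):
    n = len(dataset)
    n_val, n_test, n_aux = int(frac_val * n), int(frac_test * n), int(frac_aux * n)
    n_train = n - n_val - n_test - n_aux  # remainder is the training set
    gen = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test, n_aux], generator=gen)
```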
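
The grid search described in the Experiment Setup row can be run along the lines of the sketch below. The hyperparameter grids and the SGD(momentum=0.9) auxiliary optimizer follow the quoted text; the placeholder model constructors and the choice of Adam for the primary network are assumptions for illustration.

```python
# Hedged grid-search sketch over the hyperparameter ranges quoted above.
import itertools
import torch


def build_primary_net():
    return torch.nn.Linear(32, 10)   # placeholder primary network (assumption)


def build_aux_net():
    return torch.nn.Linear(10, 1)    # placeholder auxiliary network (assumption)


primary_grid = list(itertools.product([1e-3, 1e-4, 1e-5], [5e-3, 5e-4, 5e-5]))
aux_grid = list(itertools.product([1e-2, 1e-3], [1e-5, 5e-5]))

for (lr, wd), (aux_lr, aux_wd) in itertools.product(primary_grid, aux_grid):
    primary_net, aux_net = build_primary_net(), build_aux_net()
    primary_opt = torch.optim.Adam(primary_net.parameters(), lr=lr, weight_decay=wd)
    aux_opt = torch.optim.SGD(aux_net.parameters(), lr=aux_lr,
                              momentum=0.9, weight_decay=aux_wd)
    # ...train, then keep the configuration with the best validation metric...
```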