Auxiliary Learning by Implicit Differentiation

Authors: Aviv Navon, Idan Achituve, Haggai Maron, Gal Chechik, Ethan Fetaya

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate AuxiLearn in a series of tasks and domains, including image segmentation and learning with attributes in the low data regime, and find that it consistently outperforms competing methods.
Researcher Affiliation | Collaboration | Aviv Navon, Bar-Ilan University, Israel (aviv.navon@biu.ac.il); Idan Achituve, Bar-Ilan University, Israel (idan.achituve@biu.ac.il); Haggai Maron, NVIDIA, Israel (hmaron@nvidia.com); Gal Chechik, Bar-Ilan University & NVIDIA, Israel (gal.chechik@biu.ac.il); Ethan Fetaya, Bar-Ilan University, Israel (ethan.fetaya@biu.ac.il)
Pseudocode | Yes | We summarize our method in Alg. 1 and 2. Algorithm 1: AuxiLearn; Algorithm 2: Hypergradient. (A hedged code sketch of the hypergradient step appears after this table.)
Open Source Code | Yes | Our code is available at https://github.com/AvivNavon/AuxiLearn.
Open Datasets | Yes | We evaluate AuxiLearn in a series of tasks and domains, including image segmentation and learning with attributes in the low data regime, and find that it consistently outperforms competing methods. (...) Caltech-UCSD Birds 200-2011 dataset (CUB) (Wah et al., 2011). (...) NYUv2 dataset (Silberman et al., 2012). (...) CIFAR10, CIFAR100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), and three fine-grained classification datasets: CUB-200-2011, Oxford-IIIT Pet (Parkhi et al., 2012), and Cars (Krause et al., 2013). (...) Cityscapes (Cordts et al., 2016) is a high-quality urban-scene dataset. (...) ShapeNet part dataset (Yi et al., 2016). (An illustrative loading snippet for the torchvision-hosted datasets appears after this table.)
Dataset Splits | Yes | Let {(x_i^t, y_i^t)}_i be the training set and {(x_i^a, y_i^a)}_i be a distinct independent set which we term auxiliary set. (...) Throughout all experiments, we use an extra data split for the auxiliary set. Hence, we use four data sets: training set, validation set, test set, and auxiliary set. (...) We split the predefined test set to 2897 samples for validation and 2897 for testing. (...) We further split the train set to allocate 79 images, 10% of training examples, to construct a validation set. (A sketch of such a four-way split appears after this table.)
Hardware Specification | Yes | The total training time of all methods was 3 hours on a 16GB Nvidia V100 GPU.
Software Dependencies | No | The paper mentions software components like "ADAM optimizer" and "SGD with momentum," but it does not specify exact version numbers for these or any other libraries/frameworks (e.g., PyTorch, TensorFlow, Python version) that would be needed for replication.
Experiment Setup | Yes | We applied grid search over the learning rates in {1e-3, 1e-4, 1e-5} and the weight decay in {5e-3, 5e-4, 5e-5}. For DWA (Liu et al., 2019b), we searched over the temperature in {0.5, 2, 5} and for GradNorm (Chen et al., 2018), over α in {0.3, 0.8, 1.5}. (...) The auxiliary network was optimized using SGD with 0.9 momentum. We applied grid search over the auxiliary network learning rate in {1e-2, 1e-3} and weight decay in {1e-5, 5e-5}. (A grid-search sketch appears after this table.)
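
The Pseudocode row above names Algorithm 2 (Hypergradient). The sketch below is a minimal, hedged illustration of an implicit-differentiation hypergradient in PyTorch, in the spirit of that algorithm but not the authors' implementation: the inverse Hessian-vector product is approximated with a truncated Neumann series, and the names `main_params`, `aux_params`, `num_neumann_terms`, and `alpha` are illustrative assumptions. The exact procedure is given in the paper and the linked repository.

```python
# Hedged sketch (not the authors' code) of an implicit-differentiation hypergradient:
#   d L_aux / d phi ~= - (dL_aux/dW) (d^2 L_train/dW^2)^{-1} (d^2 L_train/(dW dphi)),
# with the inverse Hessian-vector product approximated by a truncated Neumann series.
import torch


def hypergradient(train_loss, aux_loss, main_params, aux_params,
                  num_neumann_terms=3, alpha=0.1):
    """Approximate the gradient of the auxiliary-set loss w.r.t. aux_params."""
    # v = dL_aux/dW: gradient of the auxiliary-set loss w.r.t. the main parameters.
    v = torch.autograd.grad(aux_loss, main_params, retain_graph=True)

    # dL_train/dW, kept in the graph so Hessian-vector products can be taken.
    d_train_d_w = torch.autograd.grad(train_loss, main_params, create_graph=True)

    # Truncated Neumann series: p ~= (d^2 L_train/dW^2)^{-1} v
    # (up to a scale that can be absorbed into the outer learning rate).
    p = [vi.clone() for vi in v]
    cur = [vi.clone() for vi in v]
    for _ in range(num_neumann_terms):
        hvp = torch.autograd.grad(d_train_d_w, main_params,
                                  grad_outputs=cur, retain_graph=True)
        cur = [ci - alpha * hi for ci, hi in zip(cur, hvp)]
        p = [pi + ci for pi, ci in zip(p, cur)]

    # Mixed second-derivative term contracted with p gives the (negated) hypergradient.
    mixed = torch.autograd.grad(d_train_d_w, aux_params,
                                grad_outputs=p, allow_unused=True)
    return [None if g is None else -g for g in mixed]
```

In an outer loop, these hypergradients would be applied to the auxiliary network's parameters (e.g., with SGD), while the main network is trained as usual on the training loss.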
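
Several of the datasets listed in the Open Datasets row ship with torchvision; the snippet below is an illustrative download sketch (the root path and transform are assumptions), while CUB-200-2011, Cars, NYUv2, Cityscapes, and the ShapeNet part dataset require separate downloads from their project pages.

```python
# Illustrative download of the torchvision-hosted datasets mentioned above.
# The "data" root and the transform are assumptions; OxfordIIITPet requires a
# recent torchvision release.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
cifar10 = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
cifar100 = datasets.CIFAR100("data", train=True, download=True, transform=to_tensor)
svhn = datasets.SVHN("data", split="train", download=True, transform=to_tensor)
pets = datasets.OxfordIIITPet("data", split="trainval", download=True, transform=to_tensor)
```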
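
The Dataset Splits row describes four disjoint sets: training, validation, test, and auxiliary. The sketch below shows one way to produce such a four-way split; the fractions and seed are illustrative assumptions, not the paper's exact sizes.

```python
# Hedged sketch of a four-way split into train / validation / test / auxiliary sets.
import torch
from torch.utils.data import random_split


def four_way_split(dataset, frac_val=0.1, frac_test=0.1, frac_aux=0.1, seed=0):
    n = len(dataset)
    n_val, n_test, n_aux = int(frac_val * n), int(frac_test * n), int(frac_aux * n)
    n_train = n - n_val - n_test - n_aux  # remainder is the training set
    gen = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test, n_aux], generator=gen)
```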
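
The grid search described in the Experiment Setup row can be run along the lines of the sketch below. The hyperparameter grids and the SGD(momentum=0.9) auxiliary optimizer follow the quoted text; the placeholder model constructors and the choice of Adam for the primary network are assumptions for illustration.

```python
# Hedged grid-search sketch over the hyperparameter ranges quoted above.
import itertools
import torch


def build_primary_net():
    return torch.nn.Linear(32, 10)   # placeholder primary network (assumption)


def build_aux_net():
    return torch.nn.Linear(10, 1)    # placeholder auxiliary network (assumption)


primary_grid = list(itertools.product([1e-3, 1e-4, 1e-5], [5e-3, 5e-4, 5e-5]))
aux_grid = list(itertools.product([1e-2, 1e-3], [1e-5, 5e-5]))

for (lr, wd), (aux_lr, aux_wd) in itertools.product(primary_grid, aux_grid):
    primary_net, aux_net = build_primary_net(), build_aux_net()
    primary_opt = torch.optim.Adam(primary_net.parameters(), lr=lr, weight_decay=wd)
    aux_opt = torch.optim.SGD(aux_net.parameters(), lr=aux_lr,
                              momentum=0.9, weight_decay=aux_wd)
    # ...train, then keep the configuration with the best validation metric...
```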