Auxiliary Learning by Implicit Differentiation
Authors: Aviv Navon, Idan Achituve, Haggai Maron, Gal Chechik, Ethan Fetaya
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AuxiLearn in a series of tasks and domains, including image segmentation and learning with attributes in the low-data regime, and find that it consistently outperforms competing methods. |
| Researcher Affiliation | Collaboration | Aviv Navon, Bar-Ilan University, Israel (aviv.navon@biu.ac.il); Idan Achituve, Bar-Ilan University, Israel (idan.achituve@biu.ac.il); Haggai Maron, NVIDIA, Israel (hmaron@nvidia.com); Gal Chechik, Bar-Ilan University and NVIDIA, Israel (gal.chechik@biu.ac.il); Ethan Fetaya, Bar-Ilan University, Israel (ethan.fetaya@biu.ac.il) |
| Pseudocode | Yes | We summarize our method in Alg. 1 and 2. Algorithm 1: AuxiLearn. Algorithm 2: Hypergradient. (A hedged sketch of the hypergradient computation appears after this table.) |
| Open Source Code | Yes | Our code is available at https://github.com/AvivNavon/AuxiLearn. |
| Open Datasets | Yes | We evaluate AuxiLearn in a series of tasks and domains, including image segmentation and learning with attributes in the low-data regime, and find that it consistently outperforms competing methods. (...) Caltech-UCSD Birds 200-2011 dataset (CUB) (Wah et al., 2011). (...) NYUv2 dataset (Silberman et al., 2012). (...) CIFAR10, CIFAR100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), and three fine-grained classification datasets: CUB-200-2011, Oxford-IIIT Pet (Parkhi et al., 2012), and Cars (Krause et al., 2013). (...) Cityscapes (Cordts et al., 2016) is a high-quality urban-scene dataset. (...) ShapeNet part dataset (Yi et al., 2016). |
| Dataset Splits | Yes | Let $\{(x^t_i, y^t_i)\}_i$ be the training set and $\{(x^a_i, y^a_i)\}_i$ be a distinct independent set which we term the auxiliary set. (...) Throughout all experiments, we use an extra data split for the auxiliary set. Hence, we use four data sets: training set, validation set, test set, and auxiliary set. (...) We split the predefined test set to 2897 samples for validation and 2897 for testing. (...) We further split the train set to allocate 79 images, 10% of training examples, to construct a validation set. (A minimal split sketch appears after this table.) |
| Hardware Specification | Yes | The total training time of all methods was 3 hours on a 16GB Nvidia V100 GPU. |
| Software Dependencies | No | The paper mentions software components like "ADAM optimizer" and "SGD with momentum," but it does not specify exact version numbers for these or any other libraries/frameworks (e.g., PyTorch, TensorFlow, Python version) that would be needed for replication. |
| Experiment Setup | Yes | We applied grid search over the learning rates in {1e-3, 1e-4, 1e-5} and the weight decay in {5e-3, 5e-4, 5e-5}. For DWA (Liu et al., 2019b), we searched over the temperature in {0.5, 2, 5} and for GradNorm (Chen et al., 2018), over α in {0.3, 0.8, 1.5}. (...) The auxiliary network was optimized using SGD with 0.9 momentum. We applied grid search over the auxiliary network learning rate in {1e-2, 1e-3} and weight decay in {1e-5, 5e-5}. (A minimal grid-enumeration sketch appears after this table.) |
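
The hypergradient in Alg. 2 differentiates the auxiliary-set loss through the inner optimization of the main network. Below is a minimal PyTorch sketch, assuming the inverse Hessian is approximated with a truncated Neumann series (one standard choice for this kind of bi-level problem); the function name `hypergradient`, its arguments, and the defaults `k` and `alpha` are illustrative, not taken from the authors' released code.

```python
import torch

def hypergradient(train_loss, aux_set_loss, params, hyperparams, k=3, alpha=0.1):
    """Gradient of the auxiliary-set loss w.r.t. the auxiliary-network
    parameters, via the implicit function theorem. Illustrative sketch:
    the Hessian inverse is replaced by a truncated Neumann series."""
    # v = dL_aux/dW at the (approximate) inner optimum
    v = torch.autograd.grad(aux_set_loss, params)

    # dL_train/dW, kept differentiable so we can take Hessian-vector products
    grads = torch.autograd.grad(train_loss, params, create_graph=True)

    # p ~ H^{-1} v via  alpha * sum_{j=0..k} (I - alpha * H)^j v
    p = [vi.clone() for vi in v]
    v_cur = list(v)
    for _ in range(k):
        hvp = torch.autograd.grad(grads, params, grad_outputs=v_cur,
                                  retain_graph=True)
        v_cur = [vc - alpha * h for vc, h in zip(v_cur, hvp)]
        p = [pi + vc for pi, vc in zip(p, v_cur)]

    # hypergradient = -(d^2 L_train / dphi dW)^T (alpha * p)
    mixed = torch.autograd.grad(grads, hyperparams, grad_outputs=p)
    return [-alpha * g for g in mixed]
```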
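
The four-way protocol quoted in the Dataset Splits row (train / validation / test / auxiliary) amounts to one disjoint partition of the data. A minimal sketch using `torch.utils.data.random_split`; the fractions and seed here are illustrative placeholders, and the paper's quoted per-dataset sizes should be used instead:

```python
import torch
from torch.utils.data import random_split

def four_way_split(dataset, val_frac=0.1, test_frac=0.2, aux_frac=0.1, seed=0):
    """Partition one dataset into the four disjoint sets used in the
    paper's protocol. Fractions and seed are illustrative placeholders."""
    n = len(dataset)
    n_val, n_test, n_aux = int(n * val_frac), int(n * test_frac), int(n * aux_frac)
    n_train = n - n_val - n_test - n_aux
    gen = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test, n_aux], generator=gen)

# Usage: train_set, val_set, test_set, aux_set = four_way_split(full_dataset)
```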
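
The search spaces in the Experiment Setup row are small enough to enumerate exhaustively (3×3 configurations for the primary network, 2×2 for the auxiliary network, plus the DWA and GradNorm grids). A minimal enumeration sketch; the helper `configurations` and the grid variable names are ours:

```python
from itertools import product

# Search spaces quoted from the paper
primary_grid = {"lr": [1e-3, 1e-4, 1e-5], "weight_decay": [5e-3, 5e-4, 5e-5]}
aux_net_grid = {"lr": [1e-2, 1e-3], "weight_decay": [1e-5, 5e-5]}

def configurations(space):
    """Yield every combination of a hyperparameter grid as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

for cfg in configurations(primary_grid):
    print(cfg)  # e.g. {'lr': 0.001, 'weight_decay': 0.005} -- 9 configs total
```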