Relational Surrogate Loss Learning

Authors: Tao Huang, Zekang Li, Hua Lu, Yong Shan, Shusheng Yang, Yang Feng, Fei Wang, Shan You, Chang Xu

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments show that our method achieves improvements on various tasks including image classification and neural machine translation, and even outperforms state-of-the-art methods on human pose estimation and machine reading comprehension tasks."
Researcher Affiliation | Collaboration | Tao Huang (1,2), Zekang Li (3), Hua Lu (4), Yong Shan (3), Shusheng Yang (4), Yang Feng (3), Fei Wang (5), Shan You (2), Chang Xu (1). Affiliations: (1) School of Computer Science, Faculty of Engineering, The University of Sydney; (2) SenseTime Research; (3) Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS); (4) Huazhong University of Science and Technology; (5) University of Science and Technology of China
Pseudocode | Yes | "Algorithm 1: Learning of surrogate losses. Input: surrogate loss L with random weights θ_l, batch size N, metric function M, data generators G_M and G_R, sample probability p. Output: learned surrogate loss with highest correlation." (A hedged sketch of this loop is given after the table.)
Open Source Code | Yes | "Code is available at: https://github.com/hunto/ReLoss."
Open Datasets | Yes | "We conduct experiments on three benchmark datasets CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and ImageNet (Deng et al., 2009)."
Dataset Splits | Yes | "On CIFAR-10 and CIFAR-100 datasets, we train ResNet-20 for 200 epochs with an initial learning rate of 0.1, which decays by 0.1 at the 100th and 150th epochs; the batch size is set to 128 with cutout (DeVries & Taylor, 2017) data augmentation. We run each experiment 5 times with different random seeds and report their mean accuracy with standard deviation." and "We take news-dev-2016 and news-test-2016 as development and test sets."
Hardware Specification | Yes | "Note that our additional cost O(T_l) of learning ReLoss costs only 0.5 GPU hour on image classification with a single NVIDIA TITAN Xp GPU, and we only need to train ReLoss once for each task, reducing much computational cost compared to previous works."
Software Dependencies | No | The paper mentions software components such as MMPose and torchvision, and optimizers such as Adam and SGD, but it does not give version numbers for these dependencies or for the programming language used.
Experiment Setup | Yes | "On CIFAR-10 and CIFAR-100 datasets, we train ResNet-20 for 200 epochs with an initial learning rate of 0.1, which decays by 0.1 at the 100th and 150th epochs; the batch size is set to 128 with cutout (DeVries & Taylor, 2017) data augmentation. We run each experiment 5 times with different random seeds and report their mean accuracy with standard deviation." (A training-schedule sketch under these settings follows the table.)
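
To make the Pseudocode row concrete, here is a minimal PyTorch sketch of a surrogate-loss learning loop in the spirit of Algorithm 1. Everything below is an assumption for illustration, not the authors' implementation: the `SurrogateLoss` architecture, the use of Pearson correlation as a differentiable stand-in for the paper's correlation objective, and the collapse of the two data generators G_M, G_R and the sample probability p into a single toy generator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SurrogateLoss(nn.Module):
    """Hypothetical surrogate-loss network: maps (logits, labels) to a scalar."""
    def __init__(self, num_classes: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * num_classes, hidden),
                                 nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        one_hot = F.one_hot(labels, logits.size(-1)).float()
        feats = torch.cat([logits.softmax(-1), one_hot], dim=-1)
        return self.net(feats).mean()  # one scalar loss value per batch

def pearson_corr(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Differentiable Pearson correlation; the paper optimizes a correlation
    between loss and metric, for which this is only a simple stand-in."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / (xc.norm() * yc.norm() + eps)

def accuracy(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    return (logits.argmax(-1) == labels).float().mean()

def random_batches(batch: int = 128, classes: int = 10):
    """Toy data generator standing in for G_M / G_R from Algorithm 1."""
    while True:
        yield torch.randn(batch, classes), torch.randint(0, classes, (batch,))

def learn_surrogate(surrogate, generator, metric=accuracy,
                    steps: int = 1000, groups: int = 16, lr: float = 1e-3):
    """Fit the surrogate so that, across a group of batches, its values are
    negatively correlated with the metric (lower loss <-> higher accuracy)."""
    opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
    for _ in range(steps):
        losses, metrics = [], []
        for _ in range(groups):
            logits, labels = next(generator)
            losses.append(surrogate(logits, labels))
            metrics.append(metric(logits, labels))
        corr = pearson_corr(torch.stack(losses), torch.stack(metrics))
        opt.zero_grad()
        corr.backward()  # gradient descent on corr drives it toward -1
        opt.step()
    return surrogate

surrogate = learn_surrogate(SurrogateLoss(num_classes=10), random_batches())
```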
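
And a minimal sketch of the CIFAR training recipe quoted in the Dataset Splits and Experiment Setup rows (200 epochs, initial learning rate 0.1 decayed by 0.1 at epochs 100 and 150, batch size 128). torchvision ships no ResNet-20, so ResNet-18 is used purely as a stand-in; the cutout augmentation is omitted, and the momentum and weight-decay values are assumptions not stated in the quoted text.

```python
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms

# Stand-in model: torchvision has no ResNet-20; the paper trains ResNet-20.
model = torchvision.models.resnet18(num_classes=10)

# Standard cross-entropy here; the paper would plug in the learned ReLoss.
criterion = nn.CrossEntropyLoss()

# Settings quoted from the paper: initial lr 0.1, decayed by 0.1 at epochs
# 100 and 150, 200 epochs, batch size 128. Momentum and weight decay are assumed.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[100, 150], gamma=0.1)

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor(),  # cutout augmentation omitted for brevity
)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

for epoch in range(200):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # learning-rate decay at the scheduled epochs
```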