Knowledge Refinery: Learning from Decoupled Label

Authors: Qianggang Ding, Sifan Wu, Tao Dai, Hao Sun, Jiadong Guo, Zhang-Hua Fu, Shutao Xia

AAAI 2021, pp. 7228-7235 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To exhibit the generalization of KR, we evaluate our method in both fields of computer vision and natural language processing. Our empirical results show consistent performance gains under all experimental settings. We perform extensive tests on two general classification tasks: the image classification and the text recognition tasks.
Researcher Affiliation | Academia | (1) Tsinghua University; (2) PCL Research Center of Networks and Communications, Peng Cheng Laboratory; (3) The Chinese University of Hong Kong; (4) The Hong Kong University of Science and Technology; (5) International Digital Economy Academy; (6) The Chinese University of Hong Kong, Shenzhen; (7) Shenzhen Institute of Artificial Intelligence and Robotics for Society
Pseudocode | Yes | Algorithm 1: Workflow of Knowledge Refinery (a PyTorch sketch of this workflow follows below the table)
     1: S ← 0                                 ▷ Initialize the residual correlation matrix with a zero matrix.
     2: for epoch = 1, 2, ... do
     3:     z = f_θ(x)                        ▷ Here we omit the mini-batch for simplicity.
     4:     p = softmax(z)
     5:     L_hard = L_CE(p, q)               ▷ Calculate the hard loss.
     6:     z^(res) = (z)_{\k}                ▷ k is the index of the ground-truth class.
     7:     p^(res) = softmax(z^(res))
     8:     q^(res) = softmax(S_k)
     9:     L_upd = D_KL(q^(res) || p^(res))  ▷ Calculate the update loss.
    10:     L_res = D_KL(p^(res) || q^(res))  ▷ Calculate the residual loss.
    11:     θ ← θ − β ∇_θ (L_hard + α L_res)  ▷ Optimize the backbone network.
    12:     S ← S − β ∇_S (α L_upd)           ▷ Optimize the residual correlation matrix.
    13: end for
Open Source Code | No | No, the paper does not include an unambiguous statement that the authors are releasing their code for the work described, nor does it provide a direct link to a source-code repository.
Open Datasets | Yes | Datasets. We use three benchmark datasets of image recognition in our evaluations: (i) CIFAR-10 (Krizhevsky and Hinton 2009):... (ii) CIFAR-100 (Krizhevsky and Hinton 2009):... (iii) ImageNet-12 (Russakovsky et al. 2015):... Datasets. We use three benchmark datasets of text classification in our evaluations: (i) AGNews (Del Corso, Gulli, and Romani 2005):... (ii) Yahoo! Answers:... (iii) Yelp Review Full:... (a torchvision data-loading sketch for the CIFAR setup follows below the table)
Dataset Splits | Yes | For the CIFAR-10 and CIFAR-100 datasets, we use ResNet-18 (He et al. 2016b) with pre-activation and Wide ResNet-28-10 (Zagoruyko and Komodakis 2016) as backbone network architectures. We use the SGD optimizer with Nesterov momentum (Sutskever et al. 2013) and set the initial learning rate to 0.1, momentum to 0.9, and mini-batch size to 128. The learning rate is dropped by 0.1 at the 60/120/160th epochs and we train for 200 epochs. The hyper-parameter α in all experiments is an exponential decay function γ^t, where γ = 0.99. We also apply data augmentation such as horizontal flips and random crops to the samples during the training stage.
Hardware Specification | Yes | We implemented all experiments with the PyTorch framework on a single NVIDIA Tesla V100 GPU.
Software Dependencies | No | No, the paper mentions using the 'PyTorch framework' but does not specify its version number or the versions of any other key software dependencies.
Experiment Setup | Yes | Settings. For the CIFAR-10 and CIFAR-100 datasets, we use ResNet-18 (He et al. 2016b) with pre-activation and Wide ResNet-28-10 (Zagoruyko and Komodakis 2016) as backbone network architectures. We use the SGD optimizer with Nesterov momentum (Sutskever et al. 2013) and set the initial learning rate to 0.1, momentum to 0.9, and mini-batch size to 128. The learning rate is dropped by 0.1 at the 60/120/160th epochs and we train for 200 epochs. The hyper-parameter α in all experiments is an exponential decay function γ^t, where γ = 0.99. We also apply data augmentation such as horizontal flips and random crops to the samples during the training stage. For a fair comparison, we follow the experimental settings of Zhang et al. (2018), which set the mini-batch size to 64 and drop the learning rate by 0.1 every 60 epochs. (a sketch of this optimization schedule follows below the table)
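
The Pseudocode row above gives Algorithm 1 only in outline. Below is a minimal PyTorch sketch of one possible reading of that workflow; it is not the authors' released code (none is linked). The names model, train_loader, num_classes, beta, and gamma are placeholders, and the choices to store S as a C x C matrix, to mask out the ground-truth column, and to reuse the step size β for S are assumptions made for illustration.

    import torch
    import torch.nn.functional as F

    def kl_divergence(p, q):
        # D_KL(p || q) for row-wise probability distributions, averaged over the batch.
        return (p * (p.log() - q.log())).sum(dim=1).mean()

    def train_kr(model, train_loader, num_classes, epochs, device, beta=0.1, gamma=0.99):
        model.to(device).train()
        # Line 1: initialize the residual correlation matrix S with zeros.
        # Row k of S is assumed to hold the learnable soft residual label for class k.
        S = torch.zeros(num_classes, num_classes, device=device, requires_grad=True)
        opt_theta = torch.optim.SGD(model.parameters(), lr=beta, momentum=0.9, nesterov=True)
        opt_S = torch.optim.SGD([S], lr=beta)  # assumed: same step size for S

        for epoch in range(epochs):
            alpha = gamma ** epoch  # assumed reading of "alpha = gamma^t" with t = epoch index
            for x, y in train_loader:
                x, y = x.to(device), y.to(device)
                z = model(x)                          # line 3: logits
                loss_hard = F.cross_entropy(z, y)     # line 5: hard loss

                # Lines 6-8: drop the ground-truth class to form the residual distributions.
                mask = F.one_hot(y, num_classes).bool()
                p_res = F.softmax(z[~mask].view(-1, num_classes - 1), dim=1)
                q_res = F.softmax(S[y][~mask].view(-1, num_classes - 1), dim=1)

                # Line 10: residual loss D_KL(p_res || q_res) trains the backbone.
                loss_res = kl_divergence(p_res, q_res.detach())
                # Line 9: update loss D_KL(q_res || p_res) trains S.
                loss_upd = kl_divergence(q_res, p_res.detach())

                opt_theta.zero_grad()
                (loss_hard + alpha * loss_res).backward()   # line 11
                opt_theta.step()

                opt_S.zero_grad()
                (alpha * loss_upd).backward()               # line 12
                opt_S.step()

Note the asymmetry in the two KL terms: L_res pulls the network's residual prediction toward the learned soft label q^(res), while L_upd pulls q^(res) toward the (detached) network prediction, so each objective updates a disjoint set of parameters.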
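
The Open Datasets and Dataset Splits rows quote the CIFAR setup (mini-batch size 128, horizontal flips, random crops). A minimal torchvision data pipeline consistent with that description might look as follows; the crop padding of 4, the normalization statistics, and the num_workers value are common CIFAR defaults, not values reported in the paper.

    import torchvision
    import torchvision.transforms as T
    from torch.utils.data import DataLoader

    # Augmentation quoted in the paper: random crops and horizontal flips.
    normalize = T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616))
    train_tf = T.Compose([T.RandomCrop(32, padding=4),
                          T.RandomHorizontalFlip(),
                          T.ToTensor(), normalize])
    test_tf = T.Compose([T.ToTensor(), normalize])

    train_set = torchvision.datasets.CIFAR10('./data', train=True, download=True, transform=train_tf)
    test_set = torchvision.datasets.CIFAR10('./data', train=False, download=True, transform=test_tf)
    train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
    test_loader = DataLoader(test_set, batch_size=128, shuffle=False, num_workers=4)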
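
The Experiment Setup row gives the optimization schedule for CIFAR: SGD with Nesterov momentum, initial learning rate 0.1, momentum 0.9, learning-rate drops of 0.1x at epochs 60/120/160, 200 training epochs, and α decayed as γ^t with γ = 0.99. A sketch of that configuration, using a stock torchvision ResNet-18 as a stand-in for the pre-activation ResNet-18 actually used (weight decay is omitted because it is not reported in the quoted settings):

    import torch
    import torchvision

    model = torchvision.models.resnet18(num_classes=100)  # stand-in backbone
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, nesterov=True)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 120, 160], gamma=0.1)

    gamma_alpha = 0.99
    for epoch in range(200):
        alpha = gamma_alpha ** epoch   # alpha = gamma^t, decayed once per epoch
        # ... run one training epoch minimizing L_hard + alpha * L_res (see the sketch above) ...
        scheduler.step()               # drops the learning rate by 0.1x at epochs 60/120/160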