Knowledge Refinery: Learning from Decoupled Label
Authors: Qianggang Ding, Sifan Wu, Tao Dai, Hao Sun, Jiadong Guo, Zhang-Hua Fu, Shutao Xia
AAAI 2021, pp. 7228–7235 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To exhibit the generalization of KR, we evaluate our method in both the computer vision and natural language processing fields. Our empirical results show consistent performance gains under all experimental settings. We perform extensive tests on two general classification tasks: image classification and text recognition. |
| Researcher Affiliation | Academia | 1 Tsinghua University, 2 PCL Research Center of Networks and Communications, Peng Cheng Laboratory, 3 The Chinese University of Hong Kong, 4 The Hong Kong University of Science and Technology, 5 International Digital Economy Academy, 6 The Chinese University of Hong Kong, Shenzhen, 7 Shenzhen Institute of Artificial Intelligence and Robotics for Society |
| Pseudocode | Yes | Algorithm 1: Workflow of Knowledge Refinery. 1: S ← 0 ▷ Initialize the residual correlation matrix with the zero matrix. 2: for epoch = 1, 2, . . . do 3: z = fθ(x) ▷ Here we omit the mini-batch for simplicity. 4: p = softmax(z) 5: Lhard = LCE(p, q) ▷ Calculate the hard loss. 6: z(res) = (z)\k ▷ k is the index of the ground-truth class. 7: p(res) = softmax(z(res)) 8: q(res) = softmax(Sk) 9: Lupd = DKL(q(res) ‖ p(res)) ▷ Calculate the update loss. 10: Lres = DKL(p(res) ‖ q(res)) ▷ Calculate the residual loss. 11: θ ← θ − β∇θ(Lhard + αLres) ▷ Optimize the backbone network. 12: S ← S − β∇S(αLupd) ▷ Optimize the residual correlation matrix. 13: end for (A minimal PyTorch sketch of this training step is given after the table.) |
| Open Source Code | No | No, the paper does not include an unambiguous statement that the authors are releasing their code for the work described, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | Datasets. We use three benchmark datasets of image recognition in our evaluations: (i) CIFAR-10 (Krizhevsky and Hinton 2009):... (ii) CIFAR-100 (Krizhevsky and Hinton 2009):... (iii) ImageNet-12 (Russakovsky et al. 2015):... Datasets. We use three benchmark datasets of text classification in our evaluations: (i) AGNews (Del Corso, Gulli, and Romani 2005):... (ii) Yahoo! Answers:... (iii) Yelp Review Full:... |
| Dataset Splits | Yes | For the CIFAR-10 and CIFAR-100 datasets, we use ResNet-18 (He et al. 2016b) with pre-activation and WideResNet-28-10 (Zagoruyko and Komodakis 2016) as backbone network architectures. We use the SGD optimizer with Nesterov momentum (Sutskever et al. 2013) and set the initial learning rate to 0.1, momentum to 0.9, and mini-batch size to 128. The learning rate is dropped by a factor of 0.1 at the 60th/120th/160th epochs, and we train for 200 epochs. The hyper-parameter α in all experiments is an exponential decay function γ^t, where γ = 0.99. We also apply data augmentation such as horizontal flips and random crops to samples during the training stage. |
| Hardware Specification | Yes | We implemented all experiments with Pytorch framework on a single NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | No, the paper mentions using 'Pytorch framework' but does not specify its version number or the versions of any other key software dependencies. |
| Experiment Setup | Yes | Settings. For the CIFAR-10 and CIFAR-100 datasets, we use ResNet-18 (He et al. 2016b) with pre-activation and WideResNet-28-10 (Zagoruyko and Komodakis 2016) as backbone network architectures. We use the SGD optimizer with Nesterov momentum (Sutskever et al. 2013) and set the initial learning rate to 0.1, momentum to 0.9, and mini-batch size to 128. The learning rate is dropped by a factor of 0.1 at the 60th/120th/160th epochs, and we train for 200 epochs. The hyper-parameter α in all experiments is an exponential decay function γ^t, where γ = 0.99. We also apply data augmentation such as horizontal flips and random crops to samples during the training stage. For fair comparison, we follow the experimental settings of (Zhang et al. 2018), which set the mini-batch size to 64 and drop the learning rate by 0.1 every 60 epochs. (A configuration sketch matching these settings is given after the table.) |
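
Since no official code is released (see the Open Source Code row), the following is only a minimal PyTorch sketch of one training step of Algorithm 1 under stated assumptions: the helper name `kr_step`, the mask `keep`, and the choice to drop the ground-truth column from both the logits and row y of S are illustrative, and S is assumed to share the learning rate β with the backbone via a single optimizer.

```python
# Hedged sketch of one Knowledge Refinery (KR) training step, following Algorithm 1 above.
import torch
import torch.nn.functional as F

def kr_step(model, S, x, y, alpha, optimizer, num_classes):
    """One KR training step (sketch, not the authors' implementation).

    model: backbone network producing logits of shape (B, C)
    S:     residual correlation matrix, a (C, C) nn.Parameter registered in `optimizer`
    x, y:  mini-batch of inputs and integer class labels
    alpha: weight of the residual/update losses (e.g. gamma ** epoch)
    """
    z = model(x)                                   # logits, shape (B, C)
    hard_loss = F.cross_entropy(z, y)              # L_hard = CE(p, q)

    # Drop the ground-truth column k from the logits and from row y of S (assumption).
    keep = torch.ones_like(z, dtype=torch.bool)
    keep.scatter_(1, y.unsqueeze(1), False)
    z_res = z[keep].view(-1, num_classes - 1)      # z^(res) = (z)\k
    s_res = S[y][keep].view(-1, num_classes - 1)   # S_k with column k removed

    log_p_res = F.log_softmax(z_res, dim=1)        # p^(res)
    log_q_res = F.log_softmax(s_res, dim=1)        # q^(res)

    # L_res = KL(p_res || q_res) should update only the backbone, so q_res is detached;
    # L_upd = KL(q_res || p_res) should update only S, so p_res is detached.
    res_loss = F.kl_div(log_q_res.detach(), log_p_res.exp(), reduction='batchmean')
    upd_loss = F.kl_div(log_p_res.detach(), log_q_res.exp(), reduction='batchmean')

    loss = hard_loss + alpha * res_loss + alpha * upd_loss
    optimizer.zero_grad()
    loss.backward()        # theta receives grad(L_hard + a*L_res); S receives grad(a*L_upd)
    optimizer.step()
    return hard_loss.item(), res_loss.item(), upd_loss.item()
```

Because the crosswise terms are detached, a single backward pass over the summed loss realizes lines 11–12 of the algorithm: the backbone parameters only see L_hard + αL_res, and S only sees αL_upd.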
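As a companion, here is a minimal configuration matching the quoted CIFAR settings (SGD with Nesterov momentum, initial learning rate 0.1, momentum 0.9, 200 epochs, learning-rate drops of 0.1 at epochs 60/120/160, and α = γ^t with γ = 0.99). The torchvision ResNet-18 is only a stand-in for the paper's pre-activation ResNet-18, and weight decay is omitted because the table does not quote it.

```python
# Hedged sketch of the quoted CIFAR training configuration.
import torch
import torch.nn as nn
import torchvision

num_classes = 10
model = torchvision.models.resnet18(num_classes=num_classes)   # stand-in backbone (paper uses pre-activation ResNet-18)
S = nn.Parameter(torch.zeros(num_classes, num_classes))        # residual correlation matrix, zero-initialized (Alg. 1, line 1)

optimizer = torch.optim.SGD(list(model.parameters()) + [S],
                            lr=0.1, momentum=0.9, nesterov=True)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.1)           # lr * 0.1 at epochs 60/120/160

gamma = 0.99
for epoch in range(200):
    alpha = gamma ** epoch        # alpha = gamma^t, exponential decay
    # ... one pass over the CIFAR training loader, e.g. calling kr_step(...) per mini-batch ...
    scheduler.step()
```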