Knowledge Refinery: Learning from Decoupled Label
Authors: Qianggang Ding, Sifan Wu, Tao Dai, Hao Sun, Jiadong Guo, Zhang-Hua Fu, Shutao Xia
AAAI 2021, pp. 7228–7235 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To exhibit the generalization of KR, we evaluate our method in both the computer vision and natural language processing fields. Our empirical results show consistent performance gains under all experimental settings. We perform extensive tests on two general classification tasks: image classification and text recognition. |
| Researcher Affiliation | Academia | 1 Tsinghua University, 2 PCL Research Center of Networks and Communications, Peng Cheng Laboratory, 3 The Chinese University of Hong Kong, 4 The Hong Kong University of Science and Technology, 5 International Digital Economy Academy, 6 The Chinese University of Hong Kong, Shenzhen, 7 Shenzhen Institute of Artificial Intelligence and Robotics for Society |
| Pseudocode | Yes | Algorithm 1: Workflow of Knowledge Refinery. 1: S ← 0 ▷ Initialize the residual correlation matrix with the zero matrix. 2: for epoch = 1, 2, . . . do 3: z = fθ(x) ▷ Here we omit the mini-batch for simplicity. 4: p = softmax(z) 5: Lhard = LCE(p, q) ▷ Calculate the hard loss. 6: z(res) = (z)\k ▷ k is the index of the ground-truth class. 7: p(res) = softmax(z(res)) 8: q(res) = softmax(Sk) 9: Lupd = DKL(q(res) ‖ p(res)) ▷ Calculate the update loss. 10: Lres = DKL(p(res) ‖ q(res)) ▷ Calculate the residual loss. 11: θ ← θ − β∇θ(Lhard + αLres) ▷ Optimize the backbone network. 12: S ← S − β∇S(αLupd) ▷ Optimize the residual correlation matrix. 13: end for (A minimal PyTorch sketch of this training step is given after the table.) |
| Open Source Code | No | No, the paper does not include an unambiguous statement that the authors are releasing their code for the work described, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | Datasets. We use three benchmark datasets of image recognition in our evaluations: (i) CIFAR-10 (Krizhevsky and Hinton 2009):... (ii) CIFAR-100 (Krizhevsky and Hinton 2009):... (iii) ImageNet-12 (Russakovsky et al. 2015):... Datasets. We use three benchmark datasets of text classification in our evaluations: (i) AGNews (Del Corso, Gulli, and Romani 2005):... (ii) Yahoo! Answers:... (iii) Yelp Review Full:... |
| Dataset Splits | Yes | For the CIFAR-10 and CIFAR-100 datasets, we use ResNet-18 (He et al. 2016b) with pre-activation and WideResNet-28-10 (Zagoruyko and Komodakis 2016) as backbone network architectures. We use the SGD optimizer with Nesterov momentum (Sutskever et al. 2013) and set the initial learning rate to 0.1, momentum to 0.9, and mini-batch size to 128. The learning rate is dropped by a factor of 0.1 at the 60th/120th/160th epochs, and we train for 200 epochs. The hyper-parameter α in all experiments is an exponential decay function γ^t, where γ = 0.99. We also apply data augmentation such as horizontal flips and random crops to samples during the training stage. |
| Hardware Specification | Yes | We implemented all experiments with Pytorch framework on a single NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | No, the paper mentions using 'Pytorch framework' but does not specify its version number or the versions of any other key software dependencies. |
| Experiment Setup | Yes | Settings. For the CIFAR-10 and CIFAR-100 datasets, we use ResNet-18 (He et al. 2016b) with pre-activation and WideResNet-28-10 (Zagoruyko and Komodakis 2016) as backbone network architectures. We use the SGD optimizer with Nesterov momentum (Sutskever et al. 2013) and set the initial learning rate to 0.1, momentum to 0.9, and mini-batch size to 128. The learning rate is dropped by a factor of 0.1 at the 60th/120th/160th epochs, and we train for 200 epochs. The hyper-parameter α in all experiments is an exponential decay function γ^t, where γ = 0.99. We also apply data augmentation such as horizontal flips and random crops to samples during the training stage. For fair comparison, we follow the experimental settings of (Zhang et al. 2018), which set the mini-batch size to 64 and drop the learning rate by 0.1 every 60 epochs. (A configuration sketch matching these settings is given after the table.) |
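
Since no official code is released (see the Open Source Code row), the following is only a minimal PyTorch sketch of one training step of Algorithm 1 under stated assumptions: the helper name `kr_step`, the mask `keep`, and the choice to drop the ground-truth column from both the logits and row y of S are illustrative, and S is assumed to share the learning rate β with the backbone via a single optimizer.

```python
# Hedged sketch of one Knowledge Refinery (KR) training step, following Algorithm 1 above.
import torch
import torch.nn.functional as F

def kr_step(model, S, x, y, alpha, optimizer, num_classes):
    """One KR training step (sketch, not the authors' implementation).

    model: backbone network producing logits of shape (B, C)
    S:     residual correlation matrix, a (C, C) nn.Parameter registered in `optimizer`
    x, y:  mini-batch of inputs and integer class labels
    alpha: weight of the residual/update losses (e.g. gamma ** epoch)
    """
    z = model(x)                                   # logits, shape (B, C)
    hard_loss = F.cross_entropy(z, y)              # L_hard = CE(p, q)

    # Drop the ground-truth column k from the logits and from row y of S (assumption).
    keep = torch.ones_like(z, dtype=torch.bool)
    keep.scatter_(1, y.unsqueeze(1), False)
    z_res = z[keep].view(-1, num_classes - 1)      # z^(res) = (z)\k
    s_res = S[y][keep].view(-1, num_classes - 1)   # S_k with column k removed

    log_p_res = F.log_softmax(z_res, dim=1)        # p^(res)
    log_q_res = F.log_softmax(s_res, dim=1)        # q^(res)

    # L_res = KL(p_res || q_res) should update only the backbone, so q_res is detached;
    # L_upd = KL(q_res || p_res) should update only S, so p_res is detached.
    res_loss = F.kl_div(log_q_res.detach(), log_p_res.exp(), reduction='batchmean')
    upd_loss = F.kl_div(log_p_res.detach(), log_q_res.exp(), reduction='batchmean')

    loss = hard_loss + alpha * res_loss + alpha * upd_loss
    optimizer.zero_grad()
    loss.backward()        # theta receives grad(L_hard + a*L_res); S receives grad(a*L_upd)
    optimizer.step()
    return hard_loss.item(), res_loss.item(), upd_loss.item()
```

Because the crosswise terms are detached, a single backward pass over the summed loss realizes lines 11–12 of the algorithm: the backbone parameters only see L_hard + αL_res, and S only sees αL_upd.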
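As a companion, here is a minimal configuration matching the quoted CIFAR settings (SGD with Nesterov momentum, initial learning rate 0.1, momentum 0.9, 200 epochs, learning-rate drops of 0.1 at epochs 60/120/160, and α = γ^t with γ = 0.99). The torchvision ResNet-18 is only a stand-in for the paper's pre-activation ResNet-18, and weight decay is omitted because the table does not quote it.

```python
# Hedged sketch of the quoted CIFAR training configuration.
import torch
import torch.nn as nn
import torchvision

num_classes = 10
model = torchvision.models.resnet18(num_classes=num_classes)   # stand-in backbone (paper uses pre-activation ResNet-18)
S = nn.Parameter(torch.zeros(num_classes, num_classes))        # residual correlation matrix, zero-initialized (Alg. 1, line 1)

optimizer = torch.optim.SGD(list(model.parameters()) + [S],
                            lr=0.1, momentum=0.9, nesterov=True)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.1)           # lr * 0.1 at epochs 60/120/160

gamma = 0.99
for epoch in range(200):
    alpha = gamma ** epoch        # alpha = gamma^t, exponential decay
    # ... one pass over the CIFAR training loader, e.g. calling kr_step(...) per mini-batch ...
    scheduler.step()
```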