Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning
Authors: Sihui Luo, Xinchao Wang, Gongfan Fang, Yao Hu, Dapeng Tao, Mingli Song
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test the proposed approach on a list of benchmarks and demonstrate that the learned student is able to achieve very promising performance, superior to those of the teachers in their specialized tasks. Experimental results on a list of classification datasets demonstrate the learned student outperforms the teachers in their corresponding specialities. |
| Researcher Affiliation | Collaboration | Sihui Luo1, Xinchao Wang2, Gongfan Fang1, Yao Hu3, Dapeng Tao4 and Mingli Song1; 1Zhejiang University, 2Stevens Institute of Technology, 3Alibaba Group, 4Yunnan University |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | We test the proposed method on a list of classification datasets summarized in Tab. 1. Given a dataset, we pre-trained the teacher network against the one-hot image-level labels in advance over the dataset using the cross-entropy loss. In face recognition case, we employ CASIA webface [Yi et al., 2014] or MS-Celeb-1M as the training data |
| Dataset Splits | Yes | During training, we explore face verification datasets including LFW [Huang et al., 2008], CFP-FP [Sengupta et al., 2016], and AgeDB-30 [Moschoglou et al., 2017] as the validation set. |
| Hardware Specification | Yes | We implement our method using PyTorch [He et al., 2016] on a Quadro P5000 16G GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number, which is required for reproducible software dependencies. |
| Experiment Setup | Yes | An Adam [Kingma and Ba, 2014] optimizer is utilized to train the student network. The learning rate is 1e-4, while the batch size is 128 on classification datasets and 64 on face recognition ones. |
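The reported setup (Adam, learning rate 1e-4) can be sketched as a single Adam update step in plain Python. This is a minimal illustration, not the paper's implementation: the beta and epsilon values are the Adam defaults from Kingma and Ba [2014], which the paper does not state, and the scalar-parameter interface is purely illustrative.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return updated (param, m, v) after one Adam step at iteration t (1-based).

    lr=1e-4 matches the learning rate reported in the paper; the remaining
    hyperparameters are the Adam paper's defaults (an assumption here).
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for m
    v_hat = v / (1 - beta2 ** t)                # bias correction for v
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# One step on a scalar parameter (initial value 0.0) with gradient 1.0:
p, m, v = adam_step(0.0, 1.0, 0.0, 0.0, t=1)
```

In a PyTorch training loop this corresponds to constructing `torch.optim.Adam(model.parameters(), lr=1e-4)` and calling `optimizer.step()` each iteration, with batch size 128 (classification) or 64 (face recognition) as reported.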