Paraphrasing Complex Network: Network Compression via Factor Transfer

Authors: Jangho Kim, Seonguk Park, Nojun Kwak

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | With various experiments, we succeeded in training the student network to perform better than the ones with the same architecture trained by the conventional knowledge transfer methods.
Researcher Affiliation | Academia | Jangho Kim Seoul National University Seoul, Korea kjh91@snu.ac.kr Seong Uk Park Seoul National University Seoul, Korea swpark0703@snu.ac.kr Nojun Kwak Seoul National University Seoul, Korea nojunk@snu.ac.kr
Pseudocode | No | The paper describes the proposed method in text and mathematical equations, but does not provide any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or a link to a source code repository for the described methodology.
Open Datasets | Yes | First, we verify the effectiveness of FT through the experiments with CIFAR-10 [14] and CIFAR-100 [15] datasets... Then, we evaluate our method on ImageNet LSVRC 2015 [23] dataset. Finally, we applied our method to object detection with PASCAL VOC 2007 [5] dataset.
Dataset Splits | Yes | The CIFAR-10 dataset consists of 50K training images and 10K testing images with 10 classes.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU/GPU models) used for running the experiments.
Software Dependencies | No | The paper mentions 'our implementation' but does not provide specific version numbers for software dependencies or libraries used.
Experiment Setup | Yes | For KD, we fix the temperature for softened softmax to 4 as in [10], and for β of AT, we set it to 10^3 following [30]. In the whole experiments, AT used multiple group losses. Alike AT, β of FT is set to 10^3 in ImageNet and PASCAL VOC 2007. However, we set it to 5 × 10^2 in CIFAR-10 and CIFAR-100 because a large β hinders the convergence. We conduct experiments for different k values from 0.5 to 4.
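
The hyperparameters quoted in the Experiment Setup row can be collected in one place, as in the minimal sketch below. The paper releases no code, so the dictionary layout, the key names, and the `softened_softmax` helper are all hypothetical illustrations; only the numeric values come from the quoted text. The helper shows what a temperature of 4 does to a softmax output in general and is not the authors' implementation.

```python
import math

# Values quoted in the Experiment Setup row. The structure and key names
# are assumptions made for illustration, not taken from the authors' code.
SETUP = {
    "kd_temperature": 4.0,   # temperature for the softened softmax in KD
    "at_beta": 1e3,          # beta for AT
    "ft_beta": {             # beta for FT, per dataset
        "imagenet": 1e3,
        "pascal_voc_2007": 1e3,
        "cifar10": 5e2,
        "cifar100": 5e2,
    },
    "k_range": (0.5, 4.0),   # k values explored, lowest to highest
}


def softened_softmax(logits, temperature):
    """Return softmax(logits / temperature), computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum before exponentiating
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.1]  # made-up logits for demonstration
    print(softened_softmax(example_logits, 1.0))
    print(softened_softmax(example_logits, SETUP["kd_temperature"]))
```

Raising the temperature flattens the distribution, so the second printed list is closer to uniform than the first.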