Paraphrasing Complex Network: Network Compression via Factor Transfer
Authors: Jangho Kim, Seonguk Park, Nojun Kwak
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With various experiments, we succeeded in training the student network to perform better than the ones with the same architecture trained by the conventional knowledge transfer methods. |
| Researcher Affiliation | Academia | Jangho Kim, Seoul National University, Seoul, Korea, kjh91@snu.ac.kr; Seonguk Park, Seoul National University, Seoul, Korea, swpark0703@snu.ac.kr; Nojun Kwak, Seoul National University, Seoul, Korea, nojunk@snu.ac.kr |
| Pseudocode | No | The paper describes the proposed method in text and mathematical equations, but does not provide any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or a link to a source code repository for the described methodology. |
| Open Datasets | Yes | First, we verify the effectiveness of FT through the experiments with CIFAR-10 [14] and CIFAR-100 [15] datasets... Then, we evaluate our method on ImageNet LSVRC 2015 [23] dataset. Finally, we applied our method to object detection with PASCAL VOC 2007 [5] dataset. |
| Dataset Splits | Yes | The CIFAR-10 dataset consists of 50K training images and 10K testing images with 10 classes. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU/GPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'our implementation' but does not provide specific version numbers for software dependencies or libraries used. |
| Experiment Setup | Yes | For KD, we fix the temperature for softened softmax to 4 as in [10], and for β of AT, we set it to 10^3 following [30]. In all experiments, AT used multiple group losses. As with AT, β of FT is set to 10^3 for ImageNet and PASCAL VOC 2007. However, we set it to 5×10^2 for CIFAR-10 and CIFAR-100 because a large β hinders convergence. We conduct experiments for different k values from 0.5 to 4. |
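
The hyperparameters quoted above (softened-softmax temperature T=4 for KD, and an FT loss weighted by β = 5×10^2 on CIFAR or 10^3 on ImageNet/PASCAL VOC) can be illustrated with a minimal NumPy sketch. This is a hedged reconstruction from the quoted setup, not the authors' code: the function names, the ε smoothing constants, and the use of an L1 (p=1) factor distance here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    # Softened-softmax knowledge distillation (Hinton et al. [10]):
    # KL(teacher_T || student_T), scaled by T^2; T=4 as in the paper's setup.
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=1)
    return kl.mean() * T * T

def ft_loss(student_factor, teacher_factor, p=1):
    # Factor Transfer loss: p-norm distance between l2-normalized
    # flattened factors from the paraphraser (teacher) and translator (student).
    def normalize(f):
        f = f.reshape(f.shape[0], -1)
        return f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-12)
    diff = normalize(student_factor) - normalize(teacher_factor)
    return np.mean(np.sum(np.abs(diff) ** p, axis=1))

# Hypothetical wiring of the student objective described in the paper:
# total = cross_entropy + beta * ft_loss, with beta = 5e2 on CIFAR
# and beta = 1e3 on ImageNet / PASCAL VOC.
```

Matching factors (or matching logits) drive the corresponding loss term to zero, which is a quick sanity check on the sketch.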