Do Topological Characteristics Help in Knowledge Distillation?
Authors: Jungeun Kim, Junwon You, Dongjin Lee, Ha Young Kim, Jae-Hun Jung
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate TopKD, we conduct extensive experiments on image classification with the CIFAR-100 and ImageNet-1K datasets. In addition, we provide ablation studies to explore TopKD, error analyses of approximated PIs, and topological visualization of results. |
| Researcher Affiliation | Academia | Department of AI, Yonsei University, Seoul, South Korea; Department of Mathematics, POSTECH, Pohang, South Korea; Graduate School of AI, POSTECH, Pohang, South Korea; Graduate School of Information, Yonsei University, Seoul, South Korea. |
| Pseudocode | Yes | Algorithm 1: RipsNet training algorithm; Algorithm 2: Student model training algorithm |
| Open Source Code | Yes | Code is available at https://github.com/jekim5418/TopKD |
| Open Datasets | Yes | CIFAR-100 (Krizhevsky et al.) is a 32 × 32 pixel color image dataset, comprising 50K training and 10K test images, for a total of 60K images. It consists of 100 classes, each with 600 images, grouped into 20 superclasses, with each image annotated for a specific class and the corresponding superclass. ImageNet-1K (Deng et al., 2009) is a large-scale image dataset consisting of 1K categories, 1.28M training images, and 50K validation images. |
| Dataset Splits | Yes | CIFAR-100 (Krizhevsky et al.) is a 32 × 32 pixel color image dataset, comprising 50K training and 10K test images, for a total of 60K images. ImageNet-1K (Deng et al., 2009) is a large-scale image dataset consisting of 1K categories, 1.28M training images, and 50K validation images. |
| Hardware Specification | Yes | The training time of RipsNet is reported on an A100 GPU, and the execution time to produce one PI is reported on an AMD EPYC 7742 CPU. |
| Software Dependencies | No | The paper mentions the 'Gudhi library (Maria et al., 2014)' and the 'Adamax optimizer (Kingma & Ba, 2014)' but does not provide specific version numbers for these or for other software components such as PyTorch or Python, which are typically used for deep learning. (See the Gudhi sketch after the table.) |
| Experiment Setup | Yes | The student networks were trained with the stochastic gradient descent optimizer with a minibatch size of 64 over 240 epochs, and the weight decay was set to 5e-4 with a momentum of 0.9. For MobileNet (Sandler et al., 2018) and ShuffleNet (Ma et al., 2018), the learning rate was set to 0.01, and for the remaining models, it was set to 0.05, with a decay by a factor of 10 at 150, 180, and 210 epochs. The temperature was set to 4... We set α to 1 and performed a grid search on β (ranging from 1 to 10) and γ (ranging from 1 to 50) in Eq. (5). (See the training-configuration sketch after the table.) |
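As noted in the Software Dependencies row, the paper computes persistence images (PIs) with the Gudhi library but does not pin versions. The following is a minimal sketch, assuming a recent Gudhi release and a hypothetical 2-D point cloud in place of network features, of how a PI can be obtained from a Vietoris–Rips filtration; the bandwidth and resolution values are illustrative, not the paper's settings.

```python
import numpy as np
import gudhi
from gudhi.representations import PersistenceImage

# Hypothetical point cloud (noisy circle) standing in for a batch of network features.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=100)
points = np.stack([np.cos(theta), np.sin(theta)], axis=1) + 0.05 * rng.normal(size=(100, 2))

# Vietoris-Rips filtration and 1-dimensional persistence diagram.
rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
simplex_tree = rips.create_simplex_tree(max_dimension=2)
simplex_tree.persistence()
diagram = simplex_tree.persistence_intervals_in_dimension(1)
diagram = diagram[np.isfinite(diagram[:, 1])]  # drop any infinite bars before vectorizing

# Vectorize the diagram into a persistence image (bandwidth/resolution are illustrative).
pi = PersistenceImage(bandwidth=0.1, resolution=[25, 25]).fit_transform([diagram])
print(pi.shape)  # (1, 625)
```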
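The Experiment Setup row fully specifies the CIFAR-100 optimization settings. Below is a minimal PyTorch sketch of that configuration, assuming a hypothetical ResNet-18 student; the topological terms of Eq. (5) (weighted by β and γ) are not reproduced here, only the standard temperature-scaled KD term with the reported temperature of 4.

```python
import torch.nn.functional as F
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR
from torchvision.models import resnet18  # placeholder student; the paper uses several architectures

student = resnet18(num_classes=100)  # hypothetical CIFAR-100 student

# Reported settings: SGD, batch size 64, 240 epochs, weight decay 5e-4, momentum 0.9,
# lr 0.05 (0.01 for MobileNet/ShuffleNet), decayed by 10x at epochs 150, 180, and 210.
optimizer = SGD(student.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(optimizer, milestones=[150, 180, 210], gamma=0.1)

T = 4.0      # reported KD temperature
alpha = 1.0  # reported weight on the KD term
# beta (1-10) and gamma (1-50) weight the topological terms of Eq. (5); grid-searched in the paper.

def kd_term(student_logits, teacher_logits):
    """Standard temperature-scaled KD term; the topological terms of Eq. (5) are not sketched here."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
```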