Do Topological Characteristics Help in Knowledge Distillation?

Authors: Jungeun Kim, Junwon You, Dongjin Lee, Ha Young Kim, Jae-Hun Jung

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate TopKD, we conduct extensive experiments on image classification with the CIFAR-100 and ImageNet-1K datasets. In addition, we provide ablation studies to explore TopKD, error analyses of approximated PIs, and topological visualization of results.
Researcher Affiliation | Academia | 1. Department of AI, Yonsei University, Seoul, South Korea; 2. Department of Mathematics, POSTECH, Pohang, South Korea; 3. Graduate School of AI, POSTECH, Pohang, South Korea; 4. Graduate School of Information, Yonsei University, Seoul, South Korea.
Pseudocode | Yes | Algorithm 1: RipsNet training algorithm; Algorithm 2: Student model training algorithm.
Open Source Code | Yes | Code is available at https://github.com/jekim5418/TopKD
Open Datasets | Yes | CIFAR-100 (Krizhevsky et al.) is a 32 × 32 pixel color image dataset, comprising 50K training and 10K test images, for a total of 60K images. It consists of 100 classes, each with 600 images, grouped into 20 superclasses, with each image annotated for a specific class and the corresponding superclass. ImageNet-1K (Deng et al., 2009) is a large-scale image dataset consisting of 1K categories, 1.28M training images, and 50K validation images.
Dataset Splits | Yes | CIFAR-100 (Krizhevsky et al.) is a 32 × 32 pixel color image dataset, comprising 50K training and 10K test images, for a total of 60K images. ImageNet-1K (Deng et al., 2009) is a large-scale image dataset consisting of 1K categories, 1.28M training images, and 50K validation images. (A minimal loading sketch for these splits appears after this table.)
Hardware Specification | Yes | The training time of RipsNet is reported on an A100 GPU, and the execution time to produce one PI is reported on an AMD EPYC 7742.
Software Dependencies | No | The paper mentions the 'Gudhi library (Maria et al., 2014)' and the 'Adamax optimizer (Kingma & Ba, 2014)' but does not provide version numbers for these or for other software components, such as PyTorch or Python, that are typically used for deep learning. (A brief Gudhi usage sketch appears after this table.)
Experiment Setup | Yes | The student networks were trained with the stochastic gradient descent optimizer with a minibatch size of 64 over 240 epochs, and the weight decay was set to 5e-4 with a momentum of 0.9. For MobileNet (Sandler et al., 2018) and ShuffleNet (Ma et al., 2018), the learning rate was set to 0.01, and for the remaining models, it was set to 0.05, with a decay by a factor of 10 at 150, 180, and 210 epochs. The temperature was set to 4... We set α to 1 and performed a grid search on β (ranging from 1 to 10) and γ (ranging from 1 to 50) in Eq. (5). (These hyperparameters are wired into the training sketch after this table.)
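
The dataset splits quoted above map directly onto standard torchvision loaders. The snippet below is a minimal sketch that verifies the 50K/10K CIFAR-100 split; the transform and root paths are illustrative assumptions, not the paper's data pipeline.

```python
import torchvision
import torchvision.transforms as T

# Minimal transform; the paper's exact augmentation pipeline is not quoted here.
transform = T.Compose([T.ToTensor()])

# CIFAR-100: 50,000 training and 10,000 test images, 100 classes.
cifar_train = torchvision.datasets.CIFAR100(root="./data", train=True,
                                            download=True, transform=transform)
cifar_test = torchvision.datasets.CIFAR100(root="./data", train=False,
                                           download=True, transform=transform)
print(len(cifar_train), len(cifar_test))  # 50000 10000

# ImageNet-1K: 1.28M training and 50K validation images, 1,000 classes.
# Requires the ImageNet archives to be obtained separately under ./imagenet.
# imagenet_train = torchvision.datasets.ImageNet(root="./imagenet", split="train")
# imagenet_val = torchvision.datasets.ImageNet(root="./imagenet", split="val")
```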
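The Gudhi library cited in the Software Dependencies row is used for persistent homology. The sketch below shows the kind of computation involved in producing one persistence image (PI) from a point cloud via a Vietoris-Rips filtration; the point cloud, filtration scale, bandwidth, and resolution are illustrative assumptions rather than the paper's settings.

```python
import numpy as np
import gudhi
from gudhi.representations import PersistenceImage

# Toy point cloud standing in for a batch of features (illustrative only).
points = np.random.rand(200, 2)

# Vietoris-Rips filtration; max_edge_length and max_dimension are assumptions.
rips = gudhi.RipsComplex(points=points, max_edge_length=0.5)
st = rips.create_simplex_tree(max_dimension=2)
st.persistence()  # compute persistence pairs

# 1-dimensional (loop) intervals as an (n, 2) array of (birth, death).
diag_h1 = st.persistence_intervals_in_dimension(1)

# Vectorize the diagram as a persistence image; bandwidth/resolution are assumptions.
pi = PersistenceImage(bandwidth=0.1, resolution=[20, 20])
pi_vec = pi.fit_transform([diag_h1])[0]
print(pi_vec.shape)  # (400,)
```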
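Algorithm 2 (student model training) and the Experiment Setup row together specify most of the optimization recipe. The sketch below wires those quoted hyperparameters into PyTorch; the exact form of Eq. (5) is not quoted here, so the MSE between teacher and student persistence images, the assignment of α, β, γ to the individual terms, and the β, γ values (placeholders within the stated grid-search ranges) are all assumptions.

```python
import torch
import torch.nn.functional as F

# Quoted: temperature 4, alpha = 1; beta and gamma were grid-searched (placeholders here).
TEMPERATURE, ALPHA, BETA, GAMMA = 4.0, 1.0, 1.0, 10.0

def topkd_style_loss(student_logits, teacher_logits, labels, student_pi, teacher_pi):
    """Eq. (5)-style objective: CE + KD + topological term (term form is assumed)."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / TEMPERATURE, dim=1),
        F.softmax(teacher_logits / TEMPERATURE, dim=1),
        reduction="batchmean",
    ) * TEMPERATURE ** 2
    topo = F.mse_loss(student_pi, teacher_pi)  # assumed distance between persistence images
    return ALPHA * ce + BETA * kd + GAMMA * topo

# Optimizer and schedule as quoted: SGD, lr 0.05 (0.01 for MobileNet/ShuffleNet),
# momentum 0.9, weight decay 5e-4, decay by 10x at epochs 150/180/210, 240 epochs, batch 64.
student = torch.nn.Linear(512, 100)  # placeholder student head for illustration
optimizer = torch.optim.SGD(student.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 180, 210], gamma=0.1)
```

In the paper's pipeline the persistence images would presumably come from the trained RipsNet (Algorithm 1), which is what keeps the topological term cheap to evaluate and differentiable during student training.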