IMF: Integrating Matched Features Using Attentive Logit in Knowledge Distillation

Authors: Jeongho Kim, Hanbeen Lee, Simon S. Woo

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments, we demonstrate that IMF consistently outperforms other state-of-the-art methods by a large margin across various datasets and tasks without extra computation."
Researcher Affiliation | Collaboration | Jeongho Kim (1), Hanbeen Lee (2), Simon S. Woo (3); (1) Korea Advanced Institute of Science and Technology (KAIST), S. Korea; (2) NAVER Z Corporation, S. Korea; (3) Department of Artificial Intelligence, Sungkyunkwan University, S. Korea
Pseudocode | No | The paper presents the method through mathematical equations and descriptive text, but it contains no explicit pseudocode or algorithm block.
Open Source Code | No | The paper contains no explicit statement about releasing source code and no link to a code repository.
Open Datasets | Yes | "For example, in CIFAR100 [Krizhevsky and Hinton, 2009], which has 100 classes, 100 parameters are added to the output logits of each IFD layer." (A sketch of such a per-class parameter head appears below the table.)
Dataset Splits | Yes | "Training details. Backbone architecture and training settings for experiments are similar to recent research [Tian et al., 2019]."
Hardware Specification | No | The paper reports parameter counts and FLOPs for model-size comparison but does not specify the hardware (e.g., GPU or CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper refers to architecture components such as "Depth Conv(3×3) → Point Conv(1×1) → BN & ReLU" but lists no specific software dependencies with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | Yes | "Training details. Backbone architecture and training settings for experiments are similar to recent research [Tian et al., 2019]. In our method, we conduct a grid search to choose the α and β values in Eq. 5 from {10, 20, 30, 40}. The IFD block has the same structure in all experiments and model architectures. Specifically, we used a block structure of Depth Conv(3×3) → Point Conv(1×1) → BN & ReLU." (Hedged sketches of this block and of the α/β grid search follow the table.)
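
The Open Datasets and Experiment Setup rows together contain the only concrete architectural details quoted from the paper: an IFD block built as Depth Conv(3×3) → Point Conv(1×1) → BN & ReLU, with num_classes extra parameters (100 for CIFAR-100) added to each IFD layer's output logits. The PyTorch sketch below is one plausible reading of that description, not the authors' implementation: the class name IFDBlock, the arguments in_channels/out_channels/num_classes, and the adjust_logits helper are all hypothetical, since no code was released.

```python
import torch
import torch.nn as nn

class IFDBlock(nn.Module):
    """Hypothetical reconstruction of the IFD block described in the paper:
    Depth Conv(3x3) -> Point Conv(1x1) -> BN & ReLU, plus a per-class
    parameter vector (100 values for CIFAR-100) added to the layer's
    output logits. Wiring and names are assumptions, not the authors' code."""

    def __init__(self, in_channels: int, out_channels: int, num_classes: int):
        super().__init__()
        # Depthwise 3x3 convolution: groups == in_channels.
        self.depth_conv = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                    padding=1, groups=in_channels, bias=False)
        # Pointwise 1x1 convolution to mix channels.
        self.point_conv = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                    bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # Exactly num_classes extra parameters "added to the output logits"
        # of this IFD layer, matching the quoted CIFAR-100 sentence.
        self.logit_params = nn.Parameter(torch.zeros(num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.bn(self.point_conv(self.depth_conv(x))))

    def adjust_logits(self, logits: torch.Tensor) -> torch.Tensor:
        # Add the per-class parameters to logits derived from this layer.
        return logits + self.logit_params
```

A depthwise 3×3 (groups equal to the channel count) followed by a pointwise 1×1 is the standard depthwise-separable pattern, which is the most natural reading of "Depth Conv(3×3) Point Conv(1×1)" and is consistent with the paper's claim of adding little extra computation.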
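
The quoted training details also specify a grid search for the loss weights α and β in Eq. 5 over {10, 20, 30, 40}. Below is a minimal sketch of that 4×4 search, assuming a hypothetical train_and_evaluate callable that trains one student with the given weights and returns validation accuracy (the exact form of Eq. 5 is defined in the paper).

```python
import itertools

def grid_search_alpha_beta(train_and_evaluate):
    """Exhaustive 4x4 grid search over the alpha/beta weights of Eq. 5,
    as described under 'Training details'. `train_and_evaluate` is a
    hypothetical callable: (alpha, beta) -> validation accuracy."""
    candidates = [10, 20, 30, 40]
    best = {"alpha": None, "beta": None, "acc": float("-inf")}
    for alpha, beta in itertools.product(candidates, candidates):
        acc = train_and_evaluate(alpha=alpha, beta=beta)
        if acc > best["acc"]:
            best = {"alpha": alpha, "beta": beta, "acc": acc}
    return best
```

Each grid point is a full training run, so under these assumptions the search costs 16 runs per teacher-student pair.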