Progressively Knowledge Distillation via Re-parameterizing Diffusion Reverse Process

Authors: Xufeng Yao, Fanbin Lu, Yuechen Zhang, Xinyun Zhang, Wenqian Zhao, Bei Yu

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present extensive experiments performed on various transfer scenarios, such as CNN-to-CNN and Transformer-to-CNN, that validate the effectiveness of our approach."
Researcher Affiliation | Academia | Xufeng Yao, Fanbin Lu, Yuechen Zhang, Xinyun Zhang, Wenqian Zhao, Bei Yu; Department of Computer Science & Engineering, The Chinese University of Hong Kong; {xfyao,fblu21,yczhang21,xyzhang21,wqzhao,byu}@cse.cuhk.edu.hk
Pseudocode | No | The main body of the paper does not contain any clearly labeled pseudocode or algorithm blocks. It notes that further details are given in the appendix, but the appendix content is not provided.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository for the described methodology.
Open Datasets | Yes | "We experiment with different settings varying architectures and datasets, including: CIFAR-100 (Krizhevsky, Hinton et al. 2009), which consists of 32×32 images with 100 categories. Training and validation sets are composed of 50k and 10k images. ImageNet1k (Deng et al. 2009), which contains over 1280k images with 1000 categories. ImageNet100 is a subset of ImageNet which contains roughly 120k images."
Dataset Splits | Yes | "CIFAR-100 (Krizhevsky, Hinton et al. 2009), which consists of 32×32 images with 100 categories. Training and validation sets are composed of 50k and 10k images. ImageNet100 is a subset of ImageNet which contains roughly 120k images. The training and validation splitting rule is introduced in (Wang and Isola 2020)." (A data-loading sketch illustrating the CIFAR-100 split follows this table.)
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU/CPU models, processor types, or memory, used for running its experiments.
Software Dependencies | No | The paper mentions "Our implementation is mainly based on the DKD... Review... and CRD", which implies software dependencies, but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | No | The paper states "Our implementation is mainly based on the DKD (Zhao et al. 2022), Review (Chen et al. 2021b), and CRD (Tian, Krishnan, and Isola 2020) with the default training and testing setting," but it does not provide specific hyperparameter values, optimizer settings, or detailed training configurations in the main text. (A generic distillation-loss sketch is given below for orientation.)
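
For concreteness, the following is a minimal sketch of loading the CIFAR-100 split quoted in the Dataset Splits row, assuming the standard torchvision dataset with its built-in 50k/10k train/validation division. The normalization statistics and the ./data path are illustrative assumptions, not settings taken from the paper, and the ImageNet100 subset is omitted because its class list follows (Wang and Isola 2020) and is not reproduced here.

    # Minimal sketch (not from the paper): loading the CIFAR-100 split described
    # above with torchvision. The 50k/10k train/validation division is the
    # standard one shipped with the dataset.
    from torchvision import datasets, transforms

    transform = transforms.Compose([
        transforms.ToTensor(),
        # Commonly used CIFAR-100 normalization statistics (assumption; the paper
        # does not state its preprocessing pipeline).
        transforms.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
    ])

    train_set = datasets.CIFAR100(root="./data", train=True, download=True, transform=transform)
    val_set = datasets.CIFAR100(root="./data", train=False, download=True, transform=transform)

    print(len(train_set), len(val_set))  # 50000 10000, matching the split quoted above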
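
Because the Experiment Setup row notes that the paper defers to the default settings of DKD, Review, and CRD without listing hyperparameters, the sketch below shows only the generic Hinton-style knowledge-distillation objective that such codebases build on. It is not the paper's diffusion-based re-parameterization, and the temperature T and weight alpha are placeholder values, not settings reported by the authors.

    # Generic Hinton-style knowledge-distillation loss (illustration only; this is
    # NOT the paper's method, and T / alpha below are placeholder values, not
    # hyperparameters reported by the authors).
    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        # Soft-target term: KL divergence between temperature-scaled distributions,
        # rescaled by T^2 to keep gradient magnitudes comparable across temperatures.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        # Hard-target term: ordinary cross-entropy with the ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard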