Comprehensive Knowledge Distillation with Causal Intervention

Authors: Xiang Deng, Zhongfei Zhang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on several benchmark datasets demonstrate that CID outperforms the state-of-the-art approaches significantly in terms of generalization and transferability."
Researcher Affiliation | Academia | Xiang Deng, Computer Science Department, State University of New York at Binghamton (xdeng7@binghamton.edu); Zhongfei Zhang, Computer Science Department, State University of New York at Binghamton (zhongfei@cs.binghamton.edu)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code: https://github.com/Xiang-Deng-DL/CID"
Open Datasets | Yes | "We compare CID with SOTA approaches across varieties of (a) benchmark datasets (i.e., CIFAR-10 [24], CIFAR-100 [24], Tiny ImageNet, and ImageNet [11])" (a minimal loading sketch follows the table)
Dataset Splits | No | The paper states that training details, including data splits, are given in the Appendix, which is not provided. The main text refers to standard benchmark datasets but does not detail the training/validation/test split percentages or sample counts.
Hardware Specification | No | The paper defers information about the total amount of compute and the type of resources used to the Appendix, which is not provided with the main text. No specific hardware details (e.g., GPU models, CPU types) are mentioned in the main body of the paper.
Software Dependencies | No | The paper does not list specific software dependencies or their version numbers in the main text. Training details, which might include software specifics, are deferred to the Appendix, which is not available.
Experiment Setup | No | The paper describes the datasets and network architectures used (e.g., WRN-40-2 as teacher, WRN-16-2 as student) and general settings such as adding a KD loss for fair comparison (see the sketch after the table). However, specific hyperparameters such as learning rate, batch size, number of epochs, and optimizer details are not stated in the main text and are deferred to the Appendix.
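The benchmark datasets cited in the Open Datasets row are publicly downloadable through torchvision. Below is a minimal sketch of how one might load CIFAR-100 for a distillation run; the normalization statistics and augmentations are the commonly used CIFAR defaults, not values taken from the paper, which defers its preprocessing details to the Appendix.

    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # Commonly used CIFAR-100 statistics; assumed here, not from the paper.
    normalize = transforms.Normalize(mean=(0.5071, 0.4865, 0.4409),
                                     std=(0.2673, 0.2564, 0.2762))

    train_transform = transforms.Compose([
        transforms.RandomCrop(32, padding=4),   # standard CIFAR augmentation
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ])
    test_transform = transforms.Compose([transforms.ToTensor(), normalize])

    # download=True fetches the dataset on first use
    train_set = datasets.CIFAR100(root='./data', train=True,
                                  download=True, transform=train_transform)
    test_set = datasets.CIFAR100(root='./data', train=False,
                                 download=True, transform=test_transform)

    train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)
    test_loader = DataLoader(test_set, batch_size=128, shuffle=False, num_workers=2)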
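The Experiment Setup row notes that a KD loss is added to baselines for fair comparison. CID's own causal-intervention objective is not reproduced here, but the standard distillation loss it is compared against (Hinton et al., 2015) is well known; the sketch below shows that loss, with a temperature T and weight alpha chosen purely for illustration, since the paper's hyperparameters are deferred to the Appendix.

    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        """Standard knowledge-distillation loss (Hinton et al., 2015).

        T and alpha are illustrative defaults, not values from the CID paper.
        """
        # Soften both distributions with temperature T; the T**2 factor
        # keeps gradient magnitudes comparable across temperatures.
        soft_loss = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction='batchmean',
        ) * (T * T)
        # Ordinary cross-entropy against the ground-truth labels.
        hard_loss = F.cross_entropy(student_logits, labels)
        return alpha * soft_loss + (1.0 - alpha) * hard_loss

In a teacher-student pairing such as the WRN-40-2 / WRN-16-2 setup the row mentions, teacher_logits would come from a forward pass of the frozen, pretrained teacher on the same batch.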