Comprehensive Knowledge Distillation with Causal Intervention
Authors: Xiang Deng, Zhongfei Zhang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on several benchmark datasets demonstrate that CID outperforms the state-of-the-art approaches significantly in terms of generalization and transferability. |
| Researcher Affiliation | Academia | Xiang Deng, Computer Science Department, State University of New York at Binghamton, xdeng7@binghamton.edu; Zhongfei Zhang, Computer Science Department, State University of New York at Binghamton, zhongfei@cs.binghamton.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/Xiang-Deng-DL/CID |
| Open Datasets | Yes | We compare CID with SOTA approaches across a variety of benchmark datasets (i.e., CIFAR-10 [24], CIFAR-100 [24], Tiny ImageNet, and ImageNet [11]) |
| Dataset Splits | No | The paper states that training details, including data splits, are specified in the Appendix, which is not provided. The main text refers to using standard benchmark datasets but does not explicitly detail the training/validation/test split percentages or sample counts. |
| Hardware Specification | No | The paper defers information about the total amount of compute and type of resources used to the Appendix, which is not provided in the main text. No specific hardware details (e.g., GPU models, CPU types) are mentioned in the main body of the paper. |
| Software Dependencies | No | The paper does not provide specific software dependencies or their version numbers in the main text. Training details, which might include software specifics, are referred to the Appendix, which is not available. |
| Experiment Setup | No | The paper describes the datasets and network architectures used (e.g., WRN-40-2 as the teacher, WRN-16-2 as the student) and general settings such as adding the KD loss for a fair comparison. However, specific hyperparameters such as the learning rate, batch size, number of epochs, or optimizer details are not explicitly stated in the main text and are deferred to the Appendix. |
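
The Experiment Setup row notes that the compared methods add the standard KD loss for a fair comparison. For reference only, the sketch below shows the generic Hinton-style KD objective that this refers to; it is not the paper's CID method, and the temperature and weighting values are illustrative assumptions, since the actual hyperparameters are deferred to the Appendix.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.9):
    """Generic KD objective: a weighted sum of the cross-entropy with the
    ground-truth labels and the KL divergence between temperature-softened
    teacher and student output distributions.

    `temperature` and `alpha` are placeholder values, not taken from the paper.
    """
    # Supervised term on hard labels.
    ce = F.cross_entropy(student_logits, labels)

    # Softened distributions for the distillation term.
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)

    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (temperature ** 2)

    return (1.0 - alpha) * ce + alpha * kl
```

In a typical setup this loss would be computed per batch with the teacher (e.g., WRN-40-2) in evaluation mode and only the student (e.g., WRN-16-2) receiving gradient updates.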