Knowledge Distillation with Auxiliary Variable
Authors: Bo Peng, Zhen Fang, Guangquan Zhang, Jie Lu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Experiments Baselines. We compare our method with mainstream knowledge distillers, including KD (Hinton et al., 2015), DKD (Zhao et al., 2022), IPWD (Niu et al., 2022), WSLD (Zhou et al., 2021), CS-KD (Yun et al., 2020), TF-KD (Yuan et al., 2020), PS-KD (Kim et al., 2021), NKD (Yang et al., 2023), MLD (Jin et al., 2023), DIST (Huang et al., 2022a), FitNets (Romero et al., 2014), CRD (Tian et al., 2019), WCoRD (Chen et al., 2021a), ReviewKD (Chen et al., 2021b), NORM (Liu et al., 2023), CoCoRD (Fu et al., 2023), DiffKD (Huang et al., 2023), SRRL (Yang et al., 2021) and SSKD (Xu et al., 2020). Settings. We conduct experiments on multiple benchmarks for knowledge transfer: CIFAR-100 (Krizhevsky et al., 2009), ImageNet-1K (Russakovsky et al., 2015), STL-10 (Coates et al., 2011), Tiny-ImageNet (Chrabaszcz et al., 2017), PASCAL-VOC (Everingham et al., 2009) and MSCOCO (Lin et al., 2014). |
| Researcher Affiliation | Academia | Faculty of Engineering & Information Technology, University of Technology Sydney, Sydney, Australia. |
| Pseudocode | Yes | Algorithm 1 knowledge distillation with auxiliary variable |
| Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the described methodology. |
| Open Datasets | Yes | Settings. We conduct experiments on multiple benchmarks for knowledge transfer: CIFAR-100 (Krizhevsky et al., 2009), ImageNet-1K (Russakovsky et al., 2015), STL-10 (Coates et al., 2011), Tiny-ImageNet (Chrabaszcz et al., 2017), PASCAL-VOC (Everingham et al., 2009) and MSCOCO (Lin et al., 2014). |
| Dataset Splits | No | The paper mentions using a 'validation set' for ImageNet-1K, as seen in 'Table 4: Top-1 and Top-5 accuracy (%) on ImageNet-1K validation set.', but it does not specify the explicit train/validation/test dataset splits (e.g., percentages or sample counts) needed for reproduction. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run its experiments. The 'Implementation Details' section focuses on software parameters and training settings. |
| Software Dependencies | No | The paper mentions following 'PyTorch ImageNet practice' but does not specify exact version numbers for PyTorch or any other software dependencies needed for reproduction. |
| Experiment Setup | Yes | We set the batch size as 64 and the initial learning rate as 0.01 (for ShuffleNet and MobileNet-V2) or 0.05 (for the other series). We train the model for 240 epochs, in which the learning rate is decayed by 10 every 30 epochs after 150 epochs. We use SGD as the optimizer with weight decay 5e-4 and momentum 0.9. (A training-schedule sketch follows the table.) |
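
The training recipe quoted in the Experiment Setup row maps onto a standard PyTorch optimizer-plus-scheduler configuration. The sketch below is a minimal, hypothetical reconstruction under that reading, not the authors' code: the `build_optimizer` helper and the milestone choice `[150, 180, 210]` are assumptions inferred from "decayed by 10 every 30 epochs after 150 epochs" over 240 epochs, and the model / data-loader names are placeholders.

```python
# Minimal sketch of the quoted CIFAR-100 training schedule, assuming a standard
# PyTorch setup. Names such as `build_optimizer`, `student`, and `train_loader`
# are illustrative placeholders, not taken from the paper.
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import MultiStepLR


def build_optimizer(model: nn.Module, initial_lr: float = 0.05):
    """SGD with momentum 0.9 and weight decay 5e-4, as quoted from the paper.

    Use initial_lr=0.01 for ShuffleNet / MobileNet-V2 students, 0.05 otherwise.
    """
    optimizer = optim.SGD(
        model.parameters(),
        lr=initial_lr,
        momentum=0.9,
        weight_decay=5e-4,
    )
    # "Decayed by 10 every 30 epochs after 150 epochs" over 240 epochs is read
    # here as milestones at epochs 150, 180, 210 with gamma=0.1 (an assumption).
    scheduler = MultiStepLR(optimizer, milestones=[150, 180, 210], gamma=0.1)
    return optimizer, scheduler


# Hypothetical usage with a student network and a batch-size-64 loader:
#   optimizer, scheduler = build_optimizer(student)
#   for epoch in range(240):
#       for images, labels in train_loader:
#           ...  # distillation forward/backward pass
#       scheduler.step()
```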