Knowledge Extraction with No Observable Data
Authors: Jaemin Yoo, Minyong Cho, Taebum Kim, U Kang
NeurIPS 2019 | Conference PDF | Archive PDF
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that KEGNET outperforms all baselines for data-free knowledge distillation. |
| Researcher Affiliation | Academia | Jaemin Yoo, Seoul National University, jaeminyoo@snu.ac.kr; Minyong Cho, Seoul National University, chominyong@gmail.com; Taebum Kim, Seoul National University, k.taebum@snu.ac.kr; U Kang, Seoul National University, ukang@snu.ac.kr |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the source code of our paper at https://github.com/snudatalab/KegNet. |
| Open Datasets | Yes | We evaluate KEGNET on two kinds of networks and datasets: multilayer perceptrons on unstructured datasets from the UCI Machine Learning Repository, and convolutional neural networks on MNIST [21], Fashion MNIST [33], and SVHN [25]. |
| Dataset Splits | Yes | We divide each dataset into training, validation, and test sets in a 7:1:2 ratio if explicit training and test sets are not given; otherwise, we divide the given training data into new training and validation sets. (A split sketch follows the table.) |
| Hardware Specification | No | The paper describes the software models, datasets, and experimental setup but does not mention specific hardware like GPUs or CPUs used for training or inference. |
| Software Dependencies | No | The paper mentions software components and techniques like 'ELU activation' and 'batch normalization' but does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers. |
| Experiment Setup | Yes | We use a multilayer perceptron (MLP) as the classifier M, which has been used in [27] and contains ten hidden layers with the ELU activation function and dropout [32] with probability 0.15. We create student networks by applying Tucker decomposition to all dense layers: the target rank is 5 in Shuttle and 10 in the others. We use an MLP with two hidden layers, the ELU activation, and batch normalization as the generator G. We also apply non-learnable batch normalization after the output layer to restrict the output space to the standard normal distribution: the parameters γ and β [10] are fixed as 1 and 0, respectively. In each setting, we train five generators with different random seeds as G and combine the generated data from all generators. We also train five student networks and report the average and standard deviation of classification accuracy for quantitative evaluation. We use the hidden variable ẑ of length 10 in all settings, which is much smaller than the data vectors. We use a decoder network of the same structure in all settings: an MLP of n hidden layers with the ELU activation [5] and batch normalization, where n is chosen by the data complexity: n = 1 in MNIST, n = 2 in the unstructured datasets, and n = 3 in Fashion MNIST and SVHN. We set p̂_y to the categorical distribution that produces one-hot vectors as ŷ, and p_z to the multivariate Gaussian distribution that produces standard normal vectors. (Illustrative sketches of the classifier, generator, and student compression follow the table.) |
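
The 7:1:2 split quoted in the Dataset Splits row is not tied to any particular library in the paper text; the following is a minimal sketch assuming scikit-learn's `train_test_split`, with stratification and the fixed seed added purely for illustration.

```python
from sklearn.model_selection import train_test_split

def split_7_1_2(X, y, seed=0):
    """Split into 70% train, 10% validation, 20% test (illustrative sketch)."""
    # Carve out the 20% test set first.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    # 1/8 of the remaining 80% gives the 10% validation set (0.8 * 0.125 = 0.1).
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.125, random_state=seed, stratify=y_rest)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```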
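
The classifier M in the Experiment Setup row is an MLP with ten hidden layers, ELU activations, and dropout with probability 0.15. A rough PyTorch sketch is below; the hidden width is an assumption, since the quoted text does not state layer sizes.

```python
import torch.nn as nn

def make_classifier(in_dim, num_classes, hidden_dim=128, num_hidden=10, p_drop=0.15):
    """Sketch of the classifier M: ten hidden layers, ELU, dropout 0.15.

    hidden_dim is illustrative only; the setup row does not give layer widths.
    """
    layers, dim = [], in_dim
    for _ in range(num_hidden):
        layers += [nn.Linear(dim, hidden_dim), nn.ELU(), nn.Dropout(p_drop)]
        dim = hidden_dim
    layers.append(nn.Linear(dim, num_classes))
    return nn.Sequential(*layers)
```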
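
The generator G (two hidden layers with ELU and batch normalization, plus non-learnable batch normalization on the output) could look roughly like the PyTorch sketch below; the layer widths and the use of `affine=False` to freeze γ = 1 and β = 0 are one reasonable reading of the setup row, not the authors' exact code.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Sketch of a KEGNET-style generator mapping (one-hot ŷ, latent ẑ) to a data point."""

    def __init__(self, num_classes, latent_dim=10, data_dim=784, hidden_dim=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(num_classes + latent_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ELU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ELU(),
            nn.Linear(hidden_dim, data_dim),
        )
        # Non-learnable batch normalization: affine=False keeps gamma = 1 and
        # beta = 0, pushing the output distribution toward a standard normal.
        self.out_norm = nn.BatchNorm1d(data_dim, affine=False)

    def forward(self, y_onehot, z):
        x = torch.cat([y_onehot, z], dim=1)
        return self.out_norm(self.body(x))
```

Sampling ŷ as one-hot vectors from a categorical distribution and ẑ of length 10 from a standard normal, as the setup row describes, then reduces to `torch.eye(num_classes)[labels]` and `torch.randn(batch_size, 10)`.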
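
Student networks are created by applying Tucker decomposition to all dense layers (target rank 5 for Shuttle, 10 elsewhere). For a 2-D weight matrix this amounts to a low-rank factorization, so the sketch below uses a truncated SVD in place of a Tucker solver; it illustrates the compression idea rather than reproducing the authors' implementation.

```python
import numpy as np
import torch
import torch.nn as nn

def compress_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace one dense layer with two thin layers via a rank-`rank` truncated SVD.

    Stands in for the Tucker decomposition mentioned in the setup row.
    """
    W = layer.weight.detach().cpu().numpy()      # shape (out_features, in_features)
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]                 # (out_features, rank)
    V_r = Vt[:rank, :]                           # (rank, in_features)

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = torch.from_numpy(V_r).float()
    second.weight.data = torch.from_numpy(U_r).float()
    if layer.bias is not None:
        second.bias.data = layer.bias.detach().clone()
    return nn.Sequential(first, second)
```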