Knowledge Extraction with No Observable Data

Authors: Jaemin Yoo, Minyong Cho, Taebum Kim, U Kang

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that KEGNET outperforms all baselines for data-free knowledge distillation.
Researcher Affiliation | Academia | Jaemin Yoo, Seoul National University (jaeminyoo@snu.ac.kr); Minyong Cho, Seoul National University (chominyong@gmail.com); Taebum Kim, Seoul National University (k.taebum@snu.ac.kr); U Kang, Seoul National University (ukang@snu.ac.kr)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We provide the source code of our paper in https://github.com/snudatalab/KegNet.
Open Datasets | Yes | We evaluate KEGNET on two kinds of networks and datasets: multilayer perceptrons on unstructured datasets from the UCI Machine Learning Repository, and convolutional neural networks on MNIST [21], Fashion MNIST [33], and SVHN [25]. (See the data-loading sketch below the table.)
Dataset Splits | Yes | We divide each dataset into training, validation, and test sets with the 7:1:2 ratios if the explicit training and test sets are not given. Otherwise, we divide the given training data into new training and validation sets. (See the split sketch below the table.)
Hardware Specification | No | The paper describes the software models, datasets, and experimental setup but does not mention specific hardware like GPUs or CPUs used for training or inference.
Software Dependencies | No | The paper mentions software components and techniques like 'ELU activation' and 'batch normalization' but does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers.
Experiment Setup | Yes | We use a multilayer perceptron (MLP) as a classifier M, which has been used in [27] and contains ten hidden layers with the ELU activation function and dropout [32] of probability 0.15. We create student networks by applying Tucker decomposition to all dense layers: the target rank is 5 in Shuttle and 10 in the others. We use an MLP as a generator G of two hidden layers with the ELU activation and batch normalization. We also apply the non-learnable batch normalization after the output layer to restrict the output space to the standard normal distribution: the parameters γ and β [10] are fixed as 1 and 0, respectively. In each setting, we train five generators with different random seeds as G and combine the generated data from all generators. We also train five student networks and report the average and standard deviation of classification accuracy for quantitative evaluation. We also use the hidden variable ẑ of length 10 in all settings, which is much smaller than the data vectors. We use a decoder network of the same structure in all settings: a multilayer perceptron of n hidden layers with the ELU activation [5] and batch normalization. n is chosen by the data complexity: n = 1 in MNIST, n = 2 in the unstructured datasets, and n = 3 in Fashion MNIST and SVHN. We set p̂_y to the categorical distribution that produces one-hot vectors as ŷ, and p_z to the multivariate Gaussian distribution that produces standard normal vectors. (See the generator sketch below the table.)
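
The image benchmarks named in the Open Datasets row are all available through torchvision. A minimal loading sketch, assuming torchvision is used; the authors' repository may load them differently:

```python
# Hypothetical loading sketch for the image benchmarks named above (torchvision);
# the paper's own data pipeline may differ.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
mnist   = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
fashion = datasets.FashionMNIST("data", train=True, download=True, transform=to_tensor)
svhn    = datasets.SVHN("data", split="train", download=True, transform=to_tensor)
```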
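The 7:1:2 division described in the Dataset Splits row can be reproduced with a plain random split. A minimal sketch, assuming PyTorch and a hypothetical helper name split_7_1_2 (not from the paper):

```python
# Hypothetical helper implementing the 7:1:2 train/validation/test split described above.
import torch
from torch.utils.data import random_split

def split_7_1_2(dataset, seed=0):
    n = len(dataset)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    n_test = n - n_train - n_val          # remainder goes to the test set
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)
```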
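The generator described in the Experiment Setup row (two hidden layers with ELU and batch normalization, plus a non-learnable batch normalization on the output) can be sketched as follows. This is a minimal sketch, assuming PyTorch; the hidden width, the 784-dimensional output, and feeding the concatenation of the one-hot label ŷ with the noise ẑ are our assumptions, not details confirmed by the quoted text.

```python
# Hypothetical sketch of the described generator; widths and names are illustrative.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, num_classes, noise_dim=10, hidden_dim=256, data_dim=784):
        super().__init__()
        in_dim = num_classes + noise_dim  # one-hot label ŷ concatenated with noise ẑ
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.BatchNorm1d(hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, hidden_dim), nn.BatchNorm1d(hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, data_dim),
            # Non-learnable output batch norm: affine=False fixes the scale at 1 and the
            # shift at 0, keeping generated vectors close to a standard normal distribution.
            nn.BatchNorm1d(data_dim, affine=False),
        )

    def forward(self, y_onehot, z):
        return self.body(torch.cat([y_onehot, z], dim=1))
```

The affine=False choice is one simple way to realize "non-learnable" batch normalization with fixed γ and β; the authors' repository may implement it differently.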