Knowledge Extraction with No Observable Data

Authors: Jaemin Yoo, Minyong Cho, Taebum Kim, U Kang

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that KEGNET outperforms all baselines for data-free knowledge distillation.
Researcher Affiliation | Academia | Jaemin Yoo, Seoul National University (jaeminyoo@snu.ac.kr); Minyong Cho, Seoul National University (chominyong@gmail.com); Taebum Kim, Seoul National University (k.taebum@snu.ac.kr); U Kang, Seoul National University (ukang@snu.ac.kr)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We provide the source code of our paper in https://github.com/snudatalab/KegNet.
Open Datasets | Yes | We evaluate KEGNET on two kinds of networks and datasets: multilayer perceptrons on unstructured datasets from the UCI Machine Learning Repository, and convolutional neural networks on MNIST [21], Fashion MNIST [33], and SVHN [25]. (See the data-loading sketch below the table.)
Dataset Splits | Yes | We divide each dataset into training, validation, and test sets with the 7:1:2 ratios if the explicit training and test sets are not given. Otherwise, we divide the given training data into new training and validation sets. (See the split sketch below the table.)
Hardware Specification | No | The paper describes the software models, datasets, and experimental setup but does not mention specific hardware like GPUs or CPUs used for training or inference.
Software Dependencies | No | The paper mentions software components and techniques like 'ELU activation' and 'batch normalization' but does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers.
Experiment Setup | Yes | We use a multilayer perceptron (MLP) as a classifier M, which has been used in [27] and contains ten hidden layers with the ELU activation function and dropout [32] of probability 0.15. We create student networks by applying Tucker decomposition to all dense layers: the target rank is 5 in Shuttle and 10 in the others. We use an MLP as a generator G of two hidden layers with the ELU activation and batch normalization. We also apply the non-learnable batch normalization after the output layer to restrict the output space to the standard normal distribution: the parameters γ and β [10] are fixed as 1 and 0, respectively. In each setting, we train five generators with different random seeds as G and combine the generated data from all generators. We also train five student networks and report the average and standard deviation of classification accuracy for quantitative evaluation. We also use the hidden variable ẑ of length 10 in all settings, which is much smaller than the data vectors. We use a decoder network of the same structure in all settings: a multilayer perceptron of n hidden layers with the ELU activation [5] and batch normalization. n is chosen by the data complexity: n = 1 in MNIST, n = 2 in the unstructured datasets, and n = 3 in Fashion MNIST and SVHN. We set p̂_y to the categorical distribution that produces one-hot vectors as ŷ, and p_z to the multivariate Gaussian distribution that produces standard normal vectors. (See the generator sketch below the table.)
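
The image benchmarks named in the Open Datasets row are all available through torchvision. A minimal loading sketch, assuming torchvision is used; the authors' repository may load them differently:

```python
# Hypothetical loading sketch for the image benchmarks named above (torchvision);
# the paper's own data pipeline may differ.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
mnist   = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
fashion = datasets.FashionMNIST("data", train=True, download=True, transform=to_tensor)
svhn    = datasets.SVHN("data", split="train", download=True, transform=to_tensor)
```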
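The 7:1:2 division described in the Dataset Splits row can be reproduced with a plain random split. A minimal sketch, assuming PyTorch and a hypothetical helper name split_7_1_2 (not from the paper):

```python
# Hypothetical helper implementing the 7:1:2 train/validation/test split described above.
import torch
from torch.utils.data import random_split

def split_7_1_2(dataset, seed=0):
    n = len(dataset)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    n_test = n - n_train - n_val          # remainder goes to the test set
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)
```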
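The generator described in the Experiment Setup row (two hidden layers with ELU and batch normalization, plus a non-learnable batch normalization on the output) can be sketched as follows. This is a minimal sketch, assuming PyTorch; the hidden width, the 784-dimensional output, and feeding the concatenation of the one-hot label ŷ with the noise ẑ are our assumptions, not details confirmed by the quoted text.

```python
# Hypothetical sketch of the described generator; widths and names are illustrative.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, num_classes, noise_dim=10, hidden_dim=256, data_dim=784):
        super().__init__()
        in_dim = num_classes + noise_dim  # one-hot label ŷ concatenated with noise ẑ
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.BatchNorm1d(hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, hidden_dim), nn.BatchNorm1d(hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, data_dim),
            # Non-learnable output batch norm: affine=False fixes the scale at 1 and the
            # shift at 0, keeping generated vectors close to a standard normal distribution.
            nn.BatchNorm1d(data_dim, affine=False),
        )

    def forward(self, y_onehot, z):
        return self.body(torch.cat([y_onehot, z], dim=1))
```

The affine=False choice is one simple way to realize "non-learnable" batch normalization with fixed γ and β; the authors' repository may implement it differently.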