Adversarial Learning of Portable Student Networks

Authors: Yunhe Wang, Chang Xu, Chao Xu, Dacheng Tao

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experimental results on benchmark datasets demonstrate that the proposed method is capable of learning well-performed portable networks, which is superior to the state-of-the-art methods." "In this section, we implement experiments to validate the effectiveness of the proposed portable networks learning method on three benchmark datasets, including MNIST, CIFAR-10, and CIFAR-100."
Researcher Affiliation | Academia | (1) Key Laboratory of Machine Perception (MOE), School of EECS, Peking University, China; (2) UBTECH Sydney AI Centre, SIT, FEIT, University of Sydney, Australia; (3) Cooperative Medianet Innovation Center, Peking University, China
Pseudocode | Yes | "Algorithm 1 Learning portable DNNs by exploiting GAN." (a hedged sketch of one such training step is given after this table)
Open Source Code | No | The paper does not state that its source code is publicly available, nor does it link to a code repository.
Open Datasets | Yes | "The MNIST dataset is a widely used dataset for conducting visual classification task with deep neural network... The CIFAR-10 dataset... The CIFAR-100 dataset..."
Dataset Splits | Yes | "In addition, hyper-parameters of the proposed methods in the following experiments were selected by minimizing the error on a validation set consisting of the last 10,000 training images, and optimal parameters were determined by the top performance on this set..." "Moreover, the last 10,000 training images were selected as the validation set which was used for tuning the hyper-parameters of the proposed method." (a sketch of this split follows the table)
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions).
Experiment Setup | Yes | "λ and τ were equal to 2 and 0.5, respectively... γ was set to be 1.5 × 10^-1... The learning rate η was set to be 0.01. The teacher network was trained using conventional stochastic gradient descent (SGD) with learning rate decay and momentum strategies." (an optimizer/hyper-parameter sketch follows the table)
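
The Pseudocode row cites Algorithm 1 (learning portable DNNs by exploiting GAN). Below is a minimal PyTorch-style sketch of one adversarial-distillation training step, assuming the student plays the generator role and a small discriminator tries to tell teacher features from student features. The `TinyNet` and `Discriminator` modules, the feature dimension, and the way λ, γ, and τ enter the loss are illustrative assumptions, not the paper's exact formulation; only the hyper-parameter values come from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes for a stand-in MNIST-like setup (assumptions, not from the paper).
FEAT_DIM, NUM_CLASSES = 128, 10
# Values reported in the paper; how they attach to the loss terms below is assumed.
LAMBDA, GAMMA, TAU = 2.0, 1.5e-1, 0.5


class TinyNet(nn.Module):
    """Stand-in network returning (feature, logits); the paper uses larger teacher/student nets."""
    def __init__(self, width):
        super().__init__()
        self.body = nn.Sequential(
            nn.Flatten(), nn.Linear(28 * 28, width), nn.ReLU(), nn.Linear(width, FEAT_DIM)
        )
        self.head = nn.Linear(FEAT_DIM, NUM_CLASSES)

    def forward(self, x):
        feat = self.body(x)
        return feat, self.head(feat)


class Discriminator(nn.Module):
    """Small MLP that tries to distinguish teacher features from student features."""
    def __init__(self, dim=FEAT_DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x)


def train_step(student, teacher, disc, opt_s, opt_d, images, labels):
    """One adversarial-distillation step: the student acts as the generator."""
    batch = images.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    with torch.no_grad():  # the teacher is fixed
        t_feat, t_logits = teacher(images)

    # Discriminator update: teacher features are "real", student features are "fake".
    s_feat, _ = student(images)
    d_loss = (F.binary_cross_entropy_with_logits(disc(t_feat), ones)
              + F.binary_cross_entropy_with_logits(disc(s_feat.detach()), zeros))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Student update: fool the discriminator, match softened teacher outputs, fit true labels.
    s_feat, s_logits = student(images)
    adv = F.binary_cross_entropy_with_logits(disc(s_feat), ones)
    kd = F.kl_div(F.log_softmax(s_logits / TAU, dim=1),
                  F.softmax(t_logits / TAU, dim=1),
                  reduction="batchmean") * TAU * TAU
    ce = F.cross_entropy(s_logits, labels)
    s_loss = ce + LAMBDA * kd + GAMMA * adv
    opt_s.zero_grad()
    s_loss.backward()
    opt_s.step()
    return d_loss.item(), s_loss.item()


if __name__ == "__main__":
    # Toy usage example on random data, only to show the call pattern.
    teacher, student, disc = TinyNet(512), TinyNet(32), Discriminator()
    opt_s = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)
    opt_d = torch.optim.SGD(disc.parameters(), lr=0.01)
    x, y = torch.randn(8, 1, 28, 28), torch.randint(0, NUM_CLASSES, (8,))
    print(train_step(student, teacher, disc, opt_s, opt_d, x, y))
```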
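
The Dataset Splits row reports that the last 10,000 training images were held out as a validation set for hyper-parameter tuning. A short sketch of how such a split could be built for CIFAR-10 with torchvision (an assumed tooling choice; the paper names no framework, and the transform and batch size are also assumptions):

```python
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

# Assumed preprocessing; the paper does not specify its exact transforms.
transform = transforms.ToTensor()

full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# Hold out the last 10,000 training images as the validation set,
# matching the split described in the paper (CIFAR-10 has 50,000 training images).
n_total = len(full_train)
train_set = Subset(full_train, range(0, n_total - 10000))
val_set = Subset(full_train, range(n_total - 10000, n_total))

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False)
```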
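
The Experiment Setup row lists λ = 2, τ = 0.5, γ = 1.5 × 10^-1, a learning rate of 0.01, and SGD with learning-rate decay and momentum for training the teacher. A hedged sketch of that optimizer configuration follows; the momentum value and decay schedule are assumed defaults, since the paper reports only the strategies, not their exact settings.

```python
import torch
import torch.nn as nn

# lr = 0.01 follows the paper; momentum = 0.9, step_size = 30, and decay
# factor 0.1 are assumed defaults, not values reported in the paper.
model = nn.Linear(10, 10)  # placeholder module standing in for the teacher network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

# Loss-weighting hyper-parameters reported in the paper (see the
# training-step sketch above for one assumed way to use them).
LAMBDA = 2.0     # λ
TAU = 0.5        # τ, softmax temperature
GAMMA = 1.5e-1   # γ
```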