Distributional Smoothing with Virtual Adversarial Training
Authors: Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, Shin Ishii
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When we applied our technique to supervised and semi-supervised learning for the MNIST dataset, it outperformed all the training methods other than the current state of the art method, which is based on a highly advanced generative model. We also applied our method to SVHN and NORB, and confirmed our method's superior performance over the current state of the art semi-supervised method applied to these datasets. |
| Researcher Affiliation | Academia | Takeru Miyato1, Shin-ichi Maeda1, Masanori Koyama1, Ken Nakae1 & Shin Ishii2, Graduate School of Informatics, Kyoto University, Yoshidahonmachi 36-1, Sakyo, Kyoto, Japan. 1{miyato-t,ichi,koyama-m,nakae-k}@sys.i.kyoto-u.ac.jp 2ishii@i.kyoto-u.ac.jp |
| Pseudocode | Yes | Algorithm 1: Generation of r_v-adv^(n) (a hedged code sketch of this procedure follows the table) |
| Open Source Code | Yes | Reproducing code is uploaded on https://github.com/takerum/vat. |
| Open Datasets | Yes | When we applied our technique to supervised and semi-supervised learning for the MNIST dataset... We also applied our method to SVHN and NORB... The SVHN dataset consists of 32 × 32 × 3 pixel RGB images of housing numbers and their corresponding labels (0-9)... The NORB dataset consists of 2 × 96 × 96 pixel gray images of 50 different objects and their corresponding labels (cars, trucks, planes, animals, humans). |
| Dataset Splits | Yes | We split the original 60,000 training samples into 50,000 training samples and 10,000 validation samples, and used the latter to tune the hyperparameters. We used a validation set of fixed size 1,000, and used all the training samples excluding the validation set and the labeled samples to train the NNs. We reserved 1,000 samples for validation. |
| Hardware Specification | No | The paper mentions 'All the computations were conducted with Theano' but does not specify any particular hardware (CPU, GPU, etc.) used for these computations. |
| Software Dependencies | No | The paper states 'All the computations were conducted with Theano' and 'The training was conducted by mini-batch SGD based on ADAM', but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Throughout the experiments on our proposed method, we used a fixed value of λ = 1, and we also used a fixed value of Ip = 1. Our classifier was a neural network (NN) with one hidden layer consisting of 100 hidden units. We used the ReLU (Jarrett et al., 2009; Nair & Hinton, 2010; Glorot et al., 2011) activation function for hidden units, and used the softmax activation function for all the output units. We used µi = 0.9, and exponentially decreasing γi with rate 0.995. As for the choice of γ1, we used 1.0. We trained the NNs with 1,000 parameter updates. We chose the mini-batch size of 100, and used the default values of Kingma & Ba (2015) for the tunable parameters of ADAM. We selected the initial value of 0.002 and adopted the schedule of exponential decay with rate 0.9 per 500 updates. As for the architecture of NNs, we used ReLU-based NNs with two hidden layers with the number of hidden units (1200, 1200). We used two separate minibatches at each step: one minibatch of size 100 from labeled samples for the computation of the likelihood term, and another minibatch of size 250 from both labeled and unlabeled samples for computing the regularization term. We used a neural network with the number of hidden nodes given by (1200, 600, 300, 150, 150). (A hedged sketch of this training configuration also follows the table.) |
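
The paper's Algorithm 1 generates the virtual adversarial perturbation r_v-adv^(n) by power iteration on the local distributional smoothness (LDS) objective. Below is a minimal sketch of that procedure in Python/PyTorch; the function and argument names (`virtual_adversarial_perturbation`, `xi`, `eps`, `ip`) are ours, the default values for `xi` and `eps` are placeholders rather than the paper's settings, and the authors' reference implementation at https://github.com/takerum/vat is written in Theano.

```python
import torch
import torch.nn.functional as F

def _l2_normalize(d):
    # Normalize each per-sample perturbation to unit L2 norm.
    return d / (d.flatten(1).norm(dim=1).view(-1, *([1] * (d.dim() - 1))) + 1e-8)

def virtual_adversarial_perturbation(model, x, xi=10.0, eps=1.0, ip=1):
    """Approximate r_v-adv via power iteration (the paper fixes Ip = 1)."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)           # p(y | x, theta), held fixed as the target
    d = _l2_normalize(torch.randn_like(x))       # random unit direction
    for _ in range(ip):
        d.requires_grad_(True)
        q_log = F.log_softmax(model(x + xi * d), dim=1)
        kl = F.kl_div(q_log, p, reduction="batchmean")    # KL[p || q(x + xi*d)]
        grad = torch.autograd.grad(kl, d)[0]
        d = _l2_normalize(grad.detach())          # power-iteration update of the direction
    return eps * d                                # r_v-adv = eps * d / ||d||
```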
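The semi-supervised MNIST configuration quoted in the Experiment Setup row (λ = 1, a ReLU MLP with (1200, 1200) hidden units, ADAM with initial learning rate 0.002 decayed by a factor of 0.9 every 500 updates, a labeled minibatch of 100 for the likelihood term and a minibatch of 250 for the regularization term) can be assembled roughly as follows. This is a sketch under those assumptions, again in PyTorch rather than the paper's Theano, and it reuses the `virtual_adversarial_perturbation` helper defined above; the `training_step` function and all variable names are ours.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 1200), nn.ReLU(),
    nn.Linear(1200, 1200), nn.ReLU(),
    nn.Linear(1200, 10),                  # softmax is applied inside the losses below
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500, gamma=0.9)

lam = 1.0                                 # fixed regularization weight lambda = 1

def training_step(x_labeled, y_labeled, x_all):
    # x_labeled/y_labeled: labeled minibatch of size 100.
    # x_all: minibatch of size 250 drawn from labeled + unlabeled samples.
    nll = nn.functional.cross_entropy(model(x_labeled), y_labeled)
    r = virtual_adversarial_perturbation(model, x_all)
    with torch.no_grad():
        p = nn.functional.softmax(model(x_all), dim=1)
    q_log = nn.functional.log_softmax(model(x_all + r), dim=1)
    lds = nn.functional.kl_div(q_log, p, reduction="batchmean")   # LDS regularizer
    loss = nll + lam * lds
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                      # exponential decay of 0.9 every 500 updates
    return loss.item()
```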