Distributional Smoothing with Virtual Adversarial Training
Authors: Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, Shin Ishii
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When we applied our technique to supervised and semi-supervised learning for the MNIST dataset, it outperformed all the training methods other than the current state of the art method, which is based on a highly advanced generative model. We also applied our method to SVHN and NORB, and confirmed our method's superior performance over the current state of the art semi-supervised method applied to these datasets. |
| Researcher Affiliation | Academia | Takeru Miyato1, Shin-ichi Maeda1, Masanori Koyama1, Ken Nakae1 & Shin Ishii2, Graduate School of Informatics, Kyoto University, Yoshidahonmachi 36-1, Sakyo, Kyoto, Japan. 1{miyato-t,ichi,koyama-m,nakae-k}@sys.i.kyoto-u.ac.jp 2ishii@i.kyoto-u.ac.jp |
| Pseudocode | Yes | Algorithm 1: Generation of r_v-adv^(n) (a hedged code sketch of this procedure follows the table) |
| Open Source Code | Yes | Reproducing code is uploaded on https://github.com/takerum/vat. |
| Open Datasets | Yes | When we applied our technique to supervised and semi-supervised learning for the MNIST dataset... We also applied our method to SVHN and NORB... The SVHN dataset consists of 32 × 32 × 3 pixel RGB images of housing numbers and their corresponding labels (0-9)... The NORB dataset consists of 2 × 96 × 96 pixel gray images of 50 different objects and their corresponding labels (cars, trucks, planes, animals, humans). |
| Dataset Splits | Yes | We split the original 60,000 training samples into 50,000 training samples and 10,000 validation samples, and used the latter to tune the hyperparameters. We used a validation set of fixed size 1,000, and used all the training samples excluding the validation set and the labeled samples to train the NNs. We reserved 1,000 samples for validation. |
| Hardware Specification | No | The paper mentions 'All the computations were conducted with Theano' but does not specify any particular hardware (CPU, GPU, etc.) used for these computations. |
| Software Dependencies | No | The paper states 'All the computations were conducted with Theano' and 'The training was conducted by mini-batch SGD based on ADAM', but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Throughout the experiments on our proposed method, we used a fixed value of λ = 1, and we also used a fixed value of Ip = 1. Our classifier was a neural network (NN) with one hidden layer consisting of 100 hidden units. We used the ReLU (Jarrett et al., 2009; Nair & Hinton, 2010; Glorot et al., 2011) activation function for hidden units, and used the softmax activation function for all the output units. We used µi = 0.9, and exponentially decreasing γi with rate 0.995. As for the choice of γ1, we used 1.0. We trained the NNs with 1,000 parameter updates. We chose the mini-batch size of 100, and used the default values of Kingma & Ba (2015) for the tunable parameters of ADAM. We selected the initial value of 0.002 and adopted the schedule of exponential decay with rate 0.9 per 500 updates. As for the architecture of NNs, we used ReLU-based NNs with two hidden layers with the number of hidden units (1200, 1200). We used two separate minibatches at each step: one minibatch of size 100 from labeled samples for the computation of the likelihood term, and another minibatch of size 250 from both labeled and unlabeled samples for computing the regularization term. We used a neural network with the number of hidden nodes given by (1200, 600, 300, 150, 150). (A hedged sketch of this training configuration also follows the table.) |
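
The paper's Algorithm 1 generates the virtual adversarial perturbation r_v-adv^(n) by power iteration on the local distributional smoothness (LDS) objective. Below is a minimal sketch of that procedure in Python/PyTorch; the function and argument names (`virtual_adversarial_perturbation`, `xi`, `eps`, `ip`) are ours, the default values for `xi` and `eps` are placeholders rather than the paper's settings, and the authors' reference implementation at https://github.com/takerum/vat is written in Theano.

```python
import torch
import torch.nn.functional as F

def _l2_normalize(d):
    # Normalize each per-sample perturbation to unit L2 norm.
    return d / (d.flatten(1).norm(dim=1).view(-1, *([1] * (d.dim() - 1))) + 1e-8)

def virtual_adversarial_perturbation(model, x, xi=10.0, eps=1.0, ip=1):
    """Approximate r_v-adv via power iteration (the paper fixes Ip = 1)."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)           # p(y | x, theta), held fixed as the target
    d = _l2_normalize(torch.randn_like(x))       # random unit direction
    for _ in range(ip):
        d.requires_grad_(True)
        q_log = F.log_softmax(model(x + xi * d), dim=1)
        kl = F.kl_div(q_log, p, reduction="batchmean")    # KL[p || q(x + xi*d)]
        grad = torch.autograd.grad(kl, d)[0]
        d = _l2_normalize(grad.detach())          # power-iteration update of the direction
    return eps * d                                # r_v-adv = eps * d / ||d||
```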
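The semi-supervised MNIST configuration quoted in the Experiment Setup row (λ = 1, a ReLU MLP with (1200, 1200) hidden units, ADAM with initial learning rate 0.002 decayed by a factor of 0.9 every 500 updates, a labeled minibatch of 100 for the likelihood term and a minibatch of 250 for the regularization term) can be assembled roughly as follows. This is a sketch under those assumptions, again in PyTorch rather than the paper's Theano, and it reuses the `virtual_adversarial_perturbation` helper defined above; the `training_step` function and all variable names are ours.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 1200), nn.ReLU(),
    nn.Linear(1200, 1200), nn.ReLU(),
    nn.Linear(1200, 10),                  # softmax is applied inside the losses below
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500, gamma=0.9)

lam = 1.0                                 # fixed regularization weight lambda = 1

def training_step(x_labeled, y_labeled, x_all):
    # x_labeled/y_labeled: labeled minibatch of size 100.
    # x_all: minibatch of size 250 drawn from labeled + unlabeled samples.
    nll = nn.functional.cross_entropy(model(x_labeled), y_labeled)
    r = virtual_adversarial_perturbation(model, x_all)
    with torch.no_grad():
        p = nn.functional.softmax(model(x_all), dim=1)
    q_log = nn.functional.log_softmax(model(x_all + r), dim=1)
    lds = nn.functional.kl_div(q_log, p, reduction="batchmean")   # LDS regularizer
    loss = nll + lam * lds
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                      # exponential decay of 0.9 every 500 updates
    return loss.item()
```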