Adversarial Training Methods for Semi-Supervised Text Classification
Authors: Takeru Miyato, Andrew M. Dai, Ian Goodfellow
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed method achieves state of the art results on multiple benchmark semi-supervised and purely supervised tasks. We provide visualizations and analysis showing that the learned word embeddings have improved in quality and that while training, the model is less prone to overfitting. |
| Researcher Affiliation | Collaboration | (1) Preferred Networks, Inc., ATR Cognitive Mechanisms Laboratories, Kyoto University; (2) Google Brain; (3) OpenAI |
| Pseudocode | No | The paper describes the methods textually and mathematically but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code will be available at https://github.com/tensorflow/models/tree/master/adversarial_text. |
| Open Datasets | Yes | IMDB (Maas et al., 2011) is a standard benchmark movie review dataset for sentiment classification. Elec (Johnson & Zhang, 2015b) is an Amazon electronic product review dataset. Rotten Tomatoes (Pang & Lee, 2005) consists of short snippets of movie reviews, for sentiment classification. DBpedia (Lehmann et al., 2015; Zhang et al., 2015) is a dataset of Wikipedia pages for category classification. RCV1 (Lewis et al., 2004) consists of news articles from the Reuters Corpus. |
| Dataset Splits | Yes | For each dataset, we divided the original training set into a training set and a validation set, and we roughly optimized the hyperparameters shared across all methods (model architecture, batch size, training steps) using the validation performance of the base model with embedding dropout. |
| Hardware Specification | No | The paper states "All experiments used TensorFlow (Abadi et al., 2016) on GPUs" but does not specify any particular GPU model, CPU, or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions "All experiments used TensorFlow (Abadi et al., 2016)" but does not provide specific version numbers for TensorFlow or any other software libraries. |
| Experiment Setup | Yes | We used a unidirectional single-layer LSTM with 1024 hidden units. The word embedding dimension D was 256 on IMDB and 512 on the other datasets. For the optimization, we used the Adam optimizer (Kingma & Ba, 2015), with batch size 256, an initial learning rate of 0.001, and a 0.9999 learning rate exponential decay factor at each training step. We trained for 100,000 steps. We applied gradient clipping with norm set to 1.0 on all the parameters except word embeddings. For regularization of the recurrent language model, we applied dropout (Srivastava et al., 2014) on the word embedding layer with 0.5 dropout rate. |
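
Since the paper provides no pseudocode (see the Pseudocode row above), the following is a minimal, hedged sketch of the adversarial-perturbation step it describes in prose and equations: the word embeddings of a labelled example are perturbed along the gradient of the classification loss, rescaled to an L2 norm of epsilon, and the loss on the perturbed input serves as an additional regularization term. The function and variable names here are illustrative, `model` is assumed to map an embedded token sequence to class logits, and `epsilon` is a tuned hyperparameter whose value is not quoted in this report; this is a reconstruction, not the authors' released code.

```python
import tensorflow as tf

def adversarial_loss(model, embeddings, labels, epsilon=1.0):
    """Sketch of the adversarial loss: classification loss on perturbed embeddings.

    `embeddings` is the embedded token sequence [batch, time, dim];
    `epsilon` bounds the L2 norm of the perturbation (tuned hyperparameter).
    """
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    with tf.GradientTape() as tape:
        tape.watch(embeddings)
        clean_loss = loss_fn(labels, model(embeddings))
    # Treat the gradient as a constant when building the adversarial example.
    grad = tf.stop_gradient(tape.gradient(clean_loss, embeddings))
    # Worst-case direction, L2-normalized per example to radius epsilon.
    norm = tf.norm(grad, axis=[1, 2], keepdims=True) + 1e-12
    r_adv = epsilon * grad / norm
    return loss_fn(labels, model(embeddings + r_adv))
```

The full training objective would combine this term with the ordinary supervised loss (and, in the semi-supervised variant, a virtual adversarial loss computed on unlabelled data); those pieces are omitted from this sketch.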
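
The Experiment Setup row translates fairly directly into a model and optimizer configuration. Below is a minimal sketch using the modern tf.keras API (the paper used a 2016-era TensorFlow release); the vocabulary size and number of classes are placeholders not quoted above, and excluding word-embedding gradients from norm clipping, as the paper does, would require a custom training step rather than the blanket `clipnorm` used here.

```python
import tensorflow as tf

VOCAB_SIZE = 30000   # placeholder; vocabulary size is not quoted in the row above
NUM_CLASSES = 2      # placeholder; e.g. binary sentiment on IMDB
EMBED_DIM = 256      # 256 on IMDB, 512 on the other datasets
HIDDEN_UNITS = 1024  # unidirectional single-layer LSTM

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    tf.keras.layers.Dropout(0.5),            # 0.5 dropout on the word embedding layer
    tf.keras.layers.LSTM(HIDDEN_UNITS),
    tf.keras.layers.Dense(NUM_CLASSES),
])

# Adam, initial learning rate 0.001, decayed by a factor of 0.9999 at each step;
# training runs for 100,000 steps with batch size 256.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001, decay_steps=1, decay_rate=0.9999)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule, clipnorm=1.0)

model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```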