Variational Autoencoder for Semi-Supervised Text Classification

Authors: Weidi Xu, Haoze Sun, Chao Deng, Ying Tan

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the Large Movie Review Dataset (IMDB) and the AG's News corpus show that the proposed approach significantly improves classification accuracy compared with purely supervised classifiers, and achieves competitive performance against previous advanced methods.
Researcher Affiliation | Academia | Weidi Xu, Haoze Sun, Chao Deng, Ying Tan; Key Laboratory of Machine Perception (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China. wead_hsu@pku.edu.cn, pkucissun@foxmail.com, cdspace678@pku.edu.cn, ytan@pku.edu.cn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides mathematical equations for specific components such as CLSTM-II, but no full algorithm.
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the methodology is publicly available.
Open Datasets | Yes | Large Movie Review Dataset (IMDB) (Maas et al. 2011) and AG's News corpus (Zhang, Zhao, and LeCun 2015).
Dataset Splits | Yes | In both datasets, 20% of the samples in the training set are split off as a validation set (a minimal split sketch appears below the table).
Hardware Specification | Yes | Table 5 reports the time cost of training one epoch with different optimization methods on an Nvidia GTX Titan-X GPU.
Software Dependencies | No | The system was implemented using Theano (Bastien et al. 2012; Bergstra et al. 2010) and Lasagne (Dieleman et al. 2015), but specific version numbers for these components are not provided.
Experiment Setup | Yes | The models were trained end-to-end with the ADAM optimizer (Kingma and Ba 2015) at a learning rate of 4e-3. The cost-annealing trick (Bowman et al. 2016; Kaae Sønderby et al. 2016) was adopted to smooth training by gradually increasing the weight of the KL cost from zero to one. The word-dropout technique (Bowman et al. 2016) was also used, with the rate scaled from 0.25 to 0.5, and the hyper-parameter α was scaled from 1 to 2. Dropout (Srivastava et al. 2014) and batch normalization (Ioffe and Szegedy 2015) were applied to the output of the word-embedding projection layer and to the feature vectors serving as inputs and outputs of the MLP preceding the final layer. All experiments used 512 units for the memory cells, 300 units for the input embedding projection layer, and 50 units for the latent variable z (a hedged configuration sketch appears below the table).
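The 20% validation split noted in the Dataset Splits row is straightforward to reproduce. The snippet below is a minimal sketch using scikit-learn's train_test_split; the random seed and the stratification by label are assumptions for reproducibility, not details taken from the paper.

```python
from sklearn.model_selection import train_test_split


def make_validation_split(texts, labels, valid_fraction=0.2, seed=0):
    """Split 20% of the labelled training data off as a validation set.

    Only the 20% ratio comes from the paper; the seed and the
    stratification by label are illustrative assumptions.
    """
    train_texts, valid_texts, train_labels, valid_labels = train_test_split(
        texts,
        labels,
        test_size=valid_fraction,
        random_state=seed,
        stratify=labels,
    )
    return train_texts, valid_texts, train_labels, valid_labels
```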
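For orientation, the sketch below collects the reported experiment-setup values into a small Python configuration with scheduling helpers. Only the numeric settings (Adam at 4e-3, KL weight annealed from 0 to 1, word dropout scaled from 0.25 to 0.5, α scaled from 1 to 2, 512 memory-cell units, a 300-unit embedding projection, and a 50-dimensional latent variable) come from the paper; the linear shape of the schedules and all names such as CONFIG and linear_ramp are assumptions made here for illustration.

```python
# Hyper-parameters reported in the paper's experiment setup. The schedule
# shapes (linear ramps) are assumptions; the paper only gives the endpoints.
CONFIG = {
    "optimizer": "adam",
    "learning_rate": 4e-3,
    "lstm_hidden_units": 512,        # memory cells
    "embedding_units": 300,          # input embedding projection layer
    "latent_dim": 50,                # latent variable z
    "kl_weight_range": (0.0, 1.0),   # cost annealing of the KL term
    "word_dropout_range": (0.25, 0.5),
    "alpha_range": (1.0, 2.0),       # hyper-parameter alpha
}


def linear_ramp(step, total_steps, start, end):
    """Linearly interpolate a scheduled value from start to end.

    A linear ramp is an assumption; the paper states only that these
    quantities are gradually scaled between the given endpoints.
    """
    fraction = min(max(step / float(total_steps), 0.0), 1.0)
    return start + fraction * (end - start)


def scheduled_values(step, total_steps, config=CONFIG):
    """Return the annealed KL weight, word-dropout rate, and alpha at a step."""
    return {
        "kl_weight": linear_ramp(step, total_steps, *config["kl_weight_range"]),
        "word_dropout": linear_ramp(step, total_steps, *config["word_dropout_range"]),
        "alpha": linear_ramp(step, total_steps, *config["alpha_range"]),
    }
```

In a training loop built on this sketch, the KL term of the variational objective would be multiplied by kl_weight, decoder input tokens would be dropped (replaced by an unknown token) with probability word_dropout, and alpha would weight the supervised classification term, which is how these three scheduled quantities are typically used in semi-supervised VAEs of this kind.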