Variational Autoencoder for Semi-Supervised Text Classification
Authors: Weidi Xu, Haoze Sun, Chao Deng, Ying Tan
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the Large Movie Review Dataset (IMDB) and the AG's News corpus show that the proposed approach significantly improves the classification accuracy compared with pure-supervised classifiers, and achieves competitive performance against previous advanced methods. |
| Researcher Affiliation | Academia | Weidi Xu, Haoze Sun, Chao Deng, Ying Tan. Key Laboratory of Machine Perception (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China. Emails: wead_hsu@pku.edu.cn, pkucissun@foxmail.com, cdspace678@pku.edu.cn, ytan@pku.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides mathematical equations for specific components such as CLSTM-II, but not a complete algorithm. |
| Open Source Code | No | The paper does not provide an explicit statement or a link indicating that the source code for their methodology is publicly available. |
| Open Datasets | Yes | Large Movie Review Dataset (IMDB) (Maas et al. 2011) and AG's News corpus (Zhang, Zhao, and LeCun 2015). |
| Dataset Splits | Yes | In both datasets, 20% of the samples from the training set are split off as a validation set (a minimal split sketch follows the table). |
| Hardware Specification | Yes | Table 5: Time cost of training one epoch using different optimization methods on an Nvidia GTX Titan X GPU. |
| Software Dependencies | No | The system was implemented using Theano (Bastien et al. 2012; Bergstra et al. 2010) and Lasagne (Dieleman et al. 2015). Specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | The models were trained end-to-end using the ADAM (Kingma and Ba 2015) optimizer with a learning rate of 4e-3. The cost-annealing trick (Bowman et al. 2016; Kaae Sønderby et al. 2016) was adopted to smooth the training by gradually increasing the weight of the KL cost from zero to one. The word dropout technique (Bowman et al. 2016) was also utilized, with the rate scaled from 0.25 to 0.5 in the experiments, and the hyper-parameter α was scaled from 1 to 2. Both dropout (Srivastava et al. 2014) and batch normalization (Ioffe and Szegedy 2015) are applied to the output of the word embedding projection layer and to the feature vectors that serve as the inputs and outputs of the MLP preceding the final layer. In all experiments, 512 units were used for the memory cells, 300 units for the input embedding projection layer, and 50 units for the latent variable z (an illustrative sketch of the annealing and word-dropout pieces follows the table). |
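
Below is a minimal sketch of the reported 20% train/validation split. The paper only states that 20% of the training samples are held out; the random-permutation strategy, the seed, and the function name `split_train_valid` are illustrative assumptions.

```python
import numpy as np

def split_train_valid(examples, valid_fraction=0.2, seed=0):
    """Hold out a fraction of the training set as a validation set.

    The sampling method and seed are assumptions; the paper states only
    that 20% of the train set is used for validation.
    """
    rng = np.random.RandomState(seed)
    indices = rng.permutation(len(examples))
    n_valid = int(len(examples) * valid_fraction)
    valid_idx, train_idx = indices[:n_valid], indices[n_valid:]
    return [examples[i] for i in train_idx], [examples[i] for i in valid_idx]

# Example: 10,000 labelled documents -> 8,000 train / 2,000 validation.
train, valid = split_train_valid(list(range(10000)))
```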
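The following sketch illustrates, under stated assumptions, two ingredients listed in the experiment setup: KL cost annealing and word dropout. The linear annealing schedule, the `<unk>`-replacement strategy, and the helper names `kl_weight` and `word_dropout` are assumptions (the paper does not give exact schedules), and the loss combination in the trailing comment is a hypothetical placeholder rather than the authors' implementation.

```python
import numpy as np

def kl_weight(step, total_anneal_steps):
    """Gradually increase the KL term's weight from 0 to 1.

    A linear ramp is an assumed, illustrative schedule; Bowman et al. (2016)
    also describe sigmoid-shaped ramps.
    """
    return min(1.0, step / float(total_anneal_steps))

def word_dropout(token_ids, drop_rate, unk_id, rng):
    """Randomly replace decoder-input tokens with an <unk> id.

    The paper scales the word-dropout rate from 0.25 to 0.5; how the rate
    moves between those values is unspecified, so the current rate is
    simply passed in here.
    """
    token_ids = np.asarray(token_ids)
    mask = rng.uniform(size=token_ids.shape) < drop_rate
    dropped = token_ids.copy()
    dropped[mask] = unk_id
    return dropped

# Usage example with a toy token sequence.
rng = np.random.RandomState(0)
dropped = word_dropout([5, 17, 42, 8], drop_rate=0.25, unk_id=1, rng=rng)

# Hypothetical combination of loss terms (alpha is the classification weight
# scaled from 1 to 2 in the paper); the individual terms are placeholders:
# total_loss = reconstruction_loss \
#              + kl_weight(step, anneal_steps) * kl_divergence \
#              + alpha * classification_loss
```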