Character-level Convolutional Networks for Text Classification
Authors: Xiang Zhang, Junbo Zhao, Yann LeCun
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks. |
| Researcher Affiliation | Academia | Xiang Zhang, Junbo Zhao, Yann LeCun; Courant Institute of Mathematical Sciences, New York University, 719 Broadway, 12th Floor, New York, NY 10003; {xiang, junbo.zhao, yann}@cs.nyu.edu |
| Pseudocode | No | The paper describes algorithms and modules using mathematical formulas and text, but does not include any structured pseudocode or algorithm blocks (e.g., labeled 'Algorithm' or 'Pseudocode'). |
| Open Source Code | No | The paper does not provide any explicit statement about releasing its source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | AG's news corpus. We obtained AG's corpus of news articles on the web (http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html). It contains 496,835 categorized news articles from more than 2000 news sources. We choose the 4 largest classes from this corpus to construct our dataset, using only the title and description fields. (A sketch of this construction follows the table.) |
| Dataset Splits | No | Table 3 and the text provide 'Train Samples' and 'Test Samples' for each dataset, but there is no explicit mention of a separate validation set or split percentages for training, validation, and testing. |
| Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of 2 Tesla K40 GPUs used for this research. We gratefully acknowledge the support of Amazon.com Inc for an AWS in Education Research grant used for this research. |
| Software Dependencies | Yes | The implementation is done using Torch 7 [4]. |
| Experiment Setup | Yes | The algorithm used is stochastic gradient descent (SGD) with a minibatch of size 128, using momentum [26] [30] 0.9 and initial step size 0.01, which is halved every 3 epochs for 10 times. ... We also insert 2 dropout [10] modules in between the 3 fully-connected layers to regularize. They have dropout probability of 0.5. ... We initialize the weights using a Gaussian distribution. The mean and standard deviation used for initializing the large model is (0, 0.02) and small model (0, 0.05). (See the training sketch after the table.) |
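
To make the dataset-construction step in the Open Datasets row concrete, here is a minimal sketch of how the AG subset could be rebuilt from the public corpus. This is not the authors' code: the file name `ag_corpus.csv` and the column names `category`, `title`, and `description` are assumptions about how the raw corpus has been pre-parsed into a table.

```python
import pandas as pd

# Assumed layout (hypothetical): the raw AG corpus parsed into a CSV
# with `category`, `title`, and `description` columns.
df = pd.read_csv("ag_corpus.csv")

# Keep the 4 largest classes, as the paper describes, using only the
# title and description fields of each article.
top4 = df["category"].value_counts().nlargest(4).index
subset = df.loc[df["category"].isin(top4), ["category", "title", "description"]]
print(subset["category"].value_counts())
```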
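
Similarly, the training recipe quoted in the Experiment Setup row can be summarized as a short sketch. The original implementation was in Torch 7; the PyTorch re-expression below is an assumption, and the stand-in model (layer widths, 4-way output) is illustrative rather than the paper's architecture. Only the optimizer, learning-rate schedule, dropout placement, and Gaussian initialization are taken from the quoted text.

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in classifier head: three fully-connected layers with two
# dropout (p=0.5) modules between them, per the paper. Layer widths
# and the 4-way output are illustrative placeholders.
model = nn.Sequential(
    nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(1024, 4),
)

def init_weights(module, std=0.02):
    # Gaussian init: (0, 0.02) for the large model, (0, 0.05) for the small.
    if isinstance(module, (nn.Conv1d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=std)

model.apply(init_weights)

# SGD with momentum 0.9 and initial step size 0.01; the minibatch size
# of 128 would be set on the DataLoader (not shown).
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Halve the learning rate every 3 epochs; 30 epochs yields the
# 10 halvings quoted above.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.5)

for epoch in range(30):
    # ... forward/backward passes over minibatches of 128 go here ...
    scheduler.step()
```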