Character-level Convolutional Networks for Text Classification

Authors: Xiang Zhang, Junbo Zhao, Yann LeCun

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks. (A minimal sketch of such a model follows the table.)
Researcher Affiliation | Academia | Xiang Zhang, Junbo Zhao, Yann LeCun. Courant Institute of Mathematical Sciences, New York University, 719 Broadway, 12th Floor, New York, NY 10003. {xiang, junbo.zhao, yann}@cs.nyu.edu
Pseudocode | No | The paper describes algorithms and modules using mathematical formulas and text, but does not include any structured pseudocode or algorithm blocks (e.g., labeled 'Algorithm' or 'Pseudocode').
Open Source Code | No | The paper does not provide any explicit statement about releasing its source code for the methodology described, nor does it provide a link to a code repository.
Open Datasets | Yes | AG's news corpus. We obtained AG's corpus of news articles on the web (http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html). It contains 496,835 categorized news articles from more than 2000 news sources. We choose the 4 largest classes from this corpus to construct our dataset, using only the title and description fields.
Dataset Splits | No | Table 3 and the text provide 'Train Samples' and 'Test Samples' for each dataset, but there is no explicit mention of a separate validation set or of split percentages for training, validation, and testing.
Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of 2 Tesla K40 GPUs used for this research. We gratefully acknowledge the support of Amazon.com Inc for an AWS in Education Research grant used for this research.
Software Dependencies | Yes | The implementation is done using Torch 7 [4].
Experiment Setup | Yes | The algorithm used is stochastic gradient descent (SGD) with a minibatch of size 128, using momentum [26] [30] 0.9 and an initial step size of 0.01 which is halved every 3 epochs for 10 times. ... We also insert 2 dropout [10] modules in between the 3 fully-connected layers to regularize. They have dropout probability of 0.5. ... We initialize the weights using a Gaussian distribution. The mean and standard deviation used for initializing the large model is (0, 0.02) and the small model (0, 0.05). (A training sketch using these settings follows the table.)
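
As the Pseudocode row notes, the paper specifies its method in prose, formulas, and tables rather than an algorithm block. For a concrete starting point, below is a minimal PyTorch sketch of the paper's character quantization (70-character alphabet, fixed input length 1014, backward encoding order) and its small ConvNet variant (6 convolutional layers, 3 fully-connected layers, 256 features). This is a reconstruction under those stated parameters, not the authors' code: the original implementation was in Torch 7 (Lua), and the names `CharCNN` and `quantize` are illustrative only.

```python
# Sketch of the paper's small character-level ConvNet, reimplemented in
# PyTorch (an assumption -- the authors' implementation used Torch 7 / Lua).
import torch
import torch.nn as nn

# 70-character alphabet from the paper (26 letters, 10 digits, 33 other
# characters -- the printed list repeats '-' -- and the newline character).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+-=<>()[]{}\n"
CHAR2IDX = {c: i for i, c in enumerate(ALPHABET)}
INPUT_LEN = 1014  # fixed feature length l0 from the paper


def quantize(text: str) -> torch.Tensor:
    """One-hot encode up to INPUT_LEN characters in backward order, as in
    the paper; characters outside the alphabet become all-zero columns."""
    x = torch.zeros(len(ALPHABET), INPUT_LEN)
    for pos, ch in enumerate(reversed(text.lower()[:INPUT_LEN])):
        idx = CHAR2IDX.get(ch)
        if idx is not None:
            x[idx, pos] = 1.0
    return x


class CharCNN(nn.Module):
    """Small model: 6 temporal conv layers (256 features each) + 3 FC layers."""

    def __init__(self, n_classes: int, n_features: int = 256):
        super().__init__()

        def conv(cin, k):
            return nn.Conv1d(cin, n_features, kernel_size=k)

        self.convs = nn.Sequential(
            conv(len(ALPHABET), 7), nn.ReLU(), nn.MaxPool1d(3),  # layer 1, pooled
            conv(n_features, 7), nn.ReLU(), nn.MaxPool1d(3),     # layer 2, pooled
            conv(n_features, 3), nn.ReLU(),                      # layers 3-5
            conv(n_features, 3), nn.ReLU(),
            conv(n_features, 3), nn.ReLU(),
            conv(n_features, 3), nn.ReLU(), nn.MaxPool1d(3),     # layer 6, pooled
        )
        # With INPUT_LEN = 1014 the temporal dimension after the conv stack
        # is (1014 - 96) / 27 = 34, matching the paper's derivation.
        self.fc = nn.Sequential(
            nn.Linear(34 * n_features, 1024), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(1024, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, len(ALPHABET), INPUT_LEN)
        return self.fc(self.convs(x).flatten(1))
```

For example, `CharCNN(n_classes=4)(quantize("some headline").unsqueeze(0))` would produce logits over the 4 AG's news classes.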
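
Similarly, the optimization recipe quoted in the Experiment Setup row maps directly onto standard optimizer and scheduler APIs. The sketch below wires the stated hyperparameters (minibatch 128, momentum 0.9, initial step size 0.01 halved every 3 epochs for 10 times, dropout 0.5, Gaussian initialization) into a PyTorch training loop, reusing the hypothetical `CharCNN` above. The random stand-in data, the 90-epoch total, and the zero bias initialization are assumptions not stated in the paper.

```python
# Sketch of the training recipe quoted above, under the same PyTorch
# assumption; the dataset here is random stand-in data, not AG's news.
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, TensorDataset

model = CharCNN(n_classes=4)  # e.g. the 4 classes of AG's news


def init_weights(m):
    # Paper: Gaussian init with (mean, std) = (0, 0.05) for the small model
    # and (0, 0.02) for the large one. Zero biases are an assumption.
    if isinstance(m, (nn.Conv1d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.05)
        nn.init.zeros_(m.bias)


model.apply(init_weights)

# Stand-in tensors with the quantized input shape (batch, 70, 1014).
X = torch.randn(512, len(ALPHABET), INPUT_LEN)
y = torch.randint(0, 4, (512,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=128, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = StepLR(optimizer, step_size=3, gamma=0.5)  # halve every 3 epochs
criterion = nn.CrossEntropyLoss()

for epoch in range(90):  # total epoch count is an assumption
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
    if epoch < 30:  # 10 halvings x 3 epochs, then the rate stays fixed
        scheduler.step()
```

Capping the scheduler at 30 epochs reproduces the quoted schedule exactly: the step size is halved at epochs 3, 6, ..., 30 (10 halvings) and then held constant.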