Dataless Text Classification with Descriptive LDA

Authors: Xingyuan Chen, Yunqing Xia, Peng Jin, John Carroll

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results with the 20Newsgroups and RCV1 datasets show that: (1) our DLTC method is more effective than the semantic-based DLTC baseline method; and (2) the accuracy of our DLTC method is very close to state-of-the-art supervised text classification methods."
Researcher Affiliation | Academia | 1 School of Computer Science, Leshan Normal University, Leshan 614000, China (cxyforpaper@gmail.com, jandp@pku.edu.cn); 2 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China (yqxia@tsinghua.edu.cn); 3 Department of Informatics, University of Sussex, Brighton BN1 9QJ, UK (j.a.carroll@sussex.ac.uk)
Pseudocode | Yes | "Algorithm 1 presents this more formally."
Open Source Code | No | The paper links the external tools it uses for comparison (LIBSVM and the sLDA implementation of Wang, Blei, and Li (2009)) but does not provide access to the authors' own source code for the Desc LDA method described in the paper.
Open Datasets | Yes | "We use two datasets: 20Newsgroups (20NG): Introduced by Lang (1995)... RCV1: An archive of multi-labeled newswire stories (Lewis et al. 2004)."
Dataset Splits | Yes | "20Newsgroups (20NG): ... The dataset is divided into training (60%) and test (40%) sets. RCV1: ... 13,625 stories are used as the training set and 6,188 stories as the test set. In our experiments we use the standard training/test partitions of the two datasets."
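The paper relies on the standard, fixed training/test partitions rather than a random split. Purely to illustrate the 60%/40% proportions reported for 20NG, a stratified split over a synthetic stand-in corpus (toy documents and labels, not the actual dataset) can be sketched as:

```python
from sklearn.model_selection import train_test_split

# Toy stand-in corpus: 100 synthetic documents across 20 classes,
# mimicking 20NG's class count. The paper itself uses the standard
# fixed partitions, not a random split.
docs = [f"document {i}" for i in range(100)]
labels = [i % 20 for i in range(100)]

# Stratified 60%/40% split mirroring the reported 20NG proportions.
train_docs, test_docs, train_y, test_y = train_test_split(
    docs, labels, test_size=0.4, stratify=labels, random_state=0
)
print(len(train_docs), len(test_docs))  # 60 40
```

Stratification keeps the per-class proportions identical in both partitions, which matters when accuracy is compared across 20 classes.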
Hardware Specification | No | The paper does not report any hardware details (CPU/GPU models, memory, or cloud computing specifications) for running the experiments.
Software Dependencies | No | The paper mentions a linear SVM via the LIBSVM package and the sLDA implementation of Wang, Blei, and Li (2009), but gives version numbers for neither, nor for any other libraries or dependencies.
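With no versions pinned, the exact baseline environment cannot be reconstructed. As a rough modern stand-in for the linear-SVM baseline, scikit-learn's `SVC` (which wraps LIBSVM internally) with a linear kernel can be sketched on toy data; the documents and labels below are illustrative assumptions, not from the paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Illustrative toy corpus (not the paper's data).
train_docs = [
    "stocks rallied today",
    "the market fell sharply",
    "the striker scored twice",
    "a tense playoff match",
]
train_labels = ["finance", "finance", "sport", "sport"]

# SVC wraps LIBSVM; kernel="linear" approximates the paper's
# "linear SVM using the package LIBSVM" supervised baseline.
clf = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
clf.fit(train_docs, train_labels)
preds = clf.predict(["the market fell today"])
```

For large sparse text problems a dedicated linear solver (e.g. LIBLINEAR) is usually faster, but `SVC(kernel="linear")` stays closest to the LIBSVM package the paper names.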
Experiment Setup | Yes | "For our Desc LDA method, we set α = 0.1 and η = 0.2. We vary K (the number of topics) across the range used in previous work (Blei and McAuliffe 2007). For the number of iterations, in preliminary experiments we observed good accuracy at 30."
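The reported hyperparameters map directly onto a standard LDA configuration. A minimal sketch with scikit-learn's plain LDA, using α = 0.1, η = 0.2, and 30 iterations on a toy corpus; this is a stand-in only, since Desc LDA additionally constrains topics with class descriptions, which scikit-learn does not implement:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for 20NG/RCV1 documents.
docs = [
    "the team won the game in overtime",
    "the player scored a late goal",
    "the new gpu renders graphics quickly",
    "faster processors improve computing speed",
]
K = 2  # number of topics; the paper varies K over a range

X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(
    n_components=K,
    doc_topic_prior=0.1,   # alpha, as reported in the paper
    topic_word_prior=0.2,  # eta, as reported in the paper
    max_iter=30,           # iteration count the paper found sufficient
    random_state=0,
)
theta = lda.fit_transform(X)  # per-document topic distributions
print(theta.shape)  # (4, 2)
```

Each row of `theta` is a normalized topic distribution for one document; in Desc LDA the topics themselves would additionally be tied to the class descriptions, so classification reduces to reading off the most probable topic.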