Dataless Text Classification with Descriptive LDA
Authors: Xingyuan Chen, Yunqing Xia, Peng Jin, John Carroll
AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results with the 20Newsgroups and RCV1 datasets show that: (1) our DLTC method is more effective than the semantic-based DLTC baseline method; and (2) the accuracy of our DLTC method is very close to state-of-the-art supervised text classification methods. |
| Researcher Affiliation | Academia | (1) School of Computer Science, Leshan Normal University, Leshan 614000, China (cxyforpaper@gmail.com, jandp@pku.edu.cn); (2) Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China (yqxia@tsinghua.edu.cn); (3) Department of Informatics, University of Sussex, Brighton BN1 9QJ, UK (j.a.carroll@sussex.ac.uk) |
| Pseudocode | Yes | Algorithm 1 presents this more formally. |
| Open Source Code | No | The paper mentions external tools used for comparison ('LIBSVM' and 'the implementation of Wang, Blei, and Li (2009)'), providing links to their repositories, but does not provide access to the authors' own source code for the DescLDA method described in the paper. |
| Open Datasets | Yes | We use two datasets: 20Newsgroups (20NG): Introduced by Lang (1995)... RCV1: An archive of multi-labeled newswire stories (Lewis et al. 2004). |
| Dataset Splits | Yes | 20Newsgroups (20NG): ... The dataset is divided into training (60%) and test (40%) sets. RCV1: ... 13,625 stories are used as the training set and 6,188 stories as the test set. In our experiments we use the standard training/test partitions of the two datasets. (A loading sketch for the standard 20NG partition follows the table.) |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud computing specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'linear SVM using the package LIBSVM' and 'adopt the implementation of Wang, Blei, and Li (2009)' for sLDA, but it does not specify version numbers for these software components or any other libraries/dependencies. (A sketch of the SVM baseline follows the table.) |
| Experiment Setup | Yes | For our DescLDA method, we set α = 0.1 and η = 0.2. We vary K (the number of topics) across the range used in previous work (Blei and McAuliffe 2007). For the number of iterations, in preliminary experiments we observed good accuracy at 30. (An illustrative hyperparameter sketch follows the table.) |
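The Dataset Splits row above refers to the standard partitions of both corpora. As a point of reference only, the following sketch loads the canonical 20Newsgroups "bydate" split (roughly 60% train / 40% test) with scikit-learn. The `remove` argument, which strips headers, footers, and quoted replies, is an assumption; the paper does not say whether the authors applied such preprocessing, and the RCV1 subset used in the paper is not shown here.

```python
# Hedged sketch: loading the standard 20Newsgroups train/test partition.
# scikit-learn's fetch_20newsgroups exposes the canonical "bydate" split,
# which is the standard partition the paper reports using.
from sklearn.datasets import fetch_20newsgroups

# remove=(...) is optional preprocessing, assumed here, not stated in the paper
train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

print(len(train.data), "training documents")   # ~11,314 in the bydate split
print(len(test.data), "test documents")        # ~7,532 in the bydate split
print(train.target_names[:5])                  # first few category labels
```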
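The Software Dependencies row mentions a linear SVM trained with LIBSVM as the supervised baseline. The sketch below is a stand-in under stated assumptions, not the authors' setup: it uses scikit-learn's `SVC` (which wraps LIBSVM) with a TF-IDF feature pipeline and the default C = 1.0, none of which is specified in the paper.

```python
# Hedged sketch of a LIBSVM-backed linear SVM baseline on 20Newsgroups.
# Feature pipeline and hyperparameters are assumptions, not paper details.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

# TF-IDF features (assumed representation; the paper does not specify one)
vectorizer = TfidfVectorizer(sublinear_tf=True, min_df=5)
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

clf = SVC(kernel="linear", C=1.0)   # scikit-learn's SVC wraps LIBSVM
clf.fit(X_train, train.target)
print("accuracy:", accuracy_score(test.target, clf.predict(X_test)))
```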
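The Experiment Setup row reports α = 0.1, η = 0.2, a swept number of topics K, and 30 iterations. The sketch below only shows where such hyperparameters would plug into a generic LDA implementation (scikit-learn's variational LDA); it is not the DescLDA model itself, and the K values and vectorizer settings are purely illustrative.

```python
# Hedged sketch: mapping the reported hyperparameters onto a standard LDA.
# NOT the paper's DescLDA model (which couples LDA with category descriptions);
# scikit-learn uses variational inference, so "iterations" here need not match
# the paper's notion of an iteration exactly.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = fetch_20newsgroups(subset="train").data
X = CountVectorizer(max_df=0.95, min_df=5, stop_words="english").fit_transform(docs)

for K in (20, 40, 60):                   # illustrative range for K
    lda = LatentDirichletAllocation(
        n_components=K,                  # number of topics K
        doc_topic_prior=0.1,             # alpha = 0.1 (as reported)
        topic_word_prior=0.2,            # eta = 0.2 (as reported)
        max_iter=30,                     # 30 iterations (as reported)
        random_state=0,
    )
    lda.fit(X)
    print(K, "topics -> perplexity:", lda.perplexity(X))
```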