Dataless Text Classification with Descriptive LDA

Authors: Xingyuan Chen, Yunqing Xia, Peng Jin, John Carroll

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results with the 20Newsgroups and RCV1 datasets show that: (1) our DLTC method is more effective than the semantic-based DLTC baseline method; and (2) the accuracy of our DLTC method is very close to state-of-the-art supervised text classification methods."
Researcher Affiliation | Academia | 1 School of Computer Science, Leshan Normal University, Leshan 614000, China (cxyforpaper@gmail.com, jandp@pku.edu.cn); 2 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China (yqxia@tsinghua.edu.cn); 3 Department of Informatics, University of Sussex, Brighton BN1 9QJ, UK (j.a.carroll@sussex.ac.uk)
Pseudocode | Yes | "Algorithm 1 presents this more formally."
Open Source Code | No | The paper links the external tools it uses for comparison (LIBSVM and the sLDA implementation of Wang, Blei, and Li (2009)) but does not provide access to the authors' own source code for the Desc LDA method described in the paper.
Open Datasets | Yes | "We use two datasets: 20Newsgroups (20NG): Introduced by Lang (1995)... RCV1: An archive of multi-labeled newswire stories (Lewis et al. 2004)."
Dataset Splits | Yes | "20Newsgroups (20NG): ... The dataset is divided into training (60%) and test (40%) sets. RCV1: ... 13,625 stories are used as the training set and 6,188 stories as the test set. In our experiments we use the standard training/test partitions of the two datasets."
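The paper relies on the standard, fixed training/test partitions rather than a random split. Purely to illustrate the 60%/40% proportions reported for 20NG, a stratified split over a synthetic stand-in corpus (toy documents and labels, not the actual dataset) can be sketched as:

```python
from sklearn.model_selection import train_test_split

# Toy stand-in corpus: 100 synthetic documents across 20 classes,
# mimicking 20NG's class count. The paper itself uses the standard
# fixed partitions, not a random split.
docs = [f"document {i}" for i in range(100)]
labels = [i % 20 for i in range(100)]

# Stratified 60%/40% split mirroring the reported 20NG proportions.
train_docs, test_docs, train_y, test_y = train_test_split(
    docs, labels, test_size=0.4, stratify=labels, random_state=0
)
print(len(train_docs), len(test_docs))  # 60 40
```

Stratification keeps the per-class proportions identical in both partitions, which matters when accuracy is compared across 20 classes.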
Hardware Specification | No | The paper does not report any hardware details (CPU/GPU models, memory, or cloud computing specifications) for running the experiments.
Software Dependencies | No | The paper mentions a linear SVM via the LIBSVM package and the sLDA implementation of Wang, Blei, and Li (2009), but gives version numbers for neither, nor for any other libraries or dependencies.
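With no versions pinned, the exact baseline environment cannot be reconstructed. As a rough modern stand-in for the linear-SVM baseline, scikit-learn's `SVC` (which wraps LIBSVM internally) with a linear kernel can be sketched on toy data; the documents and labels below are illustrative assumptions, not from the paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Illustrative toy corpus (not the paper's data).
train_docs = [
    "stocks rallied today",
    "the market fell sharply",
    "the striker scored twice",
    "a tense playoff match",
]
train_labels = ["finance", "finance", "sport", "sport"]

# SVC wraps LIBSVM; kernel="linear" approximates the paper's
# "linear SVM using the package LIBSVM" supervised baseline.
clf = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
clf.fit(train_docs, train_labels)
preds = clf.predict(["the market fell today"])
```

For large sparse text problems a dedicated linear solver (e.g. LIBLINEAR) is usually faster, but `SVC(kernel="linear")` stays closest to the LIBSVM package the paper names.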
Experiment Setup | Yes | "For our Desc LDA method, we set α = 0.1 and η = 0.2. We vary K (the number of topics) across the range used in previous work (Blei and McAuliffe 2007). For the number of iterations, in preliminary experiments we observed good accuracy at 30."
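The reported hyperparameters map directly onto a standard LDA configuration. A minimal sketch with scikit-learn's plain LDA, using α = 0.1, η = 0.2, and 30 iterations on a toy corpus; this is a stand-in only, since Desc LDA additionally constrains topics with class descriptions, which scikit-learn does not implement:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for 20NG/RCV1 documents.
docs = [
    "the team won the game in overtime",
    "the player scored a late goal",
    "the new gpu renders graphics quickly",
    "faster processors improve computing speed",
]
K = 2  # number of topics; the paper varies K over a range

X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(
    n_components=K,
    doc_topic_prior=0.1,   # alpha, as reported in the paper
    topic_word_prior=0.2,  # eta, as reported in the paper
    max_iter=30,           # iteration count the paper found sufficient
    random_state=0,
)
theta = lda.fit_transform(X)  # per-document topic distributions
print(theta.shape)  # (4, 2)
```

Each row of `theta` is a normalized topic distribution for one document; in Desc LDA the topics themselves would additionally be tied to the class descriptions, so classification reduces to reading off the most probable topic.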