Manifold Learning for Jointly Modeling Topic and Visualization

Authors: Tuan Le, Hady Lauw

AAAI 2014

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on several real-life text datasets of news articles and web pages show that SEMAFORE significantly outperforms the state-of-the-art baselines on objective evaluation metrics. |
| Researcher Affiliation | Academia | Tuan M. V. Le and Hady W. Lauw, School of Information Systems, Singapore Management University, 80 Stamford Road, Singapore 178902 (vmtle.2012@phdis.smu.edu.sg, hadywlauw@smu.edu.sg) |
| Pseudocode | No | The paper describes the generative process and model-fitting steps using textual descriptions and mathematical equations, but it does not include a structured pseudocode block or algorithm. |
| Open Source Code | No | The paper contains no statement or link indicating that source code for the described methodology is publicly available. |
| Open Datasets | Yes | Three real-life, publicly available datasets are used for evaluation: 20News contains newsgroup articles (in English) from 20 classes; Reuters8 contains newswire articles (in English) from 8 classes; Cade12 contains web pages (in Brazilian Portuguese) classified into 12 classes. These are benchmark datasets frequently used for document classification (http://web.ist.utl.pt/acardoso/datasets/). |
| Dataset Splits | No | The paper mentions generating five samples per dataset and using a sixth sample as a test set, but it does not specify training/validation splits within these samples (e.g., percentages or counts for a dedicated validation set). |
| Hardware Specification | No | The paper provides no details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions a quasi-Newton optimization method and hyperparameter settings, but it lists no specific software libraries, frameworks, or version numbers. |
| Experiment Setup | Yes | The hyperparameters are set to α = 0.01, β = 0.1N, and γ = 0.1Z following (Iwata, Yamada, and Ueda 2008). When unvaried, the defaults are Z = 20 topics, neighborhood size k = 10, and regularization R with weight λ = 1. |
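To make the reported setup concrete, the defaults above can be collected into a small configuration helper. This is a minimal sketch, assuming a corpus of N documents; the function and key names are illustrative and do not come from the authors' (unreleased) code.

```python
def semafore_hyperparams(num_docs: int, num_topics: int = 20) -> dict:
    """Collect the default hyperparameters reported in the paper.

    alpha, beta, and gamma follow (Iwata, Yamada, and Ueda 2008):
    beta scales with the corpus size N and gamma with the topic count Z.
    All names here are hypothetical labels for the paper's symbols.
    """
    return {
        "alpha": 0.01,                 # α = 0.01
        "beta": 0.1 * num_docs,        # β = 0.1N
        "gamma": 0.1 * num_topics,     # γ = 0.1Z
        "num_topics": num_topics,      # Z = 20 by default
        "neighborhood_k": 10,          # neighborhood size k = 10
        "lambda_reg": 1.0,             # regularization R with λ = 1
    }


# Example: defaults for a hypothetical corpus of 1,000 documents.
params = semafore_hyperparams(num_docs=1000)
```

Scaling β with N and γ with Z keeps the prior strength proportional to the amount of data and the number of topics, so the same relative settings transfer across the three differently sized corpora.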