MicroScholar: Mining Scholarly Information from Chinese Microblogs

Authors: Yang Yu, Xiaojun Wan

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | experimental results demonstrate their usefulness. In order to evaluate the classification performance, we crawl several thousand microblog texts and manually annotate them into four types described above, then construct a balanced evaluation dataset of 2,142 microblog texts (592: 491: 514: 545 for the four categories) by sampling from the whole annotation corpus. We perform all SVM experiments in 10-fold cross validation. The evaluation results are shown in Table 1.
Researcher Affiliation | Academia | Institute of Computer Science and Technology, Peking University, Beijing 100871, China; The MOE Key Laboratory of Computational Linguistics, Peking University, Beijing 100871, China; {yu.yang, wanxiaojun}@pku.edu.cn
Pseudocode | No | The paper does not contain any structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper references third-party tools such as WEKA and Gibbs LDA++ with URLs, but it does not provide access to the source code for the MicroScholar system or the specific methodology described in the paper.
Open Datasets | No | The paper describes the creation of its own evaluation dataset (2,142 microblog texts) and an unlabeled corpus (113,925 microblogs) through crawling and manual annotation, but it does not provide a direct link, DOI, repository name, or formal citation for accessing these datasets.
Dataset Splits | Yes | We perform all SVM experiments in 10-fold cross validation.
Hardware Specification | No | The paper does not provide details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions the WEKA toolbox and Gibbs LDA++ but does not provide version numbers for these or any other software dependencies needed to replicate the experiments.
Experiment Setup | Yes | We utilize the popular SVM classifier for the categorization task and apply the SMO algorithm in WEKA toolbox for implementation. We apply Gibbs LDA++ for the LDA implementation, and the number of topics is set to 300. Figure 2 plots the performance values of SVM(T+D+LDA) with respect to different numbers of topics ranging from 100 to 500.
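The Experiment Setup row names concrete tooling: WEKA's SMO implementation of an SVM, evaluated with 10-fold cross validation. The snippet below is a minimal sketch of that evaluation protocol in Java using the WEKA API. It is not the authors' code: the input file name microblog_features.arff, the random seed, and the assumption that the class label is the last attribute are all hypothetical, since neither the code nor the 2,142-microblog dataset was released, and the feature extraction (terms, domain dictionary, LDA topics) is not reproduced here.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MicroblogSvmCv {
    public static void main(String[] args) throws Exception {
        // Hypothetical feature file: one instance per microblog with a 4-way category label.
        // The paper's annotated dataset is not publicly available, so this path is a placeholder.
        Instances data = new DataSource("microblog_features.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1); // assume the class label is the last attribute

        // WEKA's SMO classifier, the SVM implementation named in the paper.
        SMO svm = new SMO();

        // 10-fold cross validation, matching the evaluation protocol quoted above.
        // The random seed (1) is an arbitrary choice; the paper does not specify one.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(svm, data, 10, new Random(1));

        System.out.println(eval.toSummaryString());
        System.out.println(eval.toClassDetailsString()); // per-class precision, recall, F-measure
    }
}
```

In the paper's setup, the LDA topic features would be computed beforehand with Gibbs LDA++ (300 topics, per the quote above) and included as additional attributes in the feature file alongside the term and dictionary features; that preprocessing step is outside the scope of this sketch.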