MicroScholar: Mining Scholarly Information from Chinese Microblogs
Authors: Yang Yu, Xiaojun Wan
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | experimental results demonstrate their usefulness. In order to evaluate the classification performance, we crawl several thousand microblog texts and manually annotate them into the four types described above, then construct a balanced evaluation dataset of 2,142 microblog texts (592 : 491 : 514 : 545 for the four categories) by sampling from the whole annotation corpus. We perform all SVM experiments in 10-fold cross validation. The evaluation results are shown in Table 1. |
| Researcher Affiliation | Academia | Institute of Computer Science and Technology, Peking University, Beijing 100871, China; The MOE Key Laboratory of Computational Linguistics, Peking University, Beijing 100871, China; {yu.yang, wanxiaojun}@pku.edu.cn |
| Pseudocode | No | The paper does not contain any structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper references third-party tools such as WEKA and Gibbs LDA++ with URLs, but it does not provide source code for the 'MicroScholar' system itself or for the specific methods described in the paper. |
| Open Datasets | No | The paper describes the creation of its own evaluation dataset (2,142 microblog texts) and an unlabeled corpus (113,925 microblogs) by crawling and manual annotation. However, it does not provide a direct link, DOI, repository name, or formal citation for accessing these specific datasets created by the authors for their experiments. |
| Dataset Splits | Yes | We perform all SVM experiments in 10-fold cross validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions 'WEKA toolbox' and 'Gibbs LDA++' but does not provide specific version numbers for these or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | We utilize the popular SVM classifier for the categorization task and apply the SMO algorithm in the WEKA toolbox for implementation. We apply Gibbs LDA++ for the LDA implementation, and the number of topics is set to 300. Figure 2 plots the performance values of SVM(T+D+LDA) with respect to different numbers of topics ranging from 100 to 500. (See the code sketch below the table.) |
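
The setup reported above can be reproduced in outline with WEKA's public Java API. The sketch below wires WEKA's SMO (SVM) classifier into a 10-fold cross-validation run, mirroring the configuration quoted in the Experiment Setup and Dataset Splits rows. The ARFF file name, the random seed, and the default SMO parameters are assumptions: the paper does not report kernel, regularization, or seed settings.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

/**
 * Minimal sketch, assuming the annotated microblogs have already been
 * converted to feature vectors in an ARFF file (one instance per microblog,
 * class label as the last attribute). File name and seed are illustrative.
 */
public class MicroblogCategorization {
    public static void main(String[] args) throws Exception {
        // Hypothetical feature file; not distributed with the paper.
        Instances data = DataSource.read("microblogs.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // SMO with WEKA defaults; the paper does not list kernel or C values.
        SMO svm = new SMO();

        // 10-fold cross-validation, as stated in the paper.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(svm, data, 10, new Random(1));

        System.out.println(eval.toSummaryString());
        System.out.println(eval.toClassDetailsString()); // per-class precision/recall/F1
    }
}
```

The 300-dimensional LDA topic features would be generated separately (the paper uses Gibbs LDA++, version unspecified) and appended to the other feature groups of SVM(T+D+LDA) before the ARFF file is built.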