Mining Query Subtopics from Questions in Community Question Answering
Authors: Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on large scale real world CQA datasets show that the proposed method significantly outperforms the existing methods in terms of keyword extraction, while achieving a comparable performance to the state-of-the-art methods for question clustering. |
| Researcher Affiliation | Collaboration | Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou. State Key Lab of Software Development Environment, Beihang University, Beijing, China; Microsoft Research, Beijing, China. {wuyu,lizj}@buaa.edu.cn, {wuwei,mingzhou}@microsoft.com |
| Pseudocode | Yes | Algorithm 1: Optimization Algorithm for Problem (1) and Algorithm 2: A heuristic method for selecting k |
| Open Source Code | No | The paper does not include an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | While the paper mentions crawling data from Quora (https://www.quora.com/) and Zhihu (http://www.zhihu.com/), it does not provide a specific link, DOI, or repository for the processed 'evaluation data' they created from these platforms. |
| Dataset Splits | Yes | Therefore, we randomly split the evaluation data into a validation set and a test set with a ratio of 1 : 3 |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or specific cloud computing instances used for running the experiments. |
| Software Dependencies | No | The paper mentions general techniques and tools like 'Chinese word segmentation' but does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions) used for their implementation. |
| Experiment Setup | Yes | For our method, we followed existing methods (Cai et al. 2011; Kuang, Park, and Ding 2012) and implemented Algorithm 1 with β = 0.1, σ = 0.01, ϵ = 10^-4, and T = 200. α in Equation (1) and the number of nearest neighbors p for D needed tuning. ... The best choice for α was 100 with both the Quora and Zhihu data, while with the Quora data the best p was 7, and with the Zhihu data the best p was 4. ... we estimated the threshold t using the validation set (it is 1.15) |
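
The reported setup and the 1 : 3 validation/test split can be recorded in a few lines, which may help anyone attempting a reproduction. The following is a minimal sketch, not the authors' code: the names `SETUP` and `split_validation_test`, the fixed seed, and the use of integers as stand-ins for the evaluation items are all assumptions made for illustration. The paper does not release its data or state how the random split was seeded.

```python
import random

# Values quoted in the Experiment Setup row; the dictionary layout is hypothetical.
SETUP = {
    "beta": 0.1,
    "sigma": 0.01,
    "epsilon": 1e-4,
    "T": 200,
    "alpha": 100,                  # best alpha on both Quora and Zhihu
    "p": {"quora": 7, "zhihu": 4}, # best number of nearest neighbors per dataset
    "t": 1.15,                     # threshold estimated on the validation set
}


def split_validation_test(items, ratio=(1, 3), seed=0):
    """Randomly split items into (validation, test) parts with the given ratio.

    The seed is an assumption; the paper only says the split was random.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = len(shuffled) * ratio[0] // (ratio[0] + ratio[1])
    return shuffled[:n_val], shuffled[n_val:]


# Integers stand in for the evaluation items, which are not publicly available.
validation, test = split_validation_test(range(100))
```

With 100 placeholder items, the split yields 25 validation items and 75 test items; the tunable values (α, p, t) would be chosen on the validation part only.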