Retweet Behavior Prediction Using Hierarchical Dirichlet Process

Authors: Qi Zhang, Yeyun Gong, Ya Guo, Xuanjing Huang

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To evaluate the proposed method, we collect a large number of microblogs and their corresponding social networks from a real microblog service. Experimental results on the constructed dataset demonstrate that the proposed method can achieve better performance than state-of-the-art methods."
Researcher Affiliation | Academia | Qi Zhang, Yeyun Gong, Ya Guo, Xuanjing Huang; Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, 825 Zhangheng Road, Shanghai, P.R. China; {qz, 12110240006, 13210240002, xjhuang}@fudan.edu.cn
Pseudocode | Yes | The paper gives the generative process:

    The Generation Process
    For each followee a = 1, 2, ..., A_u of user u:
      1. Draw ψ_a ~ Beta(λ)
      2. For each microblog d = 1, 2, ..., D_a of followee a:
         a. Draw a retweet label l_d ~ Binomial(ψ_a)
         b. Draw the normalized retweet times x_d ~ Beta(η_{l_d})
         c. For each word n = 1, ..., N_d:
            i.   Choose an existing cluster t_dn = t with probability ∝ n_dt
            ii.  Choose a new cluster t_dn = t_new with probability ∝ α
            iii. If an existing cluster was chosen, draw a word w_dn = w
                 according to p(w_dn = w | t_dn, l_d, k_{d,t_dn})
            iv.  If a new cluster was chosen:
                 1) Choose an existing topic k_{d,t_new} = k with probability ∝ m_{·k}
                 2) Choose a new topic k_{d,t_new} = k_new with probability ∝ γ
                 3) Draw a word w_dn = w according to p(w_dn = w | t_dn, l_d, k_{d,t_new})
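
A minimal, runnable Python sketch of this generative process, with illustrative stand-ins where the paper leaves details implicit: a fixed vocabulary size V, Beta parameters η for the two retweet labels, and topic-word distributions drawn from the symmetric Dirichlet base measure H. The paper additionally conditions the word emission on the retweet label l_d, which is omitted here for brevity; all names are ours, not from released code.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative hyperparameters, named after the paper's notation.
    ALPHA, GAMMA = 1.0, 1.0               # CRF concentration parameters α, γ
    LAM = (1.0, 1.0)                      # Beta(λ) prior on retweet propensity ψ_a
    ETA = {0: (1.0, 3.0), 1: (3.0, 1.0)}  # Beta(η_l) for normalized retweet times
    V = 50                                # vocabulary size (assumption)

    def new_topic():
        # Topic-word distribution drawn from the symmetric Dirichlet base
        # measure H (the paper uses parameter 0.5); m counts its tables.
        return {"phi": rng.dirichlet(np.full(V, 0.5)), "m": 0}

    def generate_microblog(n_words, topics):
        """Generate one microblog via the Chinese-restaurant-franchise scheme."""
        l = rng.binomial(1, rng.beta(*LAM))   # retweet label l_d
        x = rng.beta(*ETA[l])                 # normalized retweet times x_d
        n_dt, k_dt, words = [], [], []        # per-table counts and topic labels
        for _ in range(n_words):
            # Existing cluster t with probability ∝ n_dt, new cluster ∝ α.
            probs = np.array(n_dt + [ALPHA], dtype=float)
            t = rng.choice(len(probs), p=probs / probs.sum())
            if t == len(n_dt):                # new table: pick its topic
                # Existing topic k with probability ∝ m_·k, new topic ∝ γ.
                m = np.array([topic["m"] for topic in topics] + [GAMMA])
                k = rng.choice(len(m), p=m / m.sum())
                if k == len(topics):
                    topics.append(new_topic())
                topics[k]["m"] += 1
                n_dt.append(0)
                k_dt.append(k)
            n_dt[t] += 1
            # The paper also conditions this emission on the label l (omitted).
            words.append(int(rng.choice(V, p=topics[k_dt[t]]["phi"])))
        return l, x, words

    topics = []                               # shared across microblogs
    print(generate_microblog(10, topics))
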
Open Source Code | No | The paper provides no link to, or explicit statement about, the availability of its source code.
Open Datasets | No | The paper states that the authors 'collected the data set from Sina Weibo' and reports statistics for it, noting that '70% of all microblogs in their browsing history' were used as training data. However, it provides no concrete access information (link, DOI, or formal citation), implying the dataset is not publicly available.
Dataset Splits | Yes | "For each user, we randomly selected about 70% of all microblogs in their browsing history as training data and 10% as development data. The other 20% are used as the test data."
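
A minimal sketch of this per-user 70/10/20 split, assuming each user's browsing history is a flat list of microblog records; the function and variable names are hypothetical, not from the paper.

    import random

    def split_history(microblogs, seed=0):
        """Shuffle one user's browsing history and split it 70/10/20."""
        shuffled = list(microblogs)
        random.Random(seed).shuffle(shuffled)
        n_train = int(0.7 * len(shuffled))
        n_dev = int(0.1 * len(shuffled))
        train = shuffled[:n_train]
        dev = shuffled[n_train:n_train + n_dev]
        test = shuffled[n_train + n_dev:]     # remaining ~20%
        return train, dev, test

    # The split is applied independently to each user's history.
    histories = {"user_1": list(range(100)), "user_2": list(range(80))}  # toy data
    splits = {u: split_history(h) for u, h in histories.items()}
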
Hardware Specification | No | The paper gives no details about the hardware used for the experiments (e.g., CPU/GPU models, memory, or computing infrastructure).
Software Dependencies | No | The paper mentions an 'LDA-based model' and an 'HDP-based model' and compares against methods such as Naive Bayes and SVMRank (which may imply certain libraries), but it names no software with version numbers (e.g., a Python version, or library versions such as scikit-learn).
Experiment Setup | Yes | "We ran our model with 500 iterations of Gibbs sampling. In the HDP-based model, we used γ ~ Gamma(1, 1) and α ~ Gamma(1, 1) as prior distributions for the concentration parameters. The base measure H for both retweet labels is a symmetric Dirichlet distribution with parameter 0.5. In the LDA-based model, we use α = 50.0/K and δ = 0.1 for both retweet labels; after trying a few different numbers of topics, we empirically set the number of topics to 20. In both models we set λ1 = λ2 = 0.1."
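
For reference, the reported settings gathered into a single Python configuration; since the paper releases no code, the key names here are illustrative, not from any official implementation.

    # Settings reported in the paper, collected into one dict; key names are ours.
    K = 20                              # LDA topic count, set empirically
    CONFIG = {
        "gibbs_iterations": 500,
        "hdp_gamma_prior": (1.0, 1.0),  # γ ~ Gamma(1, 1)
        "hdp_alpha_prior": (1.0, 1.0),  # α ~ Gamma(1, 1)
        "base_measure_H": 0.5,          # symmetric Dirichlet, both retweet labels
        "lda_num_topics": K,
        "lda_alpha": 50.0 / K,          # α = 50.0/K
        "lda_delta": 0.1,               # δ, for both retweet labels
        "lambda1": 0.1,                 # λ1
        "lambda2": 0.1,                 # λ2
    }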