Lifetime Lexical Variation in Social Media

Authors: Lizi Liao, Jing Jiang, Ying Ding, Heyan Huang, Ee-Peng Lim

AAAI 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation shows that our model can learn meaningful age-specific topics such as school for teenagers and health for older people. Our model can also be used for age prediction and performs better than a number of baseline methods. Experiments: This section presents the empirical evaluation of our model.
Researcher Affiliation | Academia | Lizi Liao, School of Computer Science, Beijing Institute of Technology, liaolizi.llz@gmail.com; Jing Jiang, School of Information System, Singapore Management University, jingjiang@smu.edu.sg; Ying Ding, School of Information System, Singapore Management University, ying.ding.2011@phdis.smu.edu.sg; Heyan Huang, School of Computer Science, Beijing Institute of Technology, hhy63@bit.edu.cn; Ee-Peng Lim, School of Information System, Singapore Management University, eplim@smu.edu.sg
Pseudocode | No | The paper describes the Gibbs-EM algorithm using mathematical formulas and textual explanations for the E-step and M-step, but it does not provide a structured pseudocode block or a clearly labeled "Algorithm" figure.
Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | Our experiments are based on Twitter. We used the following strategy to crawl Twitter users. Starting from a set of 59 popular seed users in Singapore, we first crawled these users' direct followers and followees and then crawled their followers'/followees' followers and followees... Finally, we got 16,017 users' tweets and age information. The paper describes collecting its own dataset from Twitter users but does not provide any specific access information (link, citation, or repository) for public availability.
Dataset Splits | No | In our age prediction experiments, we randomly selected 150 users from the 1564 users as our test data. For training, a set of users with their tweets and age information are used. Around 10% of the users are used for testing and the rest are used for training. The paper specifies a test set and a training set but does not explicitly mention a separate validation set or its specific split.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments.
Software Dependencies | No | The paper refers to various models and algorithms like "standard LDA", "Gibbs-EM algorithm", and "Support Vector Regression (SVR)" implemented with "Liblinear", but it does not provide specific version numbers for any software or libraries used in the experiments.
Experiment Setup | Yes | In our experiments, we set α to 0.25 and β to 0.2. We empirically choose 200 topics. We run 32 iterations of Gibbs EM, where during each iteration in the E-step we run 400 iterations of Gibbs sampling.
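
For readers attempting a reproduction, the reported setup and the 150-of-1564 test split can be collected into a small configuration sketch. This is only an illustration: the names GibbsEMConfig and split_users, the field names, and the random seed are hypothetical and do not come from the paper, and the paper releases no code against which this could be checked.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GibbsEMConfig:
    """Hyperparameters as reported in the paper's experiment setup.

    The class and field names are hypothetical; only the values are reported.
    """
    alpha: float = 0.25                     # α as reported
    beta: float = 0.2                       # β as reported
    num_topics: int = 200                   # chosen empirically by the authors
    em_iterations: int = 32                 # outer Gibbs-EM iterations
    gibbs_iterations_per_e_step: int = 400  # Gibbs sampling sweeps per E-step


def split_users(user_ids, test_size=150, seed=0):
    """Randomly hold out `test_size` users; the remainder is the training set.

    The seed is an assumption: the paper only says the selection was random.
    """
    users = list(user_ids)
    if test_size > len(users):
        raise ValueError("test_size exceeds the number of users")
    rng = random.Random(seed)
    rng.shuffle(users)
    return users[test_size:], users[:test_size]


config = GibbsEMConfig()
# 1564 is the user count the paper gives for the age prediction experiments.
train_users, test_users = split_users(range(1564))
# Total Gibbs sampling sweeps implied by the reported schedule.
total_gibbs_sweeps = config.em_iterations * config.gibbs_iterations_per_e_step
```

Under these reported values the schedule amounts to 32 × 400 Gibbs sampling sweeps in total, and the split leaves 1414 users for training. No validation set is modeled here because the paper does not mention one.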