Online Bayesian Models for Personal Analytics in Social Media

Authors: Svitlana Volkova, Benjamin Van Durme

AAAI 2015

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We design a set of classification experiments from three types of data streams including user (U), neighbor (N) and user-neighbor (UN). For all settings we perform 6-fold cross validation and use a balanced prior: 50 users in the train split and 250 users in the test. |
| Researcher Affiliation | Academia | Svitlana Volkova and Benjamin Van Durme, Center for Language and Speech Processing, Johns Hopkins University, Baltimore MD 21218, USA; svitlana@jhu.edu, vandurme@cs.jhu.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper refers to using the 'LIBLINEAR package integrated in the JERBOA toolkit', both third-party tools, but does not provide concrete access to the authors' own source code for the described methodology. |
| Open Datasets | Yes | We rely on a dataset previously used for political affiliation classification by (Pennacchiotti and Popescu 2011), then (Zamal, Liu, and Ruths 2012) and (Volkova, Coppersmith, and Van Durme 2014). The original Twitter users with their political labels were extracted from http://www.wefollow.com as described by (Pennacchiotti and Popescu 2011). |
| Dataset Splits | Yes | For all settings we perform 6-fold cross validation and use a balanced prior: 50 users in the train split and 250 users in the test. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'LIBLINEAR package integrated in the JERBOA toolkit' but does not specify version numbers for either component. |
| Experiment Setup | Yes | We design a set of classification experiments from three types of data streams including user (U), neighbor (N) and user-neighbor (UN). For all settings we perform 6-fold cross validation and use a balanced prior: 50 users in the train split and 250 users in the test. The log-linear models with dynamic Bayesian updates defined in Eq.1 and Eq.3 are learned using binary word unigram features extracted from user or neighbor content. (See the code sketches below the table.) |
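The Experiment Setup and Dataset Splits rows describe the training protocol only in prose, so the following is a minimal sketch of one way to realize it: binary word-unigram features, a log-linear model trained with the LIBLINEAR solver (here via scikit-learn rather than the JERBOA toolkit the paper names), and six folds with 50 balanced training users and 250 test users. The input format (`texts` as one concatenated string per user), the function names, and the way test users are drawn from the remaining users are assumptions; the paper does not specify them.

```python
# Hypothetical sketch of the quoted setup, not the authors' code.
import random

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def balanced_sample(labels, n_per_class, rng):
    """Indices of n_per_class users from each class (the balanced prior)."""
    picked = []
    for label in sorted(set(labels)):
        pool = [i for i, y in enumerate(labels) if y == label]
        picked.extend(rng.sample(pool, n_per_class))
    return picked


def run_folds(texts, labels, n_folds=6, train_size=50, test_size=250, seed=0):
    """Average accuracy over n_folds random balanced train/test splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_folds):
        train_idx = balanced_sample(labels, train_size // 2, rng)
        train_set = set(train_idx)
        held_out = [i for i in range(len(labels)) if i not in train_set]
        test_idx = rng.sample(held_out, test_size)

        vec = CountVectorizer(binary=True)            # binary word-unigram features
        X_train = vec.fit_transform([texts[i] for i in train_idx])
        X_test = vec.transform([texts[i] for i in test_idx])

        clf = LogisticRegression(solver="liblinear")  # log-linear model, LIBLINEAR backend
        clf.fit(X_train, [labels[i] for i in train_idx])
        scores.append(accuracy_score([labels[i] for i in test_idx],
                                     clf.predict(X_test)))
    return sum(scores) / len(scores)
```

Feeding user-only, neighbor-only, or combined user-neighbor content as `texts` corresponds to the U, N, and UN streams mentioned in the table.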
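The same row cites "dynamic Bayesian updates defined in Eq.1 and Eq.3", which this page does not reproduce. The sketch below is one plausible reading, not the authors' formulation: a user's binary-class log-odds start at a neutral (balanced) prior and are updated as each tweet arrives, with per-tweet scores from the trained log-linear model standing in for the likelihood terms under a naive-Bayes-style independence assumption.

```python
def stream_log_odds(clf, vec, tweets, prior_log_odds=0.0):
    """Update a user's class log-odds one tweet at a time (hypothetical reading of Eq. 1/3).

    `clf` and `vec` are the classifier and vectorizer from the training sketch
    above; `prior_log_odds = 0.0` encodes the balanced 50/50 prior.
    """
    log_odds = prior_log_odds
    trajectory = []
    for tweet in tweets:
        log_p = clf.predict_log_proba(vec.transform([tweet]))[0]
        # With a uniform prior, per-tweet posterior odds equal likelihood odds,
        # so each tweet contributes its log-odds as independent evidence.
        log_odds += log_p[1] - log_p[0]
        trajectory.append(log_odds)
    return trajectory  # final value > 0 -> predict clf.classes_[1]
```

Deciding about a user once the running log-odds is confident enough, rather than after all tweets are seen, is the kind of streaming decision an online model of this sort enables; the U, N, and UN streams differ only in whose tweets are fed to the update.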