Location-Sensitive User Profiling Using Crowdsourced Labels

Authors: Wei Niu, James Caverlee, Haokai Lu

AAAI 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Through extensive experiments over a Twitter list dataset, we demonstrate the effectiveness of this location-sensitive user profiling." |
| Researcher Affiliation | Academia | Wei Niu, James Caverlee, Haokai Lu; Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77840, USA ({wei,caverlee,hlu}@cse.tamu.edu) |
| Pseudocode | Yes | Algorithm 1 (Mincost Tree Formation) and Algorithm 2 (Approximation Algorithm) are presented in the paper. |
| Open Source Code | No | The paper contains no explicit statement about releasing the source code for the methodology, and no link to a code repository is provided. |
| Open Datasets | No | The paper states: "We rely on a Twitter list dataset containing 15 million list relationships in which the geo-coordinates of the labelers and users are known (?)." The citation is ambiguous, and no link, DOI, or clear statement of public availability for the dataset is provided. |
| Dataset Splits | Yes | "The results reported for every profiling experiment in this paper, including baselines, are based on four-fold cross-validation and averaged over the nine locations. For each user, the seen tag set Pk(u) is a random 25% of his profile P(u). We then try to predict the tags in the remaining 75% unseen tags." |
| Hardware Specification | No | The paper does not mention any specific hardware (e.g., CPU or GPU model, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions "Rank SVM (?)" and a "language identification package (?)" but does not provide version numbers for any software dependency. |
| Experiment Setup | Yes | "For reproducibility, the number of negative samples, number of iterations, and number of user and tag latent factors are set as 200, 80, and 20, respectively. Regularization weights are set as 0.02. We apply text processing techniques such as case folding, stopword removal, and noun singularization. We also separate string patterns like 'Food Drink' into two words, 'food' and 'drink'. We use a language identification package (?) to filter out non-English tags. To guarantee the informativeness and quality of the tags, we filter out infrequent tags with fewer than 5 labelers and 10 labelees. A total of 13 features, including the features introduced above, are used for training the model." |
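The "Dataset Splits" row describes a per-user 25% seen / 75% unseen tag split. A minimal sketch of that per-user division, assuming nothing about the paper's actual code (the function and variable names here are hypothetical illustrations):

```python
import random

def split_user_profile(profile_tags, seen_fraction=0.25, seed=0):
    """Split one user's tag profile P(u) into a seen set Pk(u) (a random
    25% of the tags) and the unseen remainder (75%) to be predicted.
    Illustrative only; the paper's actual split code is not released."""
    rng = random.Random(seed)
    tags = list(profile_tags)
    rng.shuffle(tags)
    n_seen = max(1, int(len(tags) * seen_fraction))
    return set(tags[:n_seen]), set(tags[n_seen:])

seen, unseen = split_user_profile(["food", "drink", "travel", "music"])
```

In the paper's protocol this split would be repeated within four-fold cross-validation and the scores averaged over the nine locations.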
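The "Experiment Setup" row lists the tag-cleaning steps: case folding, stopword removal, splitting patterns like "Food Drink", and dropping tags with fewer than 5 labelers or 10 labelees. A hedged sketch of those steps (the stopword list, function names, and data layout are my own assumptions; the paper's noun singularization and language-identification steps are omitted here):

```python
import re

STOPWORDS = {"the", "a", "an", "of", "and"}  # illustrative subset only

def normalize_tag(raw_tag):
    """Case-fold a list label and split multi-word patterns like
    'Food Drink' into separate words, dropping stopwords."""
    words = re.findall(r"[a-z]+", raw_tag.lower())
    return [w for w in words if w not in STOPWORDS]

def filter_infrequent(tag_labelers, tag_labelees,
                      min_labelers=5, min_labelees=10):
    """Keep only tags with at least `min_labelers` labelers and
    `min_labelees` labelees, per the paper's frequency thresholds.
    Inputs are assumed to be tag -> count mappings."""
    return {t for t, n in tag_labelers.items()
            if n >= min_labelers
            and tag_labelees.get(t, 0) >= min_labelees}

print(normalize_tag("Food Drink"))  # ['food', 'drink']
```

The thresholds (5 labelers, 10 labelees) are the only values taken directly from the paper; everything else is a plausible reconstruction.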