Sampling Representative Users from Large Social Networks

Authors: Jie Tang, Chenhui Zhang, Keke Cai, Li Zhang, Zhong Su

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on two datasets show that the proposed models for sampling representative users significantly outperform several alternative methods that use only authority or only structure information, by +6% to +23% in Precision@100. The proposed algorithms are also efficient: only a few seconds are needed to sample 300 representative users from a network of 100,000 users. All data and codes are publicly available at http://arnetminer.org/repuser/.
Researcher Affiliation | Collaboration | Jie Tang, Chenhui Zhang, Keke Cai, Li Zhang, Zhong Su; Department of Computer Science and Technology, Tsinghua University; Tsinghua National Laboratory for Information Science and Technology (TNList); IBM, China Research Lab; jietang@tsinghua.edu.cn, zh.sherlock@gmail.com, {caikeke, lizhang, suzhong}@cn.ibm.com
Pseudocode | Yes | Algorithm 1: Approximate algorithm for S3 model.
Open Source Code | Yes | All data and codes are publicly available at http://arnetminer.org/repuser/.
Open Datasets | Yes | All data and codes are publicly available at http://arnetminer.org/repuser/.
Dataset Splits | No | No specific dataset split information (percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) for validation was found. The paper describes a train/test-style evaluation.
Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types and speeds, memory amounts, or other detailed computer specifications) used for running the experiments were found.
Software Dependencies | No | The paper states 'We implement all the algorithms [in] C++.' but does not specify a C++ standard or compiler version, or the versions of any libraries.
Experiment Setup | Yes | Let $f_l$ be the total frequency of the $l$-th keyword ($1 \le l \le 200$). We assign $m_l = f_l^{0.5}$ in the S3 model and $\lambda_l = f_l^{-1}$ in the SSD model. The parameter $\beta$ in the S3 model is set to $\beta = 0.7$, chosen by tuning over values from 0.1 to 1 in steps of 0.1.
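The Precision@100 gains cited under Research Type rely on the standard Precision@k measure. The sketch below assumes that each method outputs a ranked list of user IDs and that a ground-truth set of representative users is available. The function name precision_at_k and the toy IDs are hypothetical, and the paper's exact evaluation protocol is not reproduced here.

```python
from typing import Hashable, Sequence, Set


def precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Return the fraction of the first k ranked items that are in `relevant`.

    Assumed convention: the denominator is always k, so a list shorter
    than k is penalized for the missing positions.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


# Toy example with hypothetical user IDs (not data from the paper).
ranked_users = ["u3", "u7", "u1", "u9", "u4"]
ground_truth = {"u1", "u3", "u5"}
print(precision_at_k(ranked_users, ground_truth, k=4))  # 0.5
```

With k = 100, the same function yields the Precision@100 value for one method's ranked output.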
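The keyword-dependent settings in the Experiment Setup row can be computed as shown in the following minimal sketch. It is not the authors' C++ code. The function names and the three keyword frequencies are hypothetical. The SSD exponent is read as -1 (an assumption, because the extracted formula lost its sign), which gives rarer keywords larger λ weights.

```python
import math
from typing import Dict, List, Tuple


def keyword_parameters(freq: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Map each keyword l to (m_l, lambda_l).

    m_l = sqrt(f_l) is the S3-model setting; lambda_l = 1 / f_l is the
    SSD-model setting, assuming the extracted exponent is -1.
    """
    params: Dict[str, Tuple[float, float]] = {}
    for keyword, f in freq.items():
        if f <= 0:
            raise ValueError(f"frequency of {keyword!r} must be positive")
        params[keyword] = (math.sqrt(f), 1.0 / f)
    return params


def beta_grid(start: float = 0.1, stop: float = 1.0, step: float = 0.1) -> List[float]:
    """Candidate beta values start, start + step, ..., stop (inclusive).

    Values are rounded to 10 decimals to suppress floating-point drift
    such as 0.1 + 0.2 == 0.30000000000000004.
    """
    count = round((stop - start) / step) + 1
    return [round(start + i * step, 10) for i in range(count)]


# Hypothetical frequencies for three keywords (the paper uses 200 keywords).
freq = {"data mining": 400.0, "databases": 100.0, "graphics": 25.0}
for keyword, (m, lam) in keyword_parameters(freq).items():
    print(keyword, m, lam)
# data mining 20.0 0.0025
# databases 10.0 0.01
# graphics 5.0 0.04
print(beta_grid())  # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
```

The grid only enumerates the candidate β values. The paper reports β = 0.7 as the tuned value, but the selection criterion is not stated here, so the sketch includes no scoring step.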