Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Sampling Representative Users from Large Social Networks
Authors: Jie Tang, Chenhui Zhang, Keke Cai, Li Zhang, Zhong Su
AAAI 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on two datasets show that the proposed models for sampling representative users significantly outperform (+6%-23% in terms of Precision@100) several alternative methods using authority or structure information only. The proposed algorithms are also effective in terms of time complexity. Only a few seconds are needed to sample 300 representative users from a network of 100,000 users. All data and codes are publicly available (footnote 1). |
| Researcher Affiliation | Collaboration | Jie Tang, Chenhui Zhang, Keke Cai, Li Zhang, Zhong Su; Department of Computer Science and Technology, Tsinghua University; Tsinghua National Laboratory for Information Science and Technology (TNList); IBM, China Research Lab; EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Approximate algorithm for S3 model. |
| Open Source Code | Yes | All data and codes are publicly available (footnote 1: http://arnetminer.org/repuser/). |
| Open Datasets | Yes | All data and codes are publicly available (footnote 1: http://arnetminer.org/repuser/). |
| Dataset Splits | No | No specific dataset split information (percentages, sample counts, citations to predefined splits, or detailed splitting methodology) was found for validation. The paper describes only a train/test-style evaluation. |
| Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments were found. |
| Software Dependencies | No | The paper states 'We implement all the algorithms C++.' but does not provide specific version numbers for C++ or any libraries. |
| Experiment Setup | Yes | Let $f_l$ be the total frequency of the $l$-th keyword ($1 \le l \le 200$). We assign $m_l = f_l^{0.5}$ in the S3 model and $\lambda_l = f_l^{1}$ in the SSD model. The parameter $\beta$ in the S3 model is set to be $\beta = 0.7$, by tuning from 0.1 to 1 with interval 0.1. |
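
The experiment setup quoted above can be illustrated with a minimal sketch, assuming only what the row states: keyword weights are a power of the keyword frequency, and β is tuned over a grid from 0.1 to 1 in steps of 0.1. The function names `keyword_weights` and `beta_grid` and the sample frequencies are hypothetical and do not come from the paper, whose own implementation is in C++. The sketch does not run the S3 or SSD models themselves.

```python
def keyword_weights(frequencies, exponent):
    """Return f_l ** exponent for each keyword frequency f_l."""
    return [f ** exponent for f in frequencies]


def beta_grid(start=0.1, stop=1.0, step=0.1):
    """Candidate beta values from start to stop inclusive.

    Values are rounded to avoid floating-point drift such as 0.7000000000000001.
    """
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 10) for i in range(n)]


# Hypothetical frequencies for three keywords (the paper uses 200 keywords).
freqs = [4, 9, 16]

# S3 weights as quoted: m_l = f_l ** 0.5.
m = keyword_weights(freqs, 0.5)

# Grid over which beta is tuned; the paper reports beta = 0.7 as the chosen value.
betas = beta_grid()
```

The SSD weights can be obtained from the same helper by passing the exponent given in the paper. Selecting β would additionally require an evaluation metric for each grid value, which is outside the scope of this sketch.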