Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Online Social Spammer Detection
Authors: Xia Hu, Jiliang Tang, Huan Liu
AAAI 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on Twitter datasets confirm the effectiveness and efficiency of the proposed framework. |
| Researcher Affiliation | Academia | Xia Hu, Jiliang Tang, Huan Liu Computer Science and Engineering, Arizona State University, USA EMAIL |
| Pseudocode | Yes | Algorithm 1: Online Social Spammer Detection |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described. |
| Open Datasets | Yes | TAMU Social Honeypots Dataset (Twitter T): This dataset was originally collected from December 30, 2009 to August 2, 2010 on Twitter and introduced in (Lee et al. 2011). Twitter Suspended Spammers Dataset (Twitter S): Following the data crawling process used in (Yang et al. 2011; Zhu et al. 2012), we crawled this Twitter dataset from July to September 2012 via the Twitter Search API. |
| Dataset Splits | Yes | In the experiments, five-fold cross-validation is used for all the methods. To study the effects brought by different sizes of training data, we vary the training data from 10% to 100%. In particular, for each round of the experiment, 20% of the dataset is held for testing and 10% to 100% of the original training data is sampled for training. |
| Hardware Specification | Yes | The experiments are run on a single-CPU, eight-core 3.40 GHz machine. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | One positive parameter is involved in the experiments; it controls the contribution of social network information. As a common practice, all the parameters can be tuned via cross-validation with validation data. In the experiments, we empirically set this parameter to 0.1. |
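
The split protocol quoted in the Dataset Splits row (five folds, 20% held out for testing, and 10% to 100% of the remaining training data sampled) could be sketched roughly as follows. This is an illustrative reading of that description, not the authors' code: the function name `cv_splits_with_subsampling`, its parameters, the shuffling strategy, and the seed are all assumptions introduced here.

```python
import random


def cv_splits_with_subsampling(n_samples, n_folds=5, fractions=None, seed=0):
    """Yield (fold, fraction, train_idx, test_idx) tuples.

    Hypothetical helper: each fold holds out 1/n_folds of the data for
    testing, then samples a fraction of the remaining data for training.
    """
    if fractions is None:
        # Assumed grid: 10%, 20%, ..., 100% of the training pool.
        fractions = [i / 10 for i in range(1, 11)]

    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    # Partition the shuffled indices into n_folds nearly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]

    for k, test_idx in enumerate(folds):
        # Everything outside the held-out fold forms the training pool.
        train_pool = [i for j, fold in enumerate(folds) if j != k for i in fold]
        for frac in fractions:
            n_train = max(1, round(frac * len(train_pool)))
            # Sample without replacement from the training pool.
            train_idx = rng.sample(train_pool, n_train)
            yield k, frac, train_idx, test_idx


if __name__ == "__main__":
    for fold, frac, train_idx, test_idx in cv_splits_with_subsampling(100):
        print(fold, frac, len(train_idx), len(test_idx))
```

Whether the paper resamples the training subset independently for every fraction, or grows it incrementally, is not stated in the excerpt; the sketch resamples independently for simplicity.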