Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Interactive Rare-Category-of-Interest Mining from Large Datasets

Authors: Zhenguang Liu, Sihao Hu, Yifang Yin, Jianhai Chen, Kevin Chiew, Luming Zhang, Zetian Wu

Pages: 4965-4972

AAAI 2020 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on five diverse real-world datasets show that our method achieves the response time in seconds for user interactions, and outperforms state-of-the-art competitors significantly in accuracy and number of queries. As a side contribution, we construct and release two benchmark datasets which to our knowledge are the first public datasets tailored for rare category mining task.
Researcher Affiliation	Collaboration	Zhenguang Liu,1 Sihao Hu,2,4 Yifang Yin,3 Jianhai Chen,2 Kevin Chiew, Luming Zhang,2 Zetian Wu2 1Zhejiang Gongshang University 2Zhejiang University 3National University of Singapore 4Alibaba Group
Pseudocode	No	The paper provides mathematical formulations and descriptions of its models but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code	No	The implementation codes and datasets will be released upon acceptance. The datasets will be released at https://github.com/Bayi-Hu/Interactive-Rare-Category-of-Interest-Mining. The statement regarding code explicitly says it 'will be released upon acceptance', which indicates future availability, not current access. While a GitHub link is provided, it is explicitly for 'datasets'.
Open Datasets	Yes	Since there is a lack of benchmark datasets that are specially tailored for rare category mining task, we construct two datasets, Game and Bird, which come from two practical problems and contain images and audio data, respectively. [...] The datasets will be released at https://github.com/Bayi-Hu/Interactive-Rare-Category-of-Interest-Mining. Besides Game and Bird datasets, three public datasets are also engaged in the experiments, namely Kddcup (on network intrusion), Abalone (on physical measurements of abalones), and Shuttle (on space shuttle), which are widely used in existing works (He and Carbonell 2007; Vatturi and Wong 2009; Zhou et al. 2018; Huang et al. 2013).
Dataset Splits	No	The paper does not provide specific training, validation, or test dataset splits (e.g., percentages, sample counts, or explicit cross-validation schemes). It refers to 'unlabeled data examples' and interactive processes but not explicit data partitioning for evaluation.
Hardware Specification	Yes	All experiments are conducted on a server equipped with 40 Intel Xeon E5-2640V4 v CPUs and 96 GB RAM.
Software Dependencies	No	The paper mentions using ResNet-50 and a CNN network for feature extraction but does not provide specific version numbers for any software dependencies, programming languages, or libraries used in the implementation.
Experiment Setup	Yes	Parameter Settings. For RCD, the lower bound kmin of the k values is constantly set to 2 across different datasets, while the upper bound kmax is set to 200, 500, 200, 1,000, and 1,000, respectively, for Abalone, Bird, Shuttle, Kddcup, and Game datasets.
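
The parameter settings quoted in the Experiment Setup row can be written down as a small lookup table. This is a minimal sketch only: the names `K_MIN`, `K_MAX`, and `k_range` are hypothetical, and treating both bounds as inclusive is an assumption, since the quoted text does not say how the paper's implementation iterates over k.

```python
# Values taken from the Experiment Setup row; all identifiers are hypothetical.
K_MIN = 2  # lower bound kmin, the same for every dataset

# Upper bound kmax per dataset, as quoted from the paper.
K_MAX = {
    "Abalone": 200,
    "Bird": 500,
    "Shuttle": 200,
    "Kddcup": 1000,
    "Game": 1000,
}


def k_range(dataset: str) -> range:
    """Return the candidate k values for a dataset.

    Assumption: both kmin and kmax are inclusive bounds.
    """
    return range(K_MIN, K_MAX[dataset] + 1)


if __name__ == "__main__":
    for name in K_MAX:
        ks = k_range(name)
        print(f"{name}: k from {ks[0]} to {ks[-1]}")
```

Keeping the bounds in one mapping makes it easy to check a rerun against the reported settings, although the code and the Game and Bird datasets were not yet available according to the rows above.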