Interactive Rare-Category-of-Interest Mining from Large Datasets

Authors: Zhenguang Liu, Sihao Hu, Yifang Yin, Jianhai Chen, Kevin Chiew, Luming Zhang, Zetian Wu
Pages: 4965-4972

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on five diverse real-world datasets show that our method achieves the response time in seconds for user interactions, and outperforms state-of-the-art competitors significantly in accuracy and number of queries. As a side contribution, we construct and release two benchmark datasets which to our knowledge are the first public datasets tailored for rare category mining task.
Researcher Affiliation | Collaboration | Zhenguang Liu*1, Sihao Hu*2,4, Yifang Yin3, Jianhai Chen2, Kevin Chiew, Luming Zhang2, Zetian Wu2. Affiliations: 1 Zhejiang Gongshang University; 2 Zhejiang University; 3 National University of Singapore; 4 Alibaba Group.
Pseudocode | No | The paper provides mathematical formulations and descriptions of its models but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code | No | "The implementation codes and datasets will be released upon acceptance. The datasets will be released at https://github.com/Bayi-Hu/Interactive-Rare-Category-of-Interest-Mining." The statement about code says it "will be released upon acceptance", which indicates future rather than current availability. The GitHub link is given explicitly for the datasets only.
Open Datasets | Yes | Since there is a lack of benchmark datasets that are specially tailored for rare category mining task, we construct two datasets, Game and Bird, which come from two practical problems and contain images and audio data, respectively. [...] The datasets will be released at https://github.com/Bayi-Hu/Interactive-Rare-Category-of-Interest-Mining. Besides Game and Bird datasets, three public datasets are also engaged in the experiments, namely Kddcup (on network intrusion), Abalone (on physical measurements of abalones), and Shuttle (on space shuttle), which are widely used in existing works (He and Carbonell 2007; Vatturi and Wong 2009; Zhou et al. 2018; Huang et al. 2013).
Dataset Splits | No | The paper does not provide specific training, validation, or test dataset splits (e.g., percentages, sample counts, or explicit cross-validation schemes). It refers to 'unlabeled data examples' and interactive processes but not to explicit data partitioning for evaluation.
Hardware Specification | Yes | All experiments are conducted on a server equipped with 40 Intel Xeon E5-2640 v4 vCPUs and 96 GB RAM.
Software Dependencies | No | The paper mentions using ResNet-50 and a CNN for feature extraction but does not provide version numbers for any software dependencies, programming languages, or libraries used in the implementation.
Experiment Setup | Yes | Parameter Settings. For RCD, the lower bound k_min of the k values is set to 2 across all datasets, while the upper bound k_max is set to 200, 500, 200, 1,000, and 1,000 for the Abalone, Bird, Shuttle, Kddcup, and Game datasets, respectively.
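For anyone attempting a reproduction, the reported RCD parameter settings can be captured as a small configuration. The sketch below is hypothetical: the names K_MIN, K_MAX, and k_bounds are illustrative and do not come from the authors' (unreleased) code. Only the numeric values are taken from the paper.

```python
# Hypothetical encoding of the RCD parameter settings reported in the paper.
# K_MIN, K_MAX and k_bounds are illustrative names, not the authors' code.

K_MIN = 2  # lower bound k_min, the same for every dataset

# Upper bound k_max per dataset, as reported in the parameter settings.
K_MAX = {
    "Abalone": 200,
    "Bird": 500,
    "Shuttle": 200,
    "Kddcup": 1000,
    "Game": 1000,
}


def k_bounds(dataset: str) -> tuple[int, int]:
    """Return the (k_min, k_max) pair reported for a dataset."""
    if dataset not in K_MAX:
        raise KeyError(f"no reported k_max for dataset {dataset!r}")
    return K_MIN, K_MAX[dataset]


if __name__ == "__main__":
    for name in K_MAX:
        lo, hi = k_bounds(name)
        print(f"{name}: k in [{lo}, {hi}]")
```

Keeping k_min as a single constant mirrors the paper's statement that it does not vary across datasets. How RCD actually uses the k range is not specified here and would need to be checked against the paper.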