Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Diversity Maximization in the Presence of Outliers

Author: Daichi Amagata

AAAI 2023

Each entry below gives the reproducibility variable, the classified result, and the supporting LLM response (evidence quoted from the paper).

Research Type: Experimental
  "We conduct experiments on real datasets to demonstrate the effectiveness and efficiency of our algorithms. ... All experiments were conducted on a Ubuntu 20.04 LTS machine equipped with Xeon Platinum 8268 CPU@2.90GHz and 768GB RAM. Dataset. We used the following real datasets."

Researcher Affiliation: Academia
  "Daichi Amagata, Osaka University, EMAIL"

Pseudocode: Yes
  "Algorithm 1: GMM(X, k) ... Algorithm 2: BASELINE(X, k, z) ... Algorithm 3: GREEDY(X, k, z) ... Algorithm 4: STREAMING(X, k) ... Algorithm 5: CORESET(X, k, z)"

Open Source Code: Yes
  "Source codes of our algorithms are available https://github.com/amgt-d1/Max-Min-w-Outliers."

Open Datasets: Yes
  "Dataset. We used the following real datasets. FCT: a set of 10-dimensional cartographic variables for forest cover type, and n = 580,812. Household: a set of 7-dimensional sensor readings, and n = 2,049,280. KDD99: a set of 16-dimensional packet records, and n = 311,029. Mirai: a set of 115-dimensional Mirai malware infected network capture data, and n = 764,137." (Dataset footnote: https://archive.ics.uci.edu/ml/datasets.php)

Dataset Splits: No
  The paper does not explicitly specify training, validation, and test splits for the datasets, nor does it refer to predefined splits with citations.

Hardware Specification: Yes
  "All experiments were conducted on a Ubuntu 20.04 LTS machine equipped with Xeon Platinum 8268 CPU@2.90GHz and 768GB RAM."

Software Dependencies: Yes
  "All algorithms were implemented in C++, compiled by g++ 9.4.0 with -O3 flag, and single threaded."

Experiment Setup: Yes
  "We set ϵ = 0.01. For CORESET, we set the coreset size so that the success probability was 0.95. We set k = 100 and z = 200 by default. This setting of z is similar to those in the evaluation paper (Campos et al. 2016) and in the experiments using large datasets (Ceccarello, Pietracaprina, and Pucci 2019; Gupta et al. 2017). When studying the impact of k (resp. z), the value of z (resp. k) was fixed. We ran each algorithm 20 times and report the average result."
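For orientation, Algorithm 1 in the quoted pseudocode (GMM) is the classic farthest-first greedy for max-min diversification: start from an arbitrary point, then repeatedly add the point whose minimum distance to the current selection is largest. A minimal sketch of that greedy step, in Python rather than the paper's C++, with illustrative names (`gmm`, `points`) that are not taken from the paper's code:

```python
import math

def gmm(points, k):
    """Farthest-first greedy (GMM-style) for max-min diversification.

    points: list of equal-length numeric tuples; k: number of points to select.
    Returns k points such that each new point maximizes its minimum
    Euclidean distance to the points already selected.
    """
    # Start from an arbitrary point (here: the first one).
    selected = [points[0]]
    # mindist[i] = distance from points[i] to its nearest selected point.
    mindist = [math.dist(p, selected[0]) for p in points]
    while len(selected) < k:
        # Greedy step: pick the point farthest from the current selection.
        i = max(range(len(points)), key=mindist.__getitem__)
        selected.append(points[i])
        # Update each point's distance to the nearest selected point.
        for j, p in enumerate(points):
            mindist[j] = min(mindist[j], math.dist(p, points[i]))
    return selected
```

This sketch ignores the paper's actual contribution (handling up to z outliers, streaming, and coresets); it only illustrates the base greedy that Algorithms 2-5 build on.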