Diversity Maximization in the Presence of Outliers

Authors: Daichi Amagata

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on real datasets to demonstrate the effectiveness and efficiency of our algorithms. ... 5 Experiment: All experiments were conducted on a Ubuntu 20.04 LTS machine equipped with Xeon Platinum 8268 CPU@2.90GHz and 768GB RAM. Dataset. We used the following real datasets.
Researcher Affiliation | Academia | Daichi Amagata, Osaka University, amagata.daichi@ist.osaka-u.ac.jp
Pseudocode | Yes | Algorithm 1: GMM(X, k) ... Algorithm 2: BASELINE(X, k, z) ... Algorithm 3: GREEDY(X, k, z) ... Algorithm 4: STREAMING(X, k) ... Algorithm 5: CORESET(X, k, z)
Open Source Code | Yes | Source codes of our algorithms are available https://github.com/amgt-d1/Max-Min-w-Outliers.
Open Datasets | Yes | Dataset. We used the following real datasets (footnote 2). FCT: a set of 10-dimensional cartographic variables for forest cover type, and n = 580,812. Household: a set of 7-dimensional sensor readings, and n = 2,049,280. KDD99: a set of 16-dimensional packet records, and n = 311,029. Mirai: a set of 115-dimensional Mirai malware infected network capture data, and n = 764,137. Footnote 2: https://archive.ics.uci.edu/ml/datasets.php
Dataset Splits | No | The paper does not explicitly specify training, validation, and test splits for the datasets, nor does it refer to predefined splits with citations.
Hardware Specification | Yes | All experiments were conducted on a Ubuntu 20.04 LTS machine equipped with Xeon Platinum 8268 CPU@2.90GHz and 768GB RAM.
Software Dependencies | Yes | All algorithms were implemented in C++, compiled by g++ 9.4.0 with -O3 flag, and single threaded.
Experiment Setup | Yes | We set ϵ = 0.01. For CORESET, we set the coreset size so that the success probability was 0.95. We set k = 100 and z = 200 by default. This setting of z is similar to those in the evaluation paper (Campos et al. 2016) and in the experiments using large datasets (Ceccarello, Pietracaprina, and Pucci 2019; Gupta et al. 2017). When studying the impact of k (resp. z), the value of z (resp. k) was fixed. We ran each algorithm 20 times and report the average result.
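
The evaluation protocol quoted above (20 runs per algorithm, averaged) can be illustrated with a small sketch. It is not the author's code: the paper's implementations are in C++ and its pseudocode is not reproduced on this page. The sketch assumes that GMM(X, k) denotes the standard greedy farthest-point heuristic for max-min diversification, and it ignores the outlier parameter z entirely. The names `gmm`, `diversity`, and `average_over_runs` are hypothetical, and Python is used only for brevity.

```python
import math
import random
import statistics
import time


def gmm(points, k, rng):
    """Greedy farthest-point selection (assumed meaning of GMM).

    Starts from a random point, then repeatedly adds the point whose
    distance to its nearest already-selected point is largest.
    Returns a list of k indices into `points`.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    first = rng.randrange(len(points))
    selected = [first]
    # nearest[i] = distance from points[i] to its closest selected point
    nearest = [math.dist(p, points[first]) for p in points]
    while len(selected) < k:
        far = max(range(len(points)), key=nearest.__getitem__)
        selected.append(far)
        for i, p in enumerate(points):
            d = math.dist(p, points[far])
            if d < nearest[i]:
                nearest[i] = d
    return selected


def diversity(points, selected):
    """Max-min objective: the smallest pairwise distance among selected points."""
    return min(
        (math.dist(points[a], points[b])
         for i, a in enumerate(selected)
         for b in selected[i + 1:]),
        default=math.inf,  # a single point has no pairs
    )


def average_over_runs(points, k, runs=20, seed=0):
    """Run the heuristic `runs` times; return (mean seconds, mean diversity)."""
    rng = random.Random(seed)
    times, scores = [], []
    for _ in range(runs):
        start = time.perf_counter()
        chosen = gmm(points, k, rng)
        times.append(time.perf_counter() - start)
        scores.append(diversity(points, chosen))
    return statistics.mean(times), statistics.mean(scores)
```

Calling `average_over_runs(points, k=100)` on a list of coordinate tuples would mirror the default k in the setup above. Timings from this sketch are not comparable to the single-threaded C++ results reported in the paper.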