Diversity Maximization in the Presence of Outliers
Author: Daichi Amagata
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on real datasets to demonstrate the effectiveness and efficiency of our algorithms. ... (Section 5, Experiment) All experiments were conducted on an Ubuntu 20.04 LTS machine equipped with a Xeon Platinum 8268 CPU @ 2.90GHz and 768GB RAM. Dataset. We used the following real datasets. |
| Researcher Affiliation | Academia | Daichi Amagata Osaka University amagata.daichi@ist.osaka-u.ac.jp |
| Pseudocode | Yes | Algorithm 1: GMM(X, k) ... Algorithm 2: BASELINE(X, k, z) ... Algorithm 3: GREEDY(X, k, z) ... Algorithm 4: STREAMING(X, k) ... Algorithm 5: CORESET(X, k, z). A hedged sketch of the GMM-style greedy appears after this table. |
| Open Source Code | Yes | Source codes of our algorithms are available at https://github.com/amgt-d1/Max-Min-w-Outliers. |
| Open Datasets | Yes | Dataset. We used the following real datasets (footnote 2: https://archive.ics.uci.edu/ml/datasets.php). FCT: a set of 10-dimensional cartographic variables for forest cover type, and n = 580,812. Household: a set of 7-dimensional sensor readings, and n = 2,049,280. KDD99: a set of 16-dimensional packet records, and n = 311,029. Mirai: a set of 115-dimensional Mirai-malware-infected network capture data, and n = 764,137. |
| Dataset Splits | No | The paper does not explicitly specify training, validation, and test splits for the datasets, nor does it refer to predefined splits with citations. |
| Hardware Specification | Yes | All experiments were conducted on an Ubuntu 20.04 LTS machine equipped with a Xeon Platinum 8268 CPU @ 2.90GHz and 768GB RAM. |
| Software Dependencies | Yes | All algorithms were implemented in C++, compiled by g++ 9.4.0 with the -O3 flag, and single-threaded. |
| Experiment Setup | Yes | We set ϵ = 0.01. For CORESET, we set the coreset size so that the success probability was 0.95. We set k = 100 and z = 200 by default. This setting of z is similar to those in the evaluation paper (Campos et al. 2016) and in the experiments using large datasets (Ceccarello, Pietracaprina, and Pucci 2019; Gupta et al. 2017). When studying the impact of k (resp. z), the value of z (resp. k) was fixed. We ran each algorithm 20 times and report the average result. |
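
For context on the Pseudocode row, GMM conventionally denotes the Gonzalez-style farthest-point greedy for max-min diversification: start from a seed and repeatedly add the point whose distance to the closest already-selected point is largest. The sketch below is a minimal illustration of that standard greedy under stated assumptions (dense double vectors, Euclidean distance, first point as seed, hypothetical names `Point`, `euclidean`, `gmm`); it is not the paper's released implementation and does not handle outliers (the z parameter of the other algorithms).

```cpp
// Minimal sketch of a GMM-style farthest-point greedy for max-min diversification.
// Assumptions (not taken from the paper's code): points are dense double vectors,
// the distance is Euclidean, and the seed is simply the first point.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

double euclidean(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        s += d * d;
    }
    return std::sqrt(s);
}

// Returns the indices of k points chosen greedily: each new point maximizes
// its distance to the closest already-selected point.
std::vector<std::size_t> gmm(const std::vector<Point>& X, std::size_t k) {
    std::vector<std::size_t> selected;
    if (X.empty() || k == 0) return selected;

    // dist_to_sel[i] = distance from X[i] to its nearest selected point so far.
    std::vector<double> dist_to_sel(X.size(), std::numeric_limits<double>::max());
    selected.push_back(0);  // arbitrary seed

    while (selected.size() < std::min(k, X.size())) {
        const Point& last = X[selected.back()];
        std::size_t best = 0;
        double best_dist = -1.0;
        for (std::size_t i = 0; i < X.size(); ++i) {
            dist_to_sel[i] = std::min(dist_to_sel[i], euclidean(X[i], last));
            if (dist_to_sel[i] > best_dist) {
                best_dist = dist_to_sel[i];
                best = i;
            }
        }
        selected.push_back(best);
    }
    return selected;
}
```

This greedy is the usual 2-approximation building block for max-min dispersion; the paper's BASELINE, GREEDY, STREAMING, and CORESET routines extend this idea to tolerate z outliers, which the sketch above deliberately omits.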