Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Diversity Maximization in the Presence of Outliers

Author: Daichi Amagata

AAAI 2023

Each entry below gives the reproducibility variable, the classified result, and the supporting LLM response (evidence quoted from the paper).

Research Type: Experimental
  "We conduct experiments on real datasets to demonstrate the effectiveness and efficiency of our algorithms. ... All experiments were conducted on a Ubuntu 20.04 LTS machine equipped with Xeon Platinum 8268 CPU@2.90GHz and 768GB RAM. Dataset. We used the following real datasets."

Researcher Affiliation: Academia
  "Daichi Amagata, Osaka University, EMAIL"

Pseudocode: Yes
  "Algorithm 1: GMM(X, k) ... Algorithm 2: BASELINE(X, k, z) ... Algorithm 3: GREEDY(X, k, z) ... Algorithm 4: STREAMING(X, k) ... Algorithm 5: CORESET(X, k, z)"

Open Source Code: Yes
  "Source codes of our algorithms are available https://github.com/amgt-d1/Max-Min-w-Outliers."

Open Datasets: Yes
  "Dataset. We used the following real datasets. FCT: a set of 10-dimensional cartographic variables for forest cover type, and n = 580,812. Household: a set of 7-dimensional sensor readings, and n = 2,049,280. KDD99: a set of 16-dimensional packet records, and n = 311,029. Mirai: a set of 115-dimensional Mirai malware infected network capture data, and n = 764,137." (Dataset footnote: https://archive.ics.uci.edu/ml/datasets.php)

Dataset Splits: No
  The paper does not explicitly specify training, validation, and test splits for the datasets, nor does it refer to predefined splits with citations.

Hardware Specification: Yes
  "All experiments were conducted on a Ubuntu 20.04 LTS machine equipped with Xeon Platinum 8268 CPU@2.90GHz and 768GB RAM."

Software Dependencies: Yes
  "All algorithms were implemented in C++, compiled by g++ 9.4.0 with -O3 flag, and single threaded."

Experiment Setup: Yes
  "We set ϵ = 0.01. For CORESET, we set the coreset size so that the success probability was 0.95. We set k = 100 and z = 200 by default. This setting of z is similar to those in the evaluation paper (Campos et al. 2016) and in the experiments using large datasets (Ceccarello, Pietracaprina, and Pucci 2019; Gupta et al. 2017). When studying the impact of k (resp. z), the value of z (resp. k) was fixed. We ran each algorithm 20 times and report the average result."
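For orientation, Algorithm 1 in the quoted pseudocode (GMM) is the classic farthest-first greedy for max-min diversification: start from an arbitrary point, then repeatedly add the point whose minimum distance to the current selection is largest. A minimal sketch of that greedy step, in Python rather than the paper's C++, with illustrative names (`gmm`, `points`) that are not taken from the paper's code:

```python
import math

def gmm(points, k):
    """Farthest-first greedy (GMM-style) for max-min diversification.

    points: list of equal-length numeric tuples; k: number of points to select.
    Returns k points such that each new point maximizes its minimum
    Euclidean distance to the points already selected.
    """
    # Start from an arbitrary point (here: the first one).
    selected = [points[0]]
    # mindist[i] = distance from points[i] to its nearest selected point.
    mindist = [math.dist(p, selected[0]) for p in points]
    while len(selected) < k:
        # Greedy step: pick the point farthest from the current selection.
        i = max(range(len(points)), key=mindist.__getitem__)
        selected.append(points[i])
        # Update each point's distance to the nearest selected point.
        for j, p in enumerate(points):
            mindist[j] = min(mindist[j], math.dist(p, points[i]))
    return selected
```

This sketch ignores the paper's actual contribution (handling up to z outliers, streaming, and coresets); it only illustrates the base greedy that Algorithms 2-5 build on.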