reproducibilityindex.ai

Finding Statistically Significant Interactions between Continuous Features

Authors: Mahito Sugiyama, Karsten Borgwardt

IJCAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We examine the effectiveness and the efficiency of C-Tarone using synthetic and real-world datasets.
Researcher Affiliation	Academia	(1) National Institute of Informatics, Tokyo 101-8430, Japan; (2) JST PRESTO, Japan; (3) D-BSSE, ETH Zürich, Basel 4058, Switzerland; (4) SIB Swiss Institute of Bioinformatics, Switzerland
Pseudocode	Yes	Algorithm 1: C-Tarone.
Open Source Code	No	The paper states "All methods were implemented in C/C++ and compiled with gcc 4.8.5", but it does not provide any link to source code, nor does it explicitly state that the code will be made open source or available in supplementary materials.
Open Datasets	Yes	We also evaluate C-Tarone on real-world datasets shown in Table 2 in Appendix, which are benchmark datasets for binary classification from the UCI repository [Lichman, 2013].
Dataset Splits	No	The paper does not provide specific details on how the datasets were split into training, validation, or test sets (e.g., percentages, sample counts, or cross-validation schemes) for reproducibility.
Hardware Specification	Yes	We used Amazon Linux AMI release 2017.09 and ran all experiments on a single core of 2.3 GHz Intel Xeon CPU E7-8880 v3 and 2.0 TB of memory.
Software Dependencies	Yes	All methods were implemented in C/C++ and compiled with gcc 4.8.5.
Experiment Setup	Yes	The FWER level α = 0.05 throughout experiments. In each dataset, we generate 20% of features that are associated with the class labels. More precisely, first we generate the entire dataset from the uniform distribution from 0 to 1 and assign the class label 1 to the first N1 data points. Then, for the N1 data points in the class 1, we pick up one of the 20% of associated features and copy it to every associated feature with adding Gaussian noise with (μ, σ²) = (0, 0.1). We used the rpart function in R with its default parameter setting, where the Gini index is used for splitting and the minimum number of data points that must exist in a node is 20.
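
The synthetic data procedure quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (which was written in C/C++ and is not available). The function name `make_synthetic`, its parameters, the choice of the first 20% of columns as the associated features, and the choice of the first associated column as the one that gets copied are all assumptions made for the example. The noise variance σ² = 0.1 is converted to a standard deviation before sampling.

```python
import math
import random


def make_synthetic(n, d, n1, assoc_frac=0.2, noise_var=0.1, seed=0):
    """Hypothetical sketch of the synthetic data generation described above."""
    rng = random.Random(seed)

    # Step 1: fill the whole n x d dataset with Uniform(0, 1) values.
    X = [[rng.random() for _ in range(d)] for _ in range(n)]

    # Step 2: the first n1 data points get class label 1, the rest 0.
    y = [1 if i < n1 else 0 for i in range(n)]

    # Assumption: the associated features are the first 20% of columns.
    n_assoc = max(1, round(assoc_frac * d))
    assoc = list(range(n_assoc))

    # Assumption: the first associated feature is the one that is copied.
    source = assoc[0]

    # The setup gives a variance, so take the square root for random.gauss.
    sigma = math.sqrt(noise_var)

    # Step 3: within class 1 only, overwrite every other associated feature
    # with a noisy copy of the source feature.
    for i in range(n1):
        base = X[i][source]
        for j in assoc:
            if j != source:
                X[i][j] = base + rng.gauss(0.0, sigma)

    return X, y, assoc


if __name__ == "__main__":
    X, y, assoc = make_synthetic(n=100, d=10, n1=30)
    print(len(X), len(X[0]), sum(y), assoc)
```

Because the copying happens only within class 1, the associated features are correlated with each other in that class and independent in class 0, which is the kind of class-dependent interaction the experiment is designed to detect. Class-0 rows are left as pure uniform noise.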