Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Improving the Effectiveness and Efficiency of Stochastic Neighbour Embedding with Isolation Kernel

Authors: Ye Zhu, Kai Ming Ting

JAIR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	This section presents the result of utility evaluation of isolation kernel and Gaussian kernel in t-SNE using 21 real-world datasets with different data sizes and dimensions. We report the best performance of each algorithm with a systematic parameter search with the range shown in Table 4. Table 5 shows the results of the two kernels used in t-SNE. The Isolation kernel performs better on 18 out of 21 datasets in terms of AUC_RNX...
Researcher Affiliation	Academia	Ye Zhu EMAIL School of Information Technology, Deakin University, Burwood, Victoria, Australia 3125 Kai Ming Ting EMAIL National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, China 210023
Pseudocode	Yes	The procedure of t-SNE is provided in Algorithm 1. ... Algorithm 2 t-SNE(D, ψ, m) which employs the Isolation kernel
Open Source Code	Yes	A demonstration of using t-SNE with Isolation kernel can be obtained from https://github.com/zhuye88/IKt-sne.
Open Datasets	Yes	COIL20, Human Activity and Isolet are from (Li, Cheng, Wang, Morstatter, Robert, Tang, & Liu, 2016); News20 and Rcv1 are from (Chang & Lin, 2011); and all other real-world datasets are from UCI Machine Learning Repository (Dua & Graff, 2017).
Dataset Splits	No	The paper mentions using either the MNIST dataset or a subsample of 10,000 data points from MNIST8M for processing, but does not specify explicit training/test/validation splits with percentages, sample counts, or citations to predefined splits for these or other datasets.
Hardware Specification	Yes	All algorithms used in the following experiments were implemented in Matlab 2019b and were run on a machine with 14 cores (Intel Xeon E5-2690 v4 @ 2.59 GHz) and 256GB memory.
Software Dependencies	Yes	All algorithms used in the following experiments were implemented in Matlab 2019b and were run on a machine with 14 cores (Intel Xeon E5-2690 v4 @ 2.59 GHz) and 256GB memory.
Experiment Setup	Yes	We report the best performance of each algorithm with a systematic parameter search with the range shown in Table 4. Note that there is only one manual parameter ψ to control the partitioning mechanism, and the other parameter t can be fixed to a default number. Parameters with search range: Gaussian kernel: perplexity ∈ {1, 5, ..., 97, 0.01n, 0.05n, ..., 0.97n}, tolerance = 0.00005; Isolation kernel: ψ ∈ {1, 5, ..., 97, 0.01n, 0.05n, ..., 0.97n}, t = 200.
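
The search range quoted in the Experiment Setup row can be enumerated for a dataset of n points. The sketch below is a hedged illustration, not the authors' code (their implementation is in Matlab). The function name `search_grid` is hypothetical. The step of 4 in the absolute part and 0.04 in the fractional part is inferred from the first two listed values and is an assumption, as are rounding the fractions of n to integers and dropping candidates that are not smaller than n.

```python
def search_grid(n):
    """Candidate values for perplexity (Gaussian kernel) or psi (Isolation kernel).

    Assumptions, not stated in the quoted text:
      - the elided sequences advance by 4 (absolute) and 0.04 (fraction of n);
      - fractions of n are rounded to the nearest integer, with a floor of 1;
      - candidates must be smaller than the dataset size n.
    """
    steps = range(1, 98, 4)                       # 1, 5, ..., 97
    absolute = list(steps)                        # absolute values 1, 5, ..., 97
    relative = [max(1, round(k * n / 100)) for k in steps]  # 0.01n, 0.05n, ..., 0.97n
    # Merge both parts, drop duplicates and values that are not below n, then sort.
    return sorted({v for v in absolute + relative if v < n})


if __name__ == "__main__":
    grid = search_grid(1000)
    print(len(grid), grid[:5], grid[-3:])
```

Under these assumptions, each parameter gets at most 50 candidates per dataset, and the fractional part scales the search with data size while the absolute part covers small values.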