Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Improving the Effectiveness and Efficiency of Stochastic Neighbour Embedding with Isolation Kernel
Authors: Ye Zhu, Kai Ming Ting
JAIR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section presents the result of utility evaluation of isolation kernel and Gaussian kernel in t-SNE using 21 real-world datasets with different data sizes and dimensions. We report the best performance of each algorithm with a systematic parameter search with the range shown in Table 4. Table 5 shows the results of the two kernels used in t-SNE. The Isolation kernel performs better on 18 out of 21 datasets in terms of AUC_RNX... |
| Researcher Affiliation | Academia | Ye Zhu EMAIL School of Information Technology, Deakin University, Burwood, Victoria, Australia 3125 Kai Ming Ting EMAIL National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, China 210023 |
| Pseudocode | Yes | The procedure of t-SNE is provided in Algorithm 1. ... Algorithm 2: t-SNE(D, ψ, m), which employs the Isolation kernel |
| Open Source Code | Yes | A demonstration of using t-SNE with Isolation kernel can be obtained from https://github.com/zhuye88/IKt-sne. |
| Open Datasets | Yes | COIL20, Human Activity and Isolet are from (Li, Cheng, Wang, Morstatter, Robert, Tang, & Liu, 2016); News20 and Rcv1 are from (Chang & Lin, 2011); and all other real-world datasets are from UCI Machine Learning Repository (Dua & Graff, 2017). |
| Dataset Splits | No | The paper mentions using either the MNIST dataset or a subsample of 10,000 data points from MNIST8M for processing, but does not specify explicit training/test/validation splits with percentages, sample counts, or citations to predefined splits for these or other datasets. |
| Hardware Specification | Yes | All algorithms used in the following experiments were implemented in Matlab 2019b and were run on a machine with 14 cores (Intel Xeon E5-2690 v4 @ 2.59 GHz) and 256GB memory. |
| Software Dependencies | Yes | All algorithms used in the following experiments were implemented in Matlab 2019b and were run on a machine with 14 cores (Intel Xeon E5-2690 v4 @ 2.59 GHz) and 256GB memory. |
| Experiment Setup | Yes | We report the best performance of each algorithm with a systematic parameter search with the range shown in Table 4. Note that there is only one manual parameter ψ to control the partitioning mechanism, and the other parameter t can be fixed to a default number. Table 4 (parameters with search range): Gaussian kernel: perplexity ∈ {1, 5, ..., 97, 0.01n, 0.05n, ..., 0.97n}, tolerance = 0.00005; Isolation kernel: ψ ∈ {1, 5, ..., 97, 0.01n, 0.05n, ..., 0.97n}, t = 200. |
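The Research Type evidence reports results in terms of AUC_RNX. The sketch below shows one way to compute that score, assuming the standard rank-based definition from the dimensionality-reduction literature: Q_NX(K) is the average fraction of each point's K nearest neighbours that are shared between the original space and the embedding, R_NX(K) = ((N−1)·Q_NX(K) − K)/(N−1−K) rescales it so a random embedding scores about 0, and AUC_RNX averages R_NX(K) over K = 1..N−2 with weights 1/K. The function names `knn_ranks` and `auc_rnx` are hypothetical, not taken from the paper's code, and the brute-force distance computation only suits small samples.

```python
import numpy as np


def knn_ranks(X):
    """For each row of X, indices of all other rows sorted by Euclidean distance."""
    sq = np.sum(X ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared pairwise distances
    np.fill_diagonal(d, np.inf)                    # push each point itself to the end
    return np.argsort(d, axis=1, kind="stable")[:, :-1]  # drop the self column


def auc_rnx(X_high, X_low):
    """Log-weighted area under R_NX(K), K = 1..N-2 (1.0 = perfect preservation)."""
    n = X_high.shape[0]
    nh, nl = knn_ranks(X_high), knn_ranks(X_low)
    ks = np.arange(1, n - 1)
    rnx = np.empty(len(ks))
    for j, k in enumerate(ks):
        # Q_NX(K): average fraction of K nearest neighbours shared by both spaces
        overlap = sum(len(set(nh[i, :k]) & set(nl[i, :k])) for i in range(n))
        q = overlap / (k * n)
        # R_NX(K): rescaled so that a random embedding scores about 0
        rnx[j] = ((n - 1) * q - k) / (n - 1 - k)
    weights = 1.0 / ks  # 1/K weighting emphasises small neighbourhoods
    return float(np.sum(rnx * weights) / np.sum(weights))
```

For example, `auc_rnx(X, Y)` with `X` the original data and `Y` a 2-D t-SNE output gives a single score where higher is better; an embedding that preserves every neighbourhood ranking scores exactly 1.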
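The Pseudocode row notes that Algorithm 2 runs t-SNE with the Isolation kernel in place of the Gaussian kernel. Below is a minimal sketch of an Isolation kernel, assuming the Voronoi-partitioning construction used in the Isolation Kernel literature: each of t partitionings samples ψ data points as cell centres, every point joins the cell of its nearest centre, and the similarity of two points is the fraction of partitionings in which they share a cell. The conversion to t-SNE input affinities (row normalisation excluding self-similarity, then symmetrisation, as in standard t-SNE) is our assumption rather than a transcription of Algorithm 2, and `isolation_kernel` and `tsne_affinities` are hypothetical names; the linked IKt-sne repository remains the authoritative Matlab implementation.

```python
import numpy as np


def isolation_kernel(X, psi, t=200, seed=0):
    """Approximate Isolation kernel similarity matrix from t Voronoi partitionings."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    sim = np.zeros((n, n))
    for _ in range(t):
        # psi distinct points of X act as the cell centres of this partitioning
        centres = rng.choice(n, size=psi, replace=False)
        # squared distance from every point to every centre, shape (n, psi)
        d = sq[:, None] + sq[centres][None, :] - 2.0 * X @ X[centres].T
        cell = np.argmin(d, axis=1)
        # one-hot cell membership; H @ H.T is 1 exactly where two points share a cell
        H = np.zeros((n, psi))
        H[np.arange(n), cell] = 1.0
        sim += H @ H.T
    return sim / t


def tsne_affinities(K):
    """Assumed conversion: row-normalise without self-similarity, then symmetrise."""
    P = K.astype(float).copy()
    np.fill_diagonal(P, 0.0)
    row = P.sum(axis=1, keepdims=True)
    P = np.divide(P, row, out=np.zeros_like(P), where=row > 0)
    return (P + P.T) / (2.0 * P.shape[0])
```

A larger ψ gives finer cells, so similarity becomes more local, which is roughly the opposite direction to increasing perplexity in the Gaussian kernel. With ψ = 1 every pair of points is maximally similar.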
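The Experiment Setup row gives a search range that mixes absolute values (1 to 97 in steps of 4) with fractions of the dataset size n (0.01n to 0.97n in steps of 0.04n). The sketch below enumerates that grid and keeps the best-scoring setting. Rounding the fractional values to integers and dropping values above n are assumptions that suit ψ, which must be an integer no larger than n; for perplexity the unrounded values could be used instead. `search_grid`, `best_setting`, and the `evaluate` callback (for instance, a function returning the AUC_RNX of the resulting embedding) are hypothetical helpers.

```python
def search_grid(n):
    """Candidate psi (or perplexity) values for a dataset of n points."""
    absolute = list(range(1, 98, 4))                            # 1, 5, ..., 97
    fractions = [round(0.01 + 0.04 * i, 2) for i in range(25)]  # 0.01, 0.05, ..., 0.97
    relative = [max(1, int(round(f * n))) for f in fractions]   # assumed integer rounding
    return sorted(v for v in set(absolute + relative) if v <= n)


def best_setting(grid, evaluate):
    """Return (value, score) for the grid value that maximises evaluate(value)."""
    scores = {v: evaluate(v) for v in grid}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For n = 1000 the grid has 50 distinct values, from 1 up to 970; for small datasets the absolute values above n are dropped.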