Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Nearly optimal classification for semimetrics

Authors: Lee-Ad Gottlieb, Aryeh Kontorovich, Pinhas Nisnevitch

JMLR 2017 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conclude this section with an illustration of how the theory developed in this paper explains the success of the greedy net-based compression algorithm, even in the case of semimetrics. We present results for the Hausdorff semimetric applied to the Covertype dataset, found in the UCI Machine Learning Repository.² This dataset contains 7 different label types, which we treated as 21 separate binary classification problems; we report representative results below. [Table columns: data set, original size, % compressed down to] Covertype 2 vs. 5: 2000, 97; Covertype 1 vs. 4: 2000, 25; Covertype 4 vs. 7: 2000, 2. Figure 2: Summary of the performance of semimetric sample compression algorithm.
Researcher Affiliation	Academia	Lee-Ad Gottlieb (EMAIL), Department of Computer Science, Ariel University, Ariel, Israel; Aryeh Kontorovich (EMAIL), Department of Computer Science, Ben-Gurion University, Beer Sheva, Israel; Pinhas Nisnevitch (EMAIL), Department of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
Pseudocode	Yes	Algorithm 1: Brute-force net construction. Require: sample S, margin r. Ensure: C is an r-net for S. if ρ(x, C) ≥ r then C = C ∪ {x} end if end for
Open Source Code	No	The paper does not explicitly state that source code for the described methodology is open-source or provide a link to a repository. It only mentions the JMLR license for the paper itself.
Open Datasets	Yes	We present results for the Hausdorff semimetric applied to the Covertype dataset, found in the UCI Machine Learning Repository.² This dataset contains 7 different label types... 2. http://tinyurl.com/cover-data
Dataset Splits	No	The paper mentions applying the semimetric to the Covertype dataset and reports compressed sizes for specific binary classification problems (e.g., Covertype 2 vs. 5). However, it does not provide details on training, validation, or test splits for this dataset.
Hardware Specification	No	The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models, memory, or cluster specifications.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries).
Experiment Setup	No	The paper describes the theoretical framework and algorithmic aspects of classification for semimetrics and presents results on a dataset. However, it does not provide specific experimental setup details such as hyperparameters (e.g., learning rate, batch size, number of epochs) or other training configurations for the algorithms applied.
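
The Pseudocode row above quotes Algorithm 1 (brute-force net construction) only in fragments. The following is a minimal Python sketch of that greedy idea, not the authors' implementation. It assumes that a point joins the net C when its distance to the current net is at least r, and that the distance to an empty net is infinite. The names `brute_force_net` and `squared_distance`, and the toy one-dimensional data, are hypothetical and chosen for illustration only; squared Euclidean distance stands in for the Hausdorff semimetric used in the paper.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def brute_force_net(sample: Sequence[T], r: float,
                    rho: Callable[[T, T], float]) -> List[T]:
    """Greedily build an r-net C for `sample` under the semimetric `rho`.

    Assumption: a point is added when its distance to the current net
    is at least r (the comparison operator is our reading of the algorithm).
    """
    net: List[T] = []
    for x in sample:
        # rho(x, C): distance from x to its nearest net point,
        # taken to be infinite while the net is still empty.
        dist_to_net = min((rho(x, c) for c in net), default=float("inf"))
        if dist_to_net >= r:
            net.append(x)
    return net


def squared_distance(a: float, b: float) -> float:
    """Squared Euclidean distance: symmetric and zero only for equal
    points, but it violates the triangle inequality (a semimetric)."""
    return (a - b) ** 2


# Toy data (hypothetical), not taken from the Covertype experiments.
points = [0.0, 0.5, 1.0, 1.5, 2.0, 5.0]
net = brute_force_net(points, r=1.0, rho=squared_distance)
percent_kept = 100.0 * len(net) / len(points)

print(net)  # [0.0, 1.0, 2.0, 5.0]
print(f"compressed down to {percent_kept:.0f}% of the original size")
```

The sketch compares every sample point against every net point, so it runs in time quadratic in the sample size in the worst case. The percentage printed at the end mirrors the "% compressed down to" column quoted in the Research Type row, but the toy value says nothing about the paper's reported results.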