Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Nearly optimal classification for semimetrics

Authors: Lee-Ad Gottlieb, Aryeh Kontorovich, Pinhas Nisnevitch

JMLR 2017 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conclude this section with an illustration of how the theory developed in this paper explains the success of the greedy net-based compression algorithm, even in the case of semimetrics. We present results for the Hausdorff semimetric applied to the Covertype dataset, found in the UCI Machine Learning Repository (footnote 2). This dataset contains 7 different label types, which we treated as 21 separate binary classification problems; we report representative results below. Figure 2 (summary of the performance of semimetric sample compression algorithm): Covertype 2 vs. 5, original size 2000, compressed down to 97%; Covertype 1 vs. 4, original size 2000, compressed down to 25%; Covertype 4 vs. 7, original size 2000, compressed down to 2%.
Researcher Affiliation | Academia | Lee-Ad Gottlieb (EMAIL), Department of Computer Science, Ariel University, Ariel, Israel; Aryeh Kontorovich (EMAIL), Department of Computer Science, Ben-Gurion University, Beer Sheva, Israel; Pinhas Nisnevitch (EMAIL), Department of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
Pseudocode | Yes | Algorithm 1 Brute-force net construction. Require: sample S, margin r. Ensure: C is an r-net for S. for x ∈ S do: if ρ(x, C) ≥ r then C = C ∪ {x}; end if; end for
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is open-source or provide a link to a repository. It only mentions the JMLR license for the paper itself.
Open Datasets | Yes | We present results for the Hausdorff semimetric applied to the Covertype dataset, found in the UCI Machine Learning Repository (footnote 2). This dataset contains 7 different label types... Footnote 2: http://tinyurl.com/cover-data
Dataset Splits | No | The paper mentions applying the semimetric to the Covertype dataset and reports compressed sizes for specific binary classification problems (e.g., Covertype 2 vs. 5). However, it does not provide details on training, validation, or test splits for this dataset.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models, memory, or cluster specifications.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries).
Experiment Setup | No | The paper describes the theoretical framework and algorithmic aspects of classification for semimetrics and presents results on a dataset. However, it does not provide specific experimental setup details such as hyperparameters (e.g., learning rate, batch size, number of epochs) or other training configurations for the algorithms applied.
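
For readers who want a concrete picture of the brute-force net construction quoted in the Pseudocode row, the following is a minimal Python sketch and not the authors' code. It rests on several assumptions: the function name `greedy_net`, the helper `squared_distance`, and the toy sample are hypothetical; the net is taken to start empty; the distance from a point to an empty net is treated as infinite; and the comparison is taken to be `>=` because the comparison symbol is missing from the extracted pseudocode.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_net(
    sample: Sequence[T],
    r: float,
    rho: Callable[[T, T], float],
) -> List[T]:
    """Greedily build an r-net of `sample` under the dissimilarity `rho`.

    A point joins the net only if it lies at distance >= r from every
    point already in the net (assumed comparison; see the lead-in).
    """
    net: List[T] = []
    for x in sample:
        # all(...) over an empty net is True, so the first point always
        # joins; this mirrors treating rho(x, empty set) as infinity.
        if all(rho(x, c) >= r for c in net):
            net.append(x)
    return net


def squared_distance(a: float, b: float) -> float:
    """Squared difference: symmetric and zero only when a == b, but it
    violates the triangle inequality, so it is a semimetric, not a metric."""
    return (a - b) ** 2


# Hypothetical toy sample on the real line.
points = [0.0, 0.5, 1.0, 1.5, 3.0]
net = greedy_net(points, r=1.0, rho=squared_distance)
print(net)
```

The sketch makes one pass over the sample and compares each point against the current net, so it costs at most one distance evaluation per (sample point, net point) pair. It uses only symmetry and non-negativity of `rho`, which is why it runs unchanged on a semimetric; it does not reproduce the label handling or the margin selection of the paper's compression scheme.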