Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Nearly optimal classification for semimetrics
Authors: Lee-Ad Gottlieb, Aryeh Kontorovich, Pinhas Nisnevitch
JMLR 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude this section with an illustration of how the theory developed in this paper explains the success of the greedy net-based compression algorithm, even in the case of semimetrics. We present results for the Hausdorff semimetric applied to the Covertype dataset, found in the UCI Machine Learning Repository.2 This dataset contains 7 different label types, which we treated as 21 separate binary classification problems; we report representative results below. Data set (original size, % compressed down to): Covertype 2 vs. 5 (2000, 97); Covertype 1 vs. 4 (2000, 25); Covertype 4 vs. 7 (2000, 2). Figure 2: Summary of the performance of semimetric sample compression algorithm. |
| Researcher Affiliation | Academia | Lee-Ad Gottlieb EMAIL Department of Computer Science Ariel University Ariel, Israel Aryeh Kontorovich EMAIL Department of Computer Science Ben-Gurion University Beer Sheva, Israel Pinhas Nisnevitch EMAIL Department of Computer Science Tel-Aviv University Tel-Aviv, Israel |
| Pseudocode | Yes | Algorithm 1 Brute-force net construction. Require: sample S, margin r. Ensure: C is an r-net for S. if ρ(x, C) ≥ r then C = C ∪ {x} end if end for |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is open-source or provide a link to a repository. It only mentions the JMLR license for the paper itself. |
| Open Datasets | Yes | We present results for the Hausdorff semimetric applied to the Covertype dataset, found in the UCI Machine Learning Repository.2 This dataset contains 7 different label types... 2. http://tinyurl.com/cover-data |
| Dataset Splits | No | The paper mentions applying the semimetric to the Covertype dataset and reports compressed sizes for specific binary classification problems (e.g., Covertype 2 vs. 5). However, it does not provide details on training, validation, or test splits for this dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models, memory, or cluster specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | No | The paper describes the theoretical framework and algorithmic aspects of classification for semimetrics and presents results on a dataset. However, it does not provide specific experimental setup details such as hyperparameters (e.g., learning rate, batch size, number of epochs) or other training configurations for the algorithms applied. |
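
The brute-force net construction quoted in the Pseudocode row, together with the "% compressed down to" figure quoted in the Research Type row, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `brute_force_net` and `percent_retained`, the toy one-dimensional sample, and the absolute-difference distance `abs_diff` are all assumptions made for illustration. The paper's Hausdorff semimetric, its label-dependent choice of margin, and the Covertype data are not reproduced here.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def brute_force_net(
    sample: Sequence[T], r: float, rho: Callable[[T, T], float]
) -> List[T]:
    """Greedily build an r-net of `sample` under the distance function `rho`.

    A point joins the net only if it lies at distance >= r from every
    point already in the net (hypothetical helper, not from the paper).
    """
    net: List[T] = []
    for x in sample:
        # Distance from x to the current net; infinite while the net is empty.
        dist_to_net = min((rho(x, c) for c in net), default=float("inf"))
        if dist_to_net >= r:
            net.append(x)
    return net


def percent_retained(sample: Sequence[T], net: Sequence[T]) -> float:
    """Size of the net as a percentage of the original sample size."""
    return 100.0 * len(net) / len(sample)


def abs_diff(a: float, b: float) -> float:
    """Stand-in distance for the demo; any symmetric non-negative rho works."""
    return abs(a - b)


# Toy sample (assumed data, unrelated to Covertype).
points = [0.0, 0.4, 1.0, 1.3, 2.5, 2.6]
net = brute_force_net(points, r=1.0, rho=abs_diff)
print(net, percent_retained(points, net))
```

The loop only uses `rho` through pairwise evaluations and never relies on the triangle inequality, so the same construction runs unchanged when `rho` is a semimetric. It makes one pass over the sample and compares each point against the current net, so its cost is at most quadratic in the sample size.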