When are Non-Parametric Methods Robust?
Authors: Robi Bhattacharjee, Kamalika Chaudhuri
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical results are, by nature, large sample; we next validate how well they apply to the finite sample case by trying them out on a simple example. In particular, we ask the following question: How does the robustness of non-parametric classifiers change with increasing sample size? This question is considered in the context of two simple non-parametric classifiers: one-nearest-neighbor (which is guaranteed to be r-consistent) and histograms (which is not). To be able to measure performance with increasing data size, we look at a simple synthetic dataset: the Half Moons. |
| Researcher Affiliation | Academia | Robi Bhattacharjee * 1 Kamalika Chaudhuri * 1 1Department of Computer Science, University of California, San Diego. Correspondence to: Robi <rcbhatta@eng.ucsd.edu>. |
| Pseudocode | Yes | Algorithm 1 RobustNonPar. Input: S ∼ Dⁿ, weight function W, robustness radius r. S_r ← AdvPrun(S, r). Output: W_{S_r} |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper mentions using the "Halfmoon dataset" but does not provide a specific link, DOI, repository name, or a formal citation with author names and year in brackets or parentheses for accessing it. |
| Dataset Splits | No | The paper discusses training set size and test examples, but does not provide explicit percentages or counts for training, validation, and test splits, nor does it reference predefined splits with formal citations that would enable reproduction. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., specific Python library versions, framework versions, or solver versions) that would be needed for reproducibility. |
| Experiment Setup | Yes | We use the Halfmoon dataset with two settings of the gaussian noise parameter σ, σ = 0 (Noiseless) and σ = 0.08 (Noisy). For the Noiseless setting, observe that the data is already 0.1-separated; for the Noisy setting, we use Adversarial Pruning (Algorithm 1) with parameter r = 0.1 for both classification methods. |
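To make the quoted setup concrete, the pipeline in the paper (Halfmoon data with gaussian noise σ, Adversarial Pruning with r = 0.1, then a non-parametric classifier such as 1-NN) can be sketched as below. This is a hedged approximation, not the authors' code: the paper's Adversarial Pruning computes an optimal maximum subset in which oppositely-labeled points are 2r-separated, whereas `adversarial_prune` here is a simple greedy vertex-cover heuristic introduced for illustration, and `make_moons` with its `noise` parameter stands in for the paper's Halfmoon generator.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier

def adversarial_prune(X, y, r):
    """Greedy stand-in for Adversarial Pruning (Algorithm 1):
    repeatedly drop the point involved in the most cross-label
    conflicts until no two oppositely-labeled points lie within 2r.
    Returns the indices of the kept points."""
    keep = np.ones(len(X), dtype=bool)
    while True:
        idx = np.where(keep)[0]
        # pairwise distances among the currently kept points
        D = np.linalg.norm(X[idx, None] - X[None, idx], axis=-1)
        # conflict: closer than 2r and differently labeled
        conflict = (D < 2 * r) & (y[idx, None] != y[None, idx])
        degrees = conflict.sum(axis=1)
        if degrees.max() == 0:  # kept set is now 2r-separated across labels
            return idx
        keep[idx[degrees.argmax()]] = False

# "Noisy" setting from the quoted setup: sigma = 0.08, r = 0.1
X, y = make_moons(n_samples=500, noise=0.08, random_state=0)
kept = adversarial_prune(X, y, r=0.1)
clf = KNeighborsClassifier(n_neighbors=1).fit(X[kept], y[kept])
```

After pruning, the 1-NN classifier trained on the surviving points is robust at radius r by construction, since every training point has no oppositely-labeled neighbor within 2r.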