Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Agnostic Learning of Disjunctions on Symmetric Distributions
Authors: Vitaly Feldman, Pravesh Kothari
JMLR 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that for every symmetric distribution D, there exists a set S of n^{O(log(1/ε))} functions such that for every disjunction c there is a function p, expressible as a linear combination of functions in S, that ε-approximates c in ℓ1 distance on D, i.e., E_{x∼D}[|c(x) − p(x)|] ≤ ε. This implies an agnostic learning algorithm for disjunctions on symmetric distributions that runs in time n^{O(log(1/ε))}. We also show that there exists a symmetric distribution D such that the minimum degree of a polynomial that 1/3-approximates the disjunction of all n variables in ℓ1 distance on D is Ω(√n). |
| Researcher Affiliation | Collaboration | Vitaly Feldman (IBM Research – Almaden, San Jose, CA) and Pravesh Kothari (Department of Computer Science, The University of Texas at Austin, Austin, TX). |
| Pseudocode | No | The paper describes algorithms conceptually through theorems and proofs (e.g., Theorem 7 outlines an algorithm), but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide links to any code repositories. |
| Open Datasets | No | The paper primarily focuses on theoretical aspects, such as symmetric distributions over {0, 1}n, and does not describe experiments using real-world or publicly available datasets. Therefore, no concrete access information for open datasets is provided. |
| Dataset Splits | No | As the paper is theoretical and does not conduct experiments on specific datasets, there is no mention of training/test/validation dataset splits. |
| Hardware Specification | No | The paper is theoretical, focusing on mathematical proofs and algorithm design, and does not report on empirical experiments that would require specific hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not describe an implementation of the algorithms or methods discussed, hence it does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical, focusing on mathematical proofs and algorithm design rather than empirical evaluation. Consequently, it does not provide specific experimental setup details such as hyperparameter values or training configurations. |
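Although the paper itself is purely theoretical, the learning algorithm it implies follows the standard ℓ1-polynomial-regression paradigm: build a feature set S, find the linear combination of features minimizing the empirical ℓ1 error against the labels (a linear program), and threshold the result. The sketch below illustrates that paradigm on a toy disjunction; the feature set (low-degree monomials) and the helper names `monomial_features` and `l1_regression` are illustrative choices for this example, not the specific basis constructed in the paper.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def monomial_features(X, degree):
    """All multilinear monomials of degree <= `degree` over {0,1} inputs."""
    n = X.shape[1]
    cols = [np.ones(len(X))]  # constant feature
    for d in range(1, degree + 1):
        for idx in itertools.combinations(range(n), d):
            cols.append(np.prod(X[:, idx], axis=1))
    return np.column_stack(cols)

def l1_regression(Phi, y):
    """Minimize sum_i |Phi[i]·w - y[i]| via the standard LP reformulation."""
    m, d = Phi.shape
    c = np.concatenate([np.zeros(d), np.ones(m)])  # minimize the sum of slacks t_i
    # Constraints: -t_i <= Phi[i]·w - y_i <= t_i
    A_ub = np.block([[Phi, -np.eye(m)], [-Phi, -np.eye(m)]])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d]

# Toy check: the disjunction x1 OR x2 over all points of {0,1}^4.
# Over {0,1}, OR(x1, x2) = x1 + x2 - x1*x2, so degree-2 monomials suffice here.
X = np.array(list(itertools.product([0, 1], repeat=4)), dtype=float)
y = np.maximum(X[:, 0], X[:, 1])

Phi = monomial_features(X, degree=2)
w = l1_regression(Phi, y)
h = (Phi @ w >= 0.5).astype(float)  # threshold the L1-optimal polynomial
print(np.mean(h != y))              # empirical disagreement with the disjunction
```

In the agnostic setting the same pipeline is run on noisy labeled samples: an ℓ1-approximating polynomial of error ε guarantees, after thresholding, a hypothesis whose error exceeds the best disjunction's by at most O(ε).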