Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Distribution Estimation under the Infinity Norm
Authors: Aryeh Kontorovich, Amichai Painsky
JMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate our results in synthetic and real-world experiments. Finally, we apply our proposed framework to a basic selective inference problem, where we estimate the most frequent probabilities in a sample. |
| Researcher Affiliation | Academia | Aryeh Kontorovich EMAIL Department of Computer Science Ben Gurion University of the Negev Beer Sheva, Israel Amichai Painsky EMAIL School of Industrial Engineering Tel Aviv University Tel Aviv, Israel |
| Pseudocode | No | The paper describes algorithms and methods but does not present any structured pseudocode or algorithm blocks with specific steps. |
| Open Source Code | No | The paper does not contain any explicit statement about the release of source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We begin with a census data; we consider the 2000 United States Census (Bureau, 2014), which lists the frequency of the top 1000 most common last names in the United States. |
| Dataset Splits | No | The paper mentions "In each experiment we draw n samples from a distribution over an alphabet size A to evaluate the proposed bounds for a given confidence level 1 δ. We repeat this process 10^4 times and report the average bound and coverage rate...". This describes data sampling and repetition, but not specific training/test/validation splits for machine learning experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not specify any particular software libraries, frameworks, or their version numbers used in the experiments. |
| Experiment Setup | Yes | We set s = 1.1 throughout our experiments. In the first experiment we focus on A = 100 and δ = 0.05. We examine three bounds. First, we consider the bound from Theorem 2 with δ1 = 0.99δ, δ2 = 0.01δ. We set m to minimize (13) over the worst-case distribution (see (10)). This results in m = log(1/δ1) + 2 ≈ 8. Further, we examine Theorem 6 and the benchmark bound (1). ... Next, we examine the performance of our proposed schemes for a decaying confidence level, δ = 1/n^2. As above, we set A = 100 and focus on the two benchmark distributions. ... for a typical experiment of n = 10000 |