Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Learning discrete distributions with infinite support
Authors: Doron Cohen, Aryeh Kontorovich, Geoffrey Wolfer
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present a novel approach to estimating discrete distributions with (potentially) infinite support in the total variation metric. In a departure from the established paradigm, we make no structural assumptions whatsoever on the sampling distribution. In such a setting, distribution-free risk bounds are impossible, and the best one could hope for is a fully empirical data-dependent bound. We derive precisely such bounds, and demonstrate that these are, in a well-defined sense, the best possible. Our main discovery is that the half-norm of the empirical distribution provides tight upper and lower estimates on the empirical risk. Furthermore, this quantity decays at a nearly optimal rate as a function of the true distribution. The optimality follows from a minimax result, of possible independent interest. Additional structural results are provided, including an exact Rademacher complexity calculation and apparently a first connection between the total variation risk and the missing mass. |
| Researcher Affiliation | Academia | Doron Cohen Department of Computer Science Ben-Gurion University of the Negev Beer-Sheva, Israel EMAIL; Aryeh Kontorovich Department of Computer Science Ben-Gurion University of the Negev Beer-Sheva, Israel EMAIL; Geoffrey Wolfer Department of Computer Science Ben-Gurion University of the Negev Beer-Sheva, Israel EMAIL |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statements about open-sourcing code or links to a code repository. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments involving datasets; therefore, no information about public datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with dataset splits; therefore, no validation split information is provided. |
| Hardware Specification | No | The paper is theoretical and does not report on experiments requiring hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not include details about an experimental setup, hyperparameters, or training configurations. |