Confidence sequences for sampling without replacement
Authors: Ian Waudby-Smith, Aaditya Ramdas
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1: 95% CS for the number of green and red balls in an urn by sampling WoR. Notice that the true totals (650 green, 350 red) are captured by the CSs uniformly over time from the initial sample until all 1000 balls are observed. After sampling 61 balls in this example, the CSs cease to overlap, and we can conclude with 95% confidence that there are more green than red balls in the urn. Figure 2: Consider sampling balls from an urn WoR with three distinct colors (red, green, and purple). In this example, the urn contains 1000 balls with 300 red, 600 green, and 100 purple. We only require a two-dimensional confidence sequence (yellow region) to capture uncertainty about all three totals. After around 300 balls have been sampled, we are quite confident that the urn is made up mostly of green; after 1000 samples, we know the totals for each color with certainty. Figure 4: Left-most plots show the histogram of the underlying set of numbers $(x_i)_{i=1}^N \in [0,1]^N$, while right-most plots compare empirical Bernstein- and Hoeffding-type CSs for $\mu$. Specifically, the Hoeffding and empirical Bernstein CSs use the $\lambda$-sequences in (3.7) and (3.13), respectively. As expected, in low-variance settings (top), the empirical Bernstein CS $C^{EB}_t$ is superior, but in a high-variance setting (bottom), the Hoeffding CS $C^{H}_t$ has a slight edge. (A hedged code sketch of the Figure 1 urn experiment appears after the table.) |
| Researcher Affiliation | Academia | Ian Waudby-Smith¹ and Aaditya Ramdas¹,², Departments of Statistics¹ and Machine Learning², Carnegie Mellon University. {ianws, aramdas}@cmu.edu |
| Pseudocode | No | The paper contains mathematical derivations and figures, but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Code to reproduce plots is available at github.com/wannabesmith/confseq_wor." (footnote 2 in the paper) |
| Open Datasets | No | The paper does not use standard publicly available datasets with concrete access information. It describes theoretical setups like sampling from a finite population or urns with specified (but not publicly shared) compositions for illustrative purposes. |
| Dataset Splits | No | The paper focuses on theoretical constructions and illustrative examples for confidence sequences, not on training machine learning models that typically involve explicit train/validation/test dataset splits. |
| Hardware Specification | No | The paper focuses on theoretical and mathematical contributions with illustrative examples, and thus does not provide any specific details about the hardware used for computations or experiments. |
| Software Dependencies | No | The paper provides a GitHub link for code to reproduce plots but does not mention any specific software dependencies or their version numbers (e.g., Python, PyTorch, specific libraries). |
| Experiment Setup | No | The paper is primarily theoretical, constructing and analyzing confidence sequences. It discusses choices of tuning parameters (such as 'a' and 'b' for the beta-binomial prior, or the λ-sequences), but these are part of the mathematical formulation rather than hyperparameters of a machine learning experiment. |
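To give a concrete feel for the Figure 1 urn experiment, here is a minimal sketch of a Hoeffding-type confidence sequence for the mean when sampling without replacement from a [0, 1]-valued population, in the spirit of the paper's construction. The λ-sequence below is a generic heuristic stand-in (our assumption), not the paper's tuned sequence (3.7), so the widths will not match the published figures; the authors' actual implementation is in the confseq_wor repository linked above.

```python
# Sketch: Hoeffding-type confidence sequence (CS) for the mean when sampling
# without replacement (WoR) from a finite population of values in [0, 1].
# The validity argument mirrors the paper: given the past, X_i has conditional
# mean (N*mu - sum of past draws)/(N - i + 1) and is 1/2-sub-Gaussian by
# Hoeffding's lemma, so an exponential supermartingale plus Ville's inequality
# yields a time-uniform interval. The lambda-sequence is a heuristic stand-in
# for the paper's eq. (3.7).
import numpy as np

def hoeffding_wor_cs(samples, N, alpha=0.05):
    """Running (1 - alpha) confidence sequence for the population mean.

    samples: WoR draws X_1, ..., X_t, each in [0, 1].
    N:       total population size.
    Returns arrays (lower, upper), one entry per draw.
    """
    X = np.asarray(samples, dtype=float)
    t = len(X)
    i = np.arange(1, t + 1)
    # Heuristic lambda-sequence shrinking like sqrt(log t / t) (assumption).
    lam = np.minimum(np.sqrt(8 * np.log(2 / alpha) / (i * np.log(i + 1))), 1.0)
    # WoR correction: the conditional mean of X_i given the past involves the
    # running sum of previously observed values, S_{i-1}.
    prev_sums = np.concatenate(([0.0], np.cumsum(X)[:-1]))
    denom = np.cumsum(N * lam / (N - i + 1))
    center = np.cumsum(lam * (X + prev_sums / (N - i + 1))) / denom
    margin = (np.log(2 / alpha) + np.cumsum(lam ** 2) / 8) / denom
    # Intersect with the parameter space [0, 1].
    return np.clip(center - margin, 0, 1), np.clip(center + margin, 0, 1)

# Urn from Figure 1: 1000 balls, 650 green (coded 1.0) and 350 red (0.0).
rng = np.random.default_rng(0)
urn = rng.permutation(np.repeat([1.0, 0.0], [650, 350]))
lo, hi = hoeffding_wor_cs(urn, N=1000)
# The green-ball count is N times the mean, so both CS endpoints scale by N.
print(f"After 100 draws: green count in [{1000 * lo[99]:.0f}, {1000 * hi[99]:.0f}]")
print(f"After 1000 draws: green count in [{1000 * lo[-1]:.0f}, {1000 * hi[-1]:.0f}]")
```

Because the intervals are valid uniformly over time, one can monitor them after every draw and stop as soon as the green and red CSs separate, as in the paper's 61-ball example; with the heuristic λ-sequence above, separation will typically occur somewhat later.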