Estimating Unknown Population Sizes Using the Hypergeometric Distribution

Authors: Liam Hodgson, Danilo Bzdok

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical data simulation demonstrates that our method outperforms other likelihood functions used to model count data, both in terms of accuracy of population size estimate and learning an informative latent space.
Researcher Affiliation | Academia | 1 McGill University, Montréal, Canada; 2 Mila Québec Artificial Intelligence Institute.
Pseudocode | Yes | Algorithm 1 Dataset simulation
Open Source Code | No | The paper does not provide explicit statements or links indicating that the source code for the methodology is open-source or publicly available.
Open Datasets | Yes | We test this hypothesis using the CommonLit Ease of Readability (CLEAR) Corpus (Crossley et al., 2023), an open-source dataset consisting of almost 5000 text excerpts sourced from Grade 3-12 reading curricula.
Dataset Splits | No | The paper describes training models on datasets but does not provide specific details on train/validation/test splits, percentages, or explicit methodologies for splitting data for reproducibility of model evaluation.
Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory) used for running experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., specific library versions for Python, PyTorch, or other tools).
Experiment Setup | Yes | Model and training hyperparameters are given in Appendix B. Table 2 provides: Encoder layers 128, 128; Decoder layers 128, 128; Latent space dimension 10; Learning rate 0.01; Batch size 100; Violation penalty (min/max) 1 (for Simulated and CLEAR) or 1/100 (for SPIKE).
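
Since no source code is released, a reader attempting a reproduction may find it handy to collect the reported Experiment Setup values in one place. The following is a minimal sketch, not the authors' configuration: the class name `ExperimentConfig`, the field names, and the dataset keys are hypothetical, and reading "1/100" as a (min, max) pair of 1 and 100 is an assumption about how the summary abbreviates Table 2.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters as listed in the Experiment Setup row (hypothetical field names)."""

    encoder_layers: Tuple[int, ...] = (128, 128)
    decoder_layers: Tuple[int, ...] = (128, 128)
    latent_dim: int = 10
    learning_rate: float = 0.01
    batch_size: int = 100
    # Violation penalty as a (min, max) pair; "1" is read as min = max = 1 (assumption).
    violation_penalty: Tuple[float, float] = (1.0, 1.0)


# Per-dataset settings; the dictionary keys are illustrative labels only.
CONFIGS = {
    "simulated": ExperimentConfig(),
    "clear": ExperimentConfig(),
    # "1/100" is interpreted here as min = 1, max = 100 (assumption).
    "spike": ExperimentConfig(violation_penalty=(1.0, 100.0)),
}
```

Freezing the dataclass keeps a run's settings immutable once created, which makes it harder to drift from the reported values by accident. Hardware, software versions, and data splits are not reported, so they are deliberately left out of the sketch.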