Estimating Unknown Population Sizes Using the Hypergeometric Distribution
Authors: Liam Hodgson, Danilo Bzdok
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical data simulation demonstrates that our method outperforms other likelihood functions used to model count data, both in terms of accuracy of population size estimate and learning an informative latent space. |
| Researcher Affiliation | Academia | ¹McGill University, Montréal, Canada; ²Mila Québec Artificial Intelligence Institute. |
| Pseudocode | Yes | Algorithm 1: Dataset simulation |
| Open Source Code | No | The paper does not provide explicit statements or links indicating that the source code for the methodology is open-source or publicly available. |
| Open Datasets | Yes | We test this hypothesis using the CommonLit Ease of Readability (CLEAR) Corpus (Crossley et al., 2023), an open-source dataset consisting of almost 5,000 text excerpts sourced from grade 3-12 reading curricula. |
| Dataset Splits | No | The paper describes training models on datasets but does not provide specific details on train/validation/test splits, percentages, or explicit methodologies for splitting data for reproducibility of model evaluation. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., specific library versions for Python, PyTorch, or other tools). |
| Experiment Setup | Yes | Model and training hyperparameters are given in Appendix B. Table 2 provides: Encoder layers: 128, 128; Decoder layers: 128, 128; Latent space dimension: 10; Learning rate: 0.01; Batch size: 100; Violation penalty (min/max): 1 for Simulated and CLEAR, or 1/100 for SPIKE. |
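The core idea in the title, that the hypergeometric distribution lets one estimate an unknown population size, can be illustrated in its simplest classical form: capture-recapture. The sketch below is not the authors' VAE-based method, whose details are not given in this summary. It assumes a population of unknown size N containing K marked items, a sample of n items drawn without replacement, and k marked items observed in that sample. All function names (`log_comb`, `hypergeom_loglik`, `estimate_population_size`) are hypothetical and chosen for illustration.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Log of the binomial coefficient C(n, k), computed with log-gamma for stability."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_loglik(pop_size: int, marked: int, drawn: int, observed: int) -> float:
    """Log-probability of seeing `observed` marked items when `drawn` items are
    sampled without replacement from `pop_size` items, `marked` of them marked."""
    # Outside the hypergeometric support the probability is zero (log = -inf).
    if not (0 <= observed <= min(marked, drawn)) or pop_size < marked + drawn - observed:
        return -math.inf
    return (log_comb(marked, observed)
            + log_comb(pop_size - marked, drawn - observed)
            - log_comb(pop_size, drawn))


def estimate_population_size(marked: int, drawn: int, observed: int, max_size: int) -> int:
    """Grid-search maximum-likelihood estimate of the unknown population size N.

    Note: if observed == 0 the likelihood never decreases in N, so the
    estimate simply runs to `max_size` (the data carry no upper bound).
    """
    lower = marked + drawn - observed  # smallest N consistent with the data
    candidates = range(lower, max_size + 1)
    return max(candidates, key=lambda n: hypergeom_loglik(n, marked, drawn, observed))


if __name__ == "__main__":
    # Hypothetical numbers: 50 marked, 40 drawn, 9 of the drawn are marked.
    print(estimate_population_size(marked=50, drawn=40, observed=9, max_size=10_000))
```

For this setting the likelihood ratio L(N)/L(N-1) is at least 1 exactly when N ≤ Kn/k, so the grid search should agree with the closed-form Lincoln-Petersen estimate floor(Kn/k), here floor(2000/9) = 222. The grid search is kept because it generalizes to likelihoods without a closed-form maximizer.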