reproducibilityindex.ai

Entropy Regularization for Population Estimation

Authors: Ben Chugg, Peter Henderson, Jacob Goldin, Daniel E. Ho

AAAI 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We run experiments on four publicly available datasets: The Current Population Survey (CPS), the American Community Survey (ACS), a voter turnout dataset, and data on All State severity claims. These four were chosen because they each correspond to a real-world optimize-and-estimate setting.
Researcher Affiliation	Academia	1 Carnegie Mellon University; 2 Stanford University; 3 University of Chicago
Pseudocode	Yes	Algorithm 1: Entropy-regularized Pareto Sampling
Open Source Code	Yes	Experimental results, datasets, and code can be found at https://github.com/bchugg/ent-reg-pop-est.
Open Datasets	Yes	We run experiments on four publicly available datasets: The Current Population Survey (CPS), the American Community Survey (ACS), a voter turnout dataset, and data on All State severity claims. ... More detail on each dataset and further justification for their selection can be found in Appendix B.
Dataset Splits	Yes	observations for the first period are selected uniformly at random to provide an initial training set for the model. ... We perform a randomized grid search on a small holdout set to determine a suitable set of hyperparameters for each dataset (see Appendix I for more details).
Hardware Specification	No	The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies	No	The paper mentions using "random forest regressors" but does not specify version numbers for this or any other software components, libraries, or programming languages used.
Experiment Setup	Yes	Experimental Protocol: For each dataset and method, observations for the first period are selected uniformly at random to provide an initial training set for the model. ... We perform a randomized grid search on a small holdout set to determine a suitable set of hyperparameters for each dataset (see Appendix I for more details). ... Throughout our experiments, we keep the budget between approximately 5-10% of the dataset size in each period, i.e., $K_t \in [0.05, 0.1] \cdot |X_t|$ (depending on the dataset).
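
The budget rule and the uniform first-period draw quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code: the function names `period_budget` and `initial_training_indices`, the seed handling, and the choice to make the first-period sample exactly one budget in size are all assumptions made for illustration. The quoted text only says that the first-period observations are drawn uniformly at random and that the budget is roughly 5-10% of the period's dataset size.

```python
import random


def period_budget(n_observations: int, fraction: float) -> int:
    """Per-period sampling budget as a fraction of the dataset size.

    Hypothetical helper: the quoted range is "approximately 5-10%",
    so the check below is an illustrative guard, not a rule from the paper.
    """
    if not 0.05 <= fraction <= 0.10:
        raise ValueError("fraction is expected to lie in [0.05, 0.10]")
    return round(fraction * n_observations)


def initial_training_indices(n_observations: int, budget: int, seed: int = 0) -> list[int]:
    """Draw the first-period training set uniformly at random, without replacement.

    Assumption: the first-period sample has exactly `budget` observations.
    """
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    return sorted(rng.sample(range(n_observations), budget))


if __name__ == "__main__":
    n = 1000  # hypothetical number of observations in the first period
    k = period_budget(n, 0.05)
    first_sample = initial_training_indices(n, k, seed=0)
    print(f"budget={k}, sampled {len(first_sample)} of {n} observations")
```

Fixing the seed addresses part of what the Hardware Specification and Software Dependencies rows flag as missing: a rerun draws the same initial training set, even though library versions remain unspecified.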