Entropy Regularization for Population Estimation

Authors: Ben Chugg, Peter Henderson, Jacob Goldin, Daniel E. Ho

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run experiments on four publicly available datasets: the Current Population Survey (CPS), the American Community Survey (ACS), a voter turnout dataset, and data on Allstate claims severity. These four were chosen because they each correspond to a real-world optimize-and-estimate setting.
Researcher Affiliation | Academia | Carnegie Mellon University; Stanford University; University of Chicago
Pseudocode | Yes | Algorithm 1: Entropy-regularized Pareto Sampling
Open Source Code | Yes | Experimental results, datasets, and code can be found at https://github.com/bchugg/ent-reg-pop-est.
Open Datasets | Yes | We run experiments on four publicly available datasets: the Current Population Survey (CPS), the American Community Survey (ACS), a voter turnout dataset, and data on Allstate claims severity. ... More detail on each dataset and further justification for their selection can be found in Appendix B.
Dataset Splits | Yes | Observations for the first period are selected uniformly at random to provide an initial training set for the model. ... We perform a randomized grid search on a small holdout set to determine a suitable set of hyperparameters for each dataset (see Appendix I for more details).
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions using "random forest regressors" but does not specify version numbers for this or for any other software components, libraries, or programming languages used.
Experiment Setup | Yes | Experimental protocol: For each dataset and method, observations for the first period are selected uniformly at random to provide an initial training set for the model. ... We perform a randomized grid search on a small holdout set to determine a suitable set of hyperparameters for each dataset (see Appendix I for more details). ... Throughout our experiments, we keep the budget between approximately 5% and 10% of the dataset size in each period, i.e., K_t ∈ [0.05, 0.1] · |X_t| (depending on the dataset).
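The Experiment Setup and Pseudocode rows together describe a per-period protocol: a budget K_t of roughly 5% to 10% of the period's units |X_t|, a uniformly random first period that seeds the model's training set, and Pareto sampling in later periods. The Python sketch below illustrates those pieces under stated assumptions. The helper names `period_budget`, `uniform_first_period`, and `pareto_sample` are hypothetical. The Pareto step is the textbook ranking scheme due to Rosén, not a transcription of the paper's Algorithm 1. The entropy-regularized construction of the inclusion probabilities is not reproduced: the sketch simply takes target probabilities `pi` as input.

```python
import numpy as np


def period_budget(n_units: int, frac: float) -> int:
    """Per-period budget K_t = frac * |X_t|, rounded (the table reports frac in [0.05, 0.1])."""
    return max(1, int(round(frac * n_units)))


def uniform_first_period(n_units: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """First period: draw k distinct unit indices uniformly at random (initial training set)."""
    return rng.choice(n_units, size=k, replace=False)


def pareto_sample(pi, k: int, rng: np.random.Generator, eps: float = 1e-9) -> np.ndarray:
    """Textbook Pareto sampling (Rosén): keep the k units with the smallest
    Q_i = [U_i / (1 - U_i)] / [pi_i / (1 - pi_i)], with U_i ~ Uniform(0, 1).

    ASSUMPTION: `pi` holds target inclusion probabilities supplied by the caller;
    the paper's entropy-regularized way of computing them is not reproduced here.
    Realised inclusion probabilities only approximate `pi`.
    """
    # Clip away 0 and 1 so both odds ratios stay finite and nonzero.
    pi = np.clip(np.asarray(pi, dtype=float), eps, 1.0 - eps)
    u = rng.uniform(size=pi.shape)  # values in [0, 1), so 1 - u > 0
    q = (u / (1.0 - u)) / (pi / (1.0 - pi))
    return np.argsort(q)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1000
    k = period_budget(n, 0.05)
    first = uniform_first_period(n, k, rng)
    # Hypothetical later-period targets: arbitrary scores rescaled to sum to k.
    scores = rng.uniform(size=n)
    pi = np.clip(k * scores / scores.sum(), 0.0, 1.0)
    later = pareto_sample(pi, k, rng)
    print(k, len(first), len(later))
```

Pareto sampling is shown here because it draws a fixed-size sample of exactly K_t units while favouring high-probability units, which matches a fixed per-period budget. How the paper adapts it with entropy regularization should be checked against Algorithm 1 and the linked repository.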