Entropy Regularization for Population Estimation
Authors: Ben Chugg, Peter Henderson, Jacob Goldin, Daniel E. Ho
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run experiments on four publicly available datasets: the Current Population Survey (CPS), the American Community Survey (ACS), a voter turnout dataset, and data on Allstate severity claims. These four were chosen because they each correspond to a real-world optimize-and-estimate setting. |
| Researcher Affiliation | Academia | 1Carnegie Mellon University 2 Stanford University 3 University of Chicago |
| Pseudocode | Yes | Algorithm 1: Entropy-regularized Pareto Sampling |
| Open Source Code | Yes | Experimental results, datasets, and code can be found at https://github.com/bchugg/ent-reg-pop-est. |
| Open Datasets | Yes | We run experiments on four publicly available datasets: the Current Population Survey (CPS), the American Community Survey (ACS), a voter turnout dataset, and data on Allstate severity claims. ... More detail on each dataset and further justification for their selection can be found in Appendix B. |
| Dataset Splits | Yes | Observations for the first period are selected uniformly at random to provide an initial training set for the model. ... We perform a randomized grid search on a small holdout set to determine a suitable set of hyperparameters for each dataset (see Appendix I for more details). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper mentions using "random forest regressors" but does not specify version numbers for this or any other software components, libraries, or programming languages used. |
| Experiment Setup | Yes | Experimental Protocol: For each dataset and method, observations for the first period are selected uniformly at random to provide an initial training set for the model. ... We perform a randomized grid search on a small holdout set to determine a suitable set of hyperparameters for each dataset (see Appendix I for more details). ... Throughout our experiments, we keep the budget between approximately 5-10% of the dataset size in each period, i.e., $K_t \in [0.05, 0.1] \cdot \|X_t\|$ (depending on the dataset). |
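The experimental-protocol row above describes two concrete steps: seeding the model with a uniformly random first-period sample, and capping the per-period labeling budget at 5-10% of the pool size. A minimal sketch of those two steps is below; the pool size, budget fraction, and variable names are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pool of observations for period t (placeholder indices).
X_t = np.arange(10_000)

# Budget K_t kept between 5% and 10% of the pool size,
# i.e. K_t in [0.05, 0.1] * |X_t|; the exact fraction is
# chosen per dataset (assumed 0.05 here for illustration).
budget_fraction = 0.05
K_t = int(budget_fraction * len(X_t))

# First-period observations are drawn uniformly at random
# (without replacement) to form the model's initial training set.
initial_train_idx = rng.choice(len(X_t), size=K_t, replace=False)
```

In later periods the paper's entropy-regularized sampling would replace the uniform draw; only the initial seeding step is uniform.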