Data subsampling for Poisson regression with pth-root-link

Authors: Han Cheng Lie, Alexander Munteanu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | G.3 Experimental illustration. All experiments were run on a commodity machine with Intel Core i7-7700K processor (4 cores, 4.2GHz, 32GB RAM) and took overall around 50 minutes to complete. The Python code of [34] was adapted to the Poisson regression setting. We applied it with the appropriate p ∈ {1, 2} to the datasets with dimensions n = 100 000, d = 7 generated as detailed in the previous section. We compared our method to uniform sampling as a baseline.
Researcher Affiliation | Academia | Han Cheng Lie, Institut für Mathematik, Universität Potsdam, Germany, hanlie@uni-potsdam.de. Alexander Munteanu, Department of Statistics, TU Dortmund University, Germany, alexander.munteanu@tu-dortmund.de
Pseudocode | Yes | G.1 Pseudocode. Here we give pseudocode for our coreset construction Algorithm 1 and for the subsequent optimization procedure Algorithm 2.
Open Source Code | Yes | Our new code is available at https://github.com/Tim907/poisson-regression/.
Open Datasets | No | G.2 Synthetic data generation. We generated for each p ∈ {1, 2} a dataset with dimensions n = 100 000, d = 7 with n labels corresponding to each point. (The paper describes generating synthetic data but does not state that the data is publicly available, and it provides no access information.)
Dataset Splits | No | The paper discusses 'reduced size' for subsampling but does not explicitly provide details about training, validation, or test dataset splits (percentages, counts, or splitting methodology).
Hardware Specification | Yes | All experiments were run on a commodity machine with Intel Core i7-7700K processor (4 cores, 4.2GHz, 32GB RAM)
Software Dependencies | No | The Python code of [34] was adapted to the Poisson regression setting. (The paper mentions 'Python code' but does not specify a Python version or any other software dependencies with version numbers.)
Experiment Setup | Yes | We applied it with the appropriate p ∈ {1, 2} to the datasets with dimensions n = 100 000, d = 7 generated as detailed in the previous section. We compared our method to uniform sampling as a baseline. We varied the reduced size between 50 and 600 in equal increments of size 50. For each reduced size and each method, we performed 201 independent repetitions.
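
The Experiment Setup entry describes a grid of reduced sizes (50 to 600 in steps of 50), 201 repetitions per size, and uniform sampling as the baseline. The sketch below shows roughly how such a grid could be driven for the baseline alone. It is not the authors' code: the function names, the toy data, and the loss are assumptions. In particular, the loss assumes the pth-root-link means a Poisson mean of (x·β)^p, and the toy generator is a stand-in, since the summary does not reproduce the generation procedure of section G.2.

```python
import math
import random


def reduced_sizes(start=50, stop=600, step=50):
    """Reduced sizes 50, 100, ..., 600 (equal increments of 50)."""
    return list(range(start, stop + 1, step))


def uniform_subsample(n, k, rng):
    """Uniform baseline: k distinct row indices, each reweighted by n / k."""
    indices = rng.sample(range(n), k)
    weights = [n / k] * k
    return indices, weights


def root_link_nll(beta, X, y, p, weights=None):
    """Weighted Poisson negative log-likelihood, ASSUMING mean (x . beta)^p.

    The beta-independent log(y!) term is dropped. Returns inf when a linear
    predictor is non-positive, where the assumed model is undefined.
    """
    if weights is None:
        weights = [1.0] * len(y)
    total = 0.0
    for x_i, y_i, w_i in zip(X, y, weights):
        z = sum(a * b for a, b in zip(x_i, beta))
        if z <= 0:
            return math.inf
        total += w_i * (z ** p - p * y_i * math.log(z))
    return total


def run_grid(X, y, beta, p, repetitions=201, seed=0):
    """Hypothetical driver: for every reduced size, collect `repetitions`
    uniformly subsampled estimates of the loss at a fixed beta."""
    rng = random.Random(seed)
    n = len(y)
    results = {}
    for k in reduced_sizes():
        estimates = []
        for _ in range(repetitions):
            idx, w = uniform_subsample(n, k, rng)
            X_sub = [X[i] for i in idx]
            y_sub = [y[i] for i in idx]
            estimates.append(root_link_nll(beta, X_sub, y_sub, p, w))
        results[k] = estimates
    return results


if __name__ == "__main__":
    # Toy stand-in data (NOT the paper's generator): n = 1000, d = 7.
    data_rng = random.Random(1)
    d, n, p = 7, 1000, 2
    beta = [0.3] * d
    X = [[data_rng.uniform(0.5, 1.5) for _ in range(d)] for _ in range(n)]
    # Rounded means serve as deterministic placeholder labels.
    y = [round(sum(a * b for a, b in zip(row, beta)) ** p) for row in X]

    full = root_link_nll(beta, X, y, p)
    for k, estimates in run_grid(X, y, beta, p).items():
        estimates.sort()
        median = estimates[len(estimates) // 2]
        print(k, abs(median - full) / abs(full))
```

The driver evaluates the loss at a fixed β only to keep the example short. A comparison in the spirit of the paper would instead optimize over each subsample and evaluate the resulting estimate on the full data, and would run the coreset method through the same loop alongside the uniform baseline.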